Assessing the importance of data

As new data sources emerge and the amount of unique data collected continues to expand, limited only by technology, scientific enterprises, the weather enterprise among them, run up against human limits on the ability to interpret and apply additional data to scientific questions and operational challenges. Therefore, to assess the value of observations or research byproducts of the same type but from different sources, it is necessary to understand how the data is used, and how often it is used in comparison to “adjacent” data. That is, observational data and data conveyed in the form of research byproducts are more valuable if they are routinely used in scientific and/or operational pursuits and that use affects the scientific and/or operational result. This concept should strategically drive research-to-operations initiatives.

For example, in the weather enterprise, there are multiple sources of total precipitable water (TPW) observations. TPW is a measure of the total amount of water vapor in a column of air. Many sources of TPW are available to research and operational meteorologists, including polar-orbiting satellites (with infrared and microwave sensors), geostationary satellites, Global Positioning System ground receivers, and radiosondes. However, meteorologists only require a single analysis. Which observing system should they use? Which one do they use?

In both cases, it depends. There has been some work to produce a blended analysis from multiple sources in order to condense the volume of observational data, but there are complications. The disparate sources of TPW have independent biases and availability. Whether a meteorologist consumes multiple analyses of TPW or a single, blended analysis is a matter of how much time the underlying scientific or operational process allows and how much the choice matters to that process, but it is unlikely that the unique characteristics of the data, including the limitations of the different observations, are well known to that individual. There is a broader opportunity to provide information, such as TPW, at the degree of specificity the scientist or practitioner requires. Most important, however, is understanding whether all of the unique types of observations are worth the cost.
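As a minimal sketch of what blending involves, the snippet below removes each source's (assumed known) mean bias and then averages with inverse-variance weights. The source names, bias values, and error estimates are hypothetical, and operational blended products use considerably more sophisticated methods.

```python
# Illustrative sketch of one simple way to blend TPW estimates from several
# sources: remove each source's assumed mean bias, then weight by inverse
# error variance. All numbers below are hypothetical.

# (tpw_mm, bias_mm, error_std_mm) for each hypothetical source
sources = {
    "microwave_sounder": (31.8, +0.5, 1.2),
    "gps_ground_receiver": (30.9, -0.2, 0.8),
    "radiosonde": (30.4, 0.0, 0.6),
}

def blend_tpw(sources):
    """Bias-correct each estimate, then average with inverse-variance weights."""
    num = 0.0
    den = 0.0
    for name, (tpw, bias, std) in sources.items():
        corrected = tpw - bias      # remove the source's known mean bias
        weight = 1.0 / (std ** 2)   # trust lower-error sources more
        num += weight * corrected
        den += weight
    return num / den

print(f"Blended TPW: {blend_tpw(sources):.1f} mm")
```

Even this toy version shows why the blend is complicated in practice: the biases and error estimates it depends on differ by source and are rarely known to the end user.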

The value of observational data and related research byproducts is best determined through four metrics: accuracy, precision, spatial coverage, and temporal frequency. In a value assessment, all four metrics are relative to scientific and operational needs. For example, higher accuracy may not lead to higher value if that higher accuracy does not provide actionable information for the scientist or practitioner. To that end, value is relational and relative among datasets and derived products alike.

Figure: The observational data and research byproduct characteristics are accuracy, precision, spatial coverage, and temporal refresh (frequency). The best observational data and research byproducts are the most accurate and precise, and highest in spatial coverage and temporal refresh. Consider plotting the four metrics for different observing systems along the axes, then, for each observing system, connecting the points and comparing the enclosed area to that of other systems.
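The area comparison described in the figure can be made concrete with a short sketch: place a normalized score for each of the four characteristics on the spokes of a radar chart and compare the area of the resulting polygons. The scores below are hypothetical and purely illustrative, not an assessment of any real system.

```python
import math

# Hypothetical 0-1 scores for (accuracy, precision, spatial coverage,
# temporal refresh), one tuple per observing system. Purely illustrative.
systems = {
    "radiosonde": (0.9, 0.9, 0.2, 0.1),
    "geostationary_satellite": (0.7, 0.8, 0.9, 0.95),
}

def radar_area(scores):
    """Area of the polygon formed by plotting each score on an evenly spaced
    spoke of a radar chart and connecting adjacent points."""
    n = len(scores)
    theta = 2 * math.pi / n
    # Each adjacent pair of spokes forms a triangle with area
    # 0.5 * r_i * r_{i+1} * sin(theta); sum over all pairs, wrapping around.
    return sum(
        0.5 * scores[i] * scores[(i + 1) % n] * math.sin(theta)
        for i in range(n)
    )

for name, scores in systems.items():
    print(f"{name}: area = {radar_area(scores):.2f}")
```

The area is only a visual shorthand; as noted above, each axis still has to be judged against the specific scientific or operational need.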

The importance of accurate measurements is well understood, but comparisons of different types of observations or research byproducts can subject reported accuracies to debate, especially when an observation averaged in time or space is compared to an instantaneous point observation. For observations and higher-order products, accuracy can vary across the environmental range of the measuring instrument as well as, in the case of remote sensing, with viewing geometry. Accuracy should be assessed in full consideration of that range, the character of the instrument or algorithm, and its skill at representing the observed medium whether or not that medium is uniform. For example, a weather satellite may sense skin temperature more accurately over homogeneous scenes (such as clear sky over ocean) than over heterogeneous ones (such as partly cloudy skies over mountainous terrain).

Related to accuracy is precision. The best observations are high in both accuracy and precision. Still, precision is important independent of accuracy because sometimes the trend of a quantity matters more than any single value. High-precision observations and second-order calculations that may not be accurate, or whose accuracy is unclear because the quantity is not easily compared to others, are valuable so long as the precision is consistent over time and a trend remains detectable (such as during advection).
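As an illustration of why consistent precision can matter more than absolute accuracy when the trend is what counts, the sketch below uses made-up numbers: a sensor with a constant offset still reports the same moisture increase during advection as the true series.

```python
# Hypothetical hourly TPW values (mm) during moisture advection.
true_tpw = [28.0, 29.5, 31.0, 32.5, 34.0]

# A precise but inaccurate sensor: constant +2 mm offset, negligible scatter.
biased_sensor = [v + 2.0 for v in true_tpw]

def hourly_change(series):
    """Hour-to-hour differences; a constant bias cancels out of the trend."""
    return [round(b - a, 2) for a, b in zip(series, series[1:])]

print("true trend:  ", hourly_change(true_tpw))
print("sensor trend:", hourly_change(biased_sensor))
# Both print [1.5, 1.5, 1.5, 1.5]: the offset hides the absolute value but
# not the advection signal, which is what matters in this scenario.
```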

The best observation platforms gather data over large geographic areas and do so often, especially when the cost per observation is low. Spatial coverage and temporal frequency are important for capturing phenomena that occur on different geographic scales and evolve on different timescales. Operational meteorologists launch radiosondes twice per day at select locations over the United States, but geostationary weather satellites can capture nearly an entire hemisphere at a resolution of approximately four kilometers in less than 30 minutes (as of this writing). While radiosondes and weather satellites do not produce identical raw observations, some types of information are similar. Yet the benefits and costs of radiosondes scale with the number of balloons and sensor packages actually launched. Although an individual radiosonde is relatively inexpensive, in the hundreds of dollars, compared to a geostationary satellite, in the billions of dollars, the longevity and coverage of geostationary satellites and their instruments make them an ideal choice for monitoring, for example, mesoscale convective systems, which may slip undetected through the space and time gaps of the global, or even national, radiosonde network.
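The scaling argument can be made explicit with rough numbers. The sketch below takes the "hundreds of dollars" and "billions of dollars" figures above at face value; the specific unit costs, lifetimes, and scan rates are assumptions for illustration only, not program-of-record costs.

```python
# Rough, hypothetical cost-scaling comparison; all figures are assumptions.

# Radiosondes: cost scales with the number of launches.
radiosonde_unit_cost = 300        # dollars per balloon + sensor package (assumed)
launches_per_day = 2              # standard synoptic launches at one site
years = 10
radiosonde_profiles = launches_per_day * 365 * years
radiosonde_total = radiosonde_unit_cost * radiosonde_profiles

# Geostationary satellite: large fixed cost amortized over continuous scans.
satellite_total = 2_000_000_000   # dollars, assumed program cost
scans_per_day = 48                # one hemispheric scan every 30 minutes
satellite_scans = scans_per_day * 365 * years

print(f"radiosonde site: ${radiosonde_total:,} for {radiosonde_profiles:,} "
      f"profiles (${radiosonde_total / radiosonde_profiles:,.0f} per profile)")
print(f"satellite:       ${satellite_total:,} for {satellite_scans:,} hemispheric "
      f"scans (${satellite_total / satellite_scans:,.0f} per scan)")
# A scan is not a vertical profile, so this is not an apples-to-apples
# comparison; the point is that one cost grows with every launch while the
# other is spread over continuous, hemisphere-wide coverage.
```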

There is one other consideration when it comes to observations: the number of prospective subsequent uses for that observation in derived products from research. In certain situations, an observation that can contribute to several subsequent research byproducts is more valuable than one with a limited number of research applications. This acts as a multiplier on the four characteristics described above. In some cases, though, a single very important use can outweigh many smaller, secondary uses.
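One way to make the multiplier idea concrete is a toy scoring rule in which the base score from the four characteristics is scaled by the summed importance of an observation's downstream uses. The base scores and importance weights below are hypothetical, and this is a sketch of the idea rather than a recommended valuation method.

```python
# Toy illustration of downstream use acting as a multiplier on the base value
# of an observation. All scores and weights are hypothetical.

def observation_value(base_score, use_importances):
    """Scale the four-characteristic base score by the summed importance of
    the observation's downstream research byproducts."""
    return base_score * sum(use_importances)

# Observation A: feeds many minor derived products.
many_minor_uses = observation_value(0.7, [0.2, 0.2, 0.2, 0.2, 0.2])

# Observation B: feeds one critical derived product.
one_critical_use = observation_value(0.7, [1.5])

print(f"many minor uses:  {many_minor_uses:.2f}")   # 0.70
print(f"one critical use: {one_critical_use:.2f}")  # 1.05
```

With these made-up weights, the single critical use outweighs the five minor ones, matching the caveat above that counting uses alone is not enough.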

The better the accuracy and precision, and the greater the spatial coverage and temporal frequency, the more fit an observation, or a second-order calculation from that observation, is for use in enterprise operations and scientific studies. But in analyzing the characteristics of observations and related products, and their suitability to practitioner and researcher challenges, do not lose sight of the optimal baseline and the role of these metaphorical “pieces of the puzzle”. Maximizing the value of research to operations at a strategic level requires mastery of the applications of the observations entering, and the research byproducts exiting, the R2O process.