Data sparse or information poor?

The acronym DRIP (data rich, information poor) has been used for decades to describe organizations that collect data daily but use it ineffectively, or not at all, to inform important decisions about conditions that could affect their mission. Breaking the curse of DRIP involves collecting, storing, organizing, and accessing data so that it can be converted into information. That information must then be archived, analyzed, and ultimately used to create a DAIR (data and information rich) environment. Gleaning information from data is challenging, especially when the data was collected for some other purpose or when the information sought requires combining multiple data sources.

Correcting a DRIP posture requires both technical and scientific acumen. In science, data is collected in search of information, and much research is conducted precisely to extract information from data. The research-to-operations (R2O) cycle seeks to transition new research information that is relevant to the practitioner into operational use. Along the way, the R2O process refines that information and consolidates or reformats data into parameters or quantities the practitioner understands. A practitioner may also convert data to information through their own mental model (e.g., a day with a temperature of 95 °F and a dew point of 75 °F is hot and muggy).
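To make the temperature and dew point example concrete, the following Python sketch turns two raw observations (data) into a derived parameter and a plain-language label (information). It is a minimal illustration under stated assumptions: the function names and the thresholds (90 °F for “hot”, a 65 °F dew point for “muggy”) are hypothetical stand-ins for a practitioner's mental model, not an operational standard, and relative humidity is computed with a common Magnus-type approximation for saturation vapor pressure over water.

```python
import math

# Hypothetical thresholds standing in for a forecaster's mental model (assumptions).
HOT_THRESHOLD_F = 90.0
MUGGY_DEW_POINT_F = 65.0


def f_to_c(temp_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def saturation_vapor_pressure_hpa(temp_c: float) -> float:
    """Magnus-type approximation of saturation vapor pressure over water, in hPa."""
    return 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))


def relative_humidity_pct(temp_f: float, dew_point_f: float) -> float:
    """Derived parameter: ratio of actual to saturation vapor pressure, in percent."""
    e_actual = saturation_vapor_pressure_hpa(f_to_c(dew_point_f))
    e_saturated = saturation_vapor_pressure_hpa(f_to_c(temp_f))
    return 100.0 * e_actual / e_saturated


def describe(temp_f: float, dew_point_f: float) -> str:
    """Map raw observations to a plain-language label using the hypothetical thresholds."""
    parts = []
    if temp_f >= HOT_THRESHOLD_F:
        parts.append("hot")
    if dew_point_f >= MUGGY_DEW_POINT_F:
        parts.append("muggy")
    return " and ".join(parts) if parts else "neither hot nor muggy"


if __name__ == "__main__":
    t_f, td_f = 95.0, 75.0
    print(describe(t_f, td_f))                               # hot and muggy
    print(f"RH ~ {relative_humidity_pct(t_f, td_f):.0f}%")   # RH ~ 53%
```

The relative humidity step mirrors the R2O role of reformatting data into a quantity the practitioner recognizes, while `describe` mimics the final leap from numbers to an actionable impression.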

In meteorology, many consider the world's oceans to be “data sparse”, meaning that too little data is collected over the high seas to fully characterize the atmosphere there and the dynamical and physical processes at play. This view persists despite numerous observations from weather satellites, as well as the occasional reporting buoy or aircraft carrying an appropriate sensor package. The atmosphere over the oceans is therefore better described as “information poor”: the current observing systems, despite the data they provide, are not producing, or contributing to, actionable information in the eyes of the practitioner. The data itself is, in fact, quite substantial.

To be fair, there are certainly portions of the earth where observations are so lacking that more data is necessary. But more data is not always an effective remedy for limited information. Nor is it always easy to determine whether the data that already exists is fundamentally limited. In the context of R2O, it is necessary to examine how researchers are interrogating the available data and how practitioners are applying the resulting information to serve their consumers. New analytical methods could extract additional information from existing data. Similarly, altering the nature and/or amount of the information (or data) that practitioners review could change how they interpret it.

It is not obvious whether animating a time sequence of spatial data at different speeds alters how a practitioner perceives an evolving scenario. For example, does a growing thunderstorm appear more severe in a fast animation than in a slow one? Does the color palette used to render the images in the animation alter that perception? Does the temporal frequency at which the images were collected matter? A recent study investigated how the playback speed of videos depicting violent conduct affects viewers’ judgments of the actor’s intent. How, then, do different frame rates influence scientific practitioners attempting to understand a complex atmosphere?

For R2O to succeed in a way that respects the value of observations, we must learn more about how people process information, and information must be framed in the context of the decisions it supports. Reaching that point requires adequate research with ample data. But the challenges will keep compounding as the demand for greater storage capacity, faster computers, and innovative methods accompanies the “big data” future. Even so, the path from the most data to the best decisions (via information) is not linear. Instead, information must be tailored and actionable to be rich.