In an age when the creation of data is growing exponentially and the conversation about big data analytics nears hype proportions, I think the question about data continuity and accessibility becomes increasingly important.

Missing Data

A recent news article in Nature by Elizabeth Gibney and Richard Van Noorden, on the loss of raw data associated with published research articles, caught my attention.

The authors summarized the work of a group of scientists who wanted to understand how the accessibility of research data changes with the age of the publication. The group tried to obtain the raw data associated with 516 published research articles ranging from 2 to 22 years old, considering factors such as whether author email addresses were still active and whether the data could still be accessed. In the original work, the investigators found that the odds of the associated data being accessible fell by 17% per year, and that within 20 years of an article’s publication up to 80% of the associated raw data can be lost, and with it any chance of future researchers using the data. The authors conclude by advocating public archiving of data at the time of publication to ensure future accessibility.
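To make the 17% figure concrete: it describes odds, not probability, so the chance of obtaining the data does not simply drop by 17 percentage points each year. The short Python sketch below is my own illustration, not the investigators' analysis. It assumes the yearly decline is constant, and the starting probability of 0.9 is a hypothetical value chosen only for the example.

```python
def probability_to_odds(p: float) -> float:
    """Convert a probability (0 <= p < 1) to odds."""
    return p / (1.0 - p)


def odds_to_probability(odds: float) -> float:
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def odds_after(initial_odds: float, years: float, yearly_decline: float = 0.17) -> float:
    """Odds after `years`, assuming the odds shrink by a constant
    fraction (`yearly_decline`) every year."""
    return initial_odds * (1.0 - yearly_decline) ** years


# Hypothetical starting point: a 90% chance that the data can be obtained.
# This value is an assumption for illustration, not a figure from the study.
start_odds = probability_to_odds(0.9)

for years in (0, 5, 10, 20):
    p = odds_to_probability(odds_after(start_odds, years))
    print(f"after {years:2d} years: probability of access = {p:.2f}")
```

Under these assumptions the probability falls slowly at first and then much faster, which is why a modest-sounding yearly decline in odds adds up to large losses over two decades.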

This somewhat disconcerting article was offset by a more positive story by Sid Perkins, published on Science’s website, describing work being undertaken at the National Snow and Ice Data Center at the University of Colorado to make available archived Nimbus satellite imagery dating back to the mid-1960s. This involves digitizing analogue data, mosaicking the resulting digital images and then adding them to an accessible data archive. To date more than 250,000 images have been made available, adding considerably to a time series record of value in assessing issues such as high-latitude sea ice variability and tropical and mid-latitude weather variability. The extension of the data record to 50-plus years is truly impressive.

While it seems there are many questions around data compatibility (for another discussion), efforts to establish processes to ensure research data continuity and accessibility are to be commended and should be valued, even as new data is being generated at tremendous rates.

References

Gibney, E. and R. Van Noorden. 2013. Scientists losing data at a rapid rate. Nature. doi:10.1038/nature.2013.14416

Vines, T. H. et al. 2014. The availability of research data declines rapidly with article age. Curr. Biol. http://dx.doi.org/10.1016/j.cub.2013.11.014

Nimbus data rescue: recovering the past to understand the future. 2014. http://cires.colorado.edu/news/press/2014/nimbus.html

Perkins, S. 2014. Long lost data reveals new insights to climate change. Science. http://news.sciencemag.org/climate/2014/09/long-lost-satellite-data-reveal-new-insights-climate-change