CYBERSKA

 A Cyberinfrastructure platform to meet the needs of data-intensive radio astronomy en route to the SKA

The Use of Scientific Data: A Content Analysis

http://arxiv.org/abs/1007.4602

Science is entering a new paradigm, called data-intensive science. While current studies of this phenomenon have focused on building infrastructure for the new paradigm, few have examined the users of scientific data, particularly their usage practices, even though the importance of understanding users' workflows and practices has been recognized. This study endeavors to improve our understanding of users' data usage behavior through a content analysis of publications from a frequently cited project associated with the new paradigm, the Sloan Digital Sky Survey (SDSS). We found that (1) nearly half of the studies used only one data source, and a few exploited three or more; (2) the number of objects analyzed in SDSS publications spans all scales, from single digits to millions; (3) different paper types may affect data usage patterns; (4) users are not only consumers of scientific data but also producers; and (5) studies that use multiple large-scale data sources are relatively rare. Issues of data provenance, trust, and usability may prevent researchers from doing this kind of research.