A cyberinfrastructure platform to meet the needs of data-intensive radio astronomy en route to the SKA

From FITS to SQL - Loading and Publishing the SDSS Data

By Samuel George 3809 days ago

For large astronomical databases like the SDSS Science Archive, data loading is potentially the most time-consuming and labor-intensive part of archive operations, and it is also the most critical: it is the last chance to examine the data before it is published. We attempted to automate this job as much as possible, and to make it easy to diagnose data and loading errors. We describe the sqlLoader, a distributed workflow system of modules that check, load, validate, and publish the data to the databases. The workflow is described by a directed acyclic graph whose nodes are the processing modules. It is designed for parallel loading and is controlled from a web interface (Load Monitor). The validation stage scrubs the data systematically and thoroughly. Finally, the different data products are merged into a set of linked tables that can be efficiently searched with specialized indices and pre-computed joins.
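
To make the workflow idea concrete, here is a minimal sketch in Python of a loader whose stages form a directed acyclic graph and run in dependency order. It is not the sqlLoader itself: the stage bodies, the `obj_id` field, the `ctx` dictionary, and the strictly linear dependencies are all assumptions chosen for illustration, and a plain list stands in for a database table.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def check(ctx):
    # Reject input rows that lack the (hypothetical) required field.
    ctx["checked"] = [r for r in ctx["input"] if "obj_id" in r]


def load(ctx):
    # Copy checked rows into a staging area (a list stands in for a table).
    ctx["staging"] = list(ctx["checked"])


def validate(ctx):
    # Scrub the staged data; here we only enforce unique identifiers.
    ids = [r["obj_id"] for r in ctx["staging"]]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate obj_id in staging")


def publish(ctx):
    # Make the validated rows visible to users.
    ctx["published"] = list(ctx["staging"])


# Hypothetical stage names, taken from the four verbs in the description.
STEPS = {"check": check, "load": load, "validate": validate, "publish": publish}

# Each node maps to the set of nodes that must finish before it starts.
DAG = {
    "check": set(),
    "load": {"check"},
    "validate": {"load"},
    "publish": {"validate"},
}


def run_workflow(rows):
    """Run every stage in an order that respects the DAG's dependencies."""
    ctx = {"input": rows}
    order = list(TopologicalSorter(DAG).static_order())
    for name in order:
        STEPS[name](ctx)
    return order, ctx
```

Calling `run_workflow([{"obj_id": 1}, {"ra": 10.0}, {"obj_id": 2}])` drops the row without an `obj_id` at the check stage and publishes the other two. Because a failing stage raises before `publish` runs, bad data never reaches the published set, which mirrors the "last chance to examine the data" role described above. A parallel loader could instead use `TopologicalSorter.get_ready()` and `done()` to dispatch independent nodes concurrently.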