Special Issue of Computing in Science and Engineering Dedicated to the SDSS Science Archive
The January/February 2008 issue of the journal Computing in Science and Engineering (CiSE), a joint publication of the American Institute of Physics and the IEEE Computer Society, was dedicated to the SDSS Science Archive. The issue featured several in-depth, peer-reviewed articles on the components of the SDSS-II Science Archive. For SDSS-III, the Data Archive Server (DAS) has been replaced by the Science Archive Server (SAS), while the Catalog Archive Server (CAS), with significant enhancements and schema changes, continues to provide access to the catalog data through the SkyServer Web interface and the CasJobs batch query service.
The November/December 2008 issue of CiSE also carried a follow-up article on lessons learned from the SDSS-II CAS deployment.
Each of these articles is summarized below.
The Sloan Digital Sky Survey Science Archive represents a thousandfold increase in the total amount of data that astronomers have collected to date. The pioneering instrumentation technology that made this possible is matched by groundbreaking tools that let anyone in the world access terabytes of SDSS data online.
The Sloan Digital Sky Survey Data Archive Server
The Sloan Digital Sky Survey's Data Archive Server (DAS) provides public access to data files produced by the SDSS data reduction pipeline. This article discusses the challenges of publicly distributing data of this complexity and how the project addressed them.
The Catalog Archive Server Database Management System
The Sloan Digital Sky Survey's (SDSS's) multiterabyte catalog data is stored in a commercial relational database management system (DBMS) with SQL query access and a built-in query optimizer. The SDSS Catalog Archive Server adds advanced data mining features to the DBMS to provide fast online access to the data.
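The combination of declarative SQL access and a query optimizer can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for the commercial DBMS behind the CAS. The PhotoObj table name is borrowed from the SkyServer schema, but its four columns, the sample rows, and the index name idx_photoobj_dec are illustrative assumptions, not the real CAS schema. The user states *what* to fetch (a box on the sky plus a magnitude cut), and the optimizer decides *how*, for example by using an index on dec instead of scanning the whole table.

```python
import sqlite3

# Hypothetical sample rows: (objID, ra [deg], dec [deg], r-band magnitude).
ROWS = [
    (1, 180.10, 0.20, 17.5),
    (2, 180.40, 0.45, 19.2),
    (3, 181.90, 0.10, 16.8),
    (4, 180.25, 1.50, 18.0),
]

# Declarative box search: the query says what to fetch, not how to fetch it.
BOX_QUERY = """
    SELECT objID, ra, dec, r FROM PhotoObj
    WHERE dec BETWEEN ? AND ? AND ra BETWEEN ? AND ? AND r < ?
"""


def build_catalog() -> sqlite3.Connection:
    """Create a toy in-memory catalog (not the real CAS schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE PhotoObj (objID INTEGER PRIMARY KEY, ra REAL, dec REAL, r REAL)"
    )
    conn.executemany("INSERT INTO PhotoObj VALUES (?, ?, ?, ?)", ROWS)
    # An index on dec gives the optimizer an alternative to a full table scan.
    conn.execute("CREATE INDEX idx_photoobj_dec ON PhotoObj(dec)")
    conn.commit()
    return conn


def box_search(conn, ra_min, ra_max, dec_min, dec_max, r_max):
    """Return objects inside an ra/dec box that are brighter than r_max."""
    params = (dec_min, dec_max, ra_min, ra_max, r_max)
    return sorted(conn.execute(BOX_QUERY, params).fetchall())


def query_plan(conn, *params):
    """Ask the optimizer which access path it chose for BOX_QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + BOX_QUERY, params).fetchall()
    return [row[-1] for row in rows]  # the human-readable detail column


conn = build_catalog()
print(box_search(conn, 180.0, 181.0, 0.0, 1.0, 19.0))
print(query_plan(conn, 0.0, 1.0, 180.0, 181.0, 19.0))
```

The plan output should mention a search using idx_photoobj_dec; the exact wording of SQLite's plan text varies between versions, and a production DBMS reports its plans in its own format.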
The sqlLoader Data-Loading Pipeline
Using a database management system (DBMS) is essential to ensure the data integrity and reliability of large, multidimensional data sets. However, loading multiterabyte data into a DBMS is a time-consuming and error-prone task, which the authors have tried to automate by developing the sqlLoader pipeline, a distributed workflow system for data loading.
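One way to keep a loading workflow from corrupting the public tables is to split it into stages and refuse to publish until validation passes. The sketch below shows a load, validate, publish sequence on a single SQLite connection. The stage names, the staging table, and the specific checks (duplicate IDs, coordinate ranges) are hypothetical choices for illustration; the real sqlLoader distributes its work across many machines, which this sketch does not attempt.

```python
import csv
import io
import sqlite3


def create_public_table(conn: sqlite3.Connection) -> None:
    """The table end users query; only validated rows should ever reach it."""
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS PhotoObj "
            "(objID INTEGER PRIMARY KEY, ra REAL, dec REAL)"
        )


def load_stage(conn: sqlite3.Connection, csv_text: str) -> int:
    """Stage 1: bulk-insert raw CSV rows into a scratch staging table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    with conn:
        conn.execute("DROP TABLE IF EXISTS staging")
        # Column affinities convert numeric-looking text to INTEGER/REAL.
        conn.execute("CREATE TABLE staging (objID INTEGER, ra REAL, dec REAL)")
        conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", rows)
    return len(rows)


def validate_stage(conn: sqlite3.Connection) -> list:
    """Stage 2: return human-readable problems found in the staged rows."""
    problems = []
    dupes = conn.execute(
        "SELECT objID FROM staging GROUP BY objID HAVING COUNT(*) > 1"
    ).fetchall()
    problems += [f"duplicate objID {obj_id}" for (obj_id,) in dupes]
    # Non-numeric fields keep TEXT type, so typeof() catches them too.
    bad = conn.execute(
        "SELECT DISTINCT objID FROM staging "
        "WHERE typeof(ra) != 'real' OR typeof(dec) != 'real' "
        "OR ra < 0 OR ra >= 360 OR dec < -90 OR dec > 90"
    ).fetchall()
    problems += [f"bad coordinates for objID {obj_id}" for (obj_id,) in bad]
    return problems


def publish_stage(conn: sqlite3.Connection) -> None:
    """Stage 3: move validated rows into the public table in one transaction."""
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute("INSERT INTO PhotoObj SELECT objID, ra, dec FROM staging")
        conn.execute("DELETE FROM staging")


def run_pipeline(conn: sqlite3.Connection, csv_text: str) -> list:
    """Load, validate, and publish; leave PhotoObj untouched if anything is wrong."""
    load_stage(conn, csv_text)
    problems = validate_stage(conn)
    if not problems:
        publish_stage(conn)
    return problems
```

A fuller loader would also compare new IDs against already-published rows. Here, a collision with an existing row trips the PRIMARY KEY constraint, so publish_stage raises sqlite3.IntegrityError and its transaction is rolled back.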
CasJobs and MyDB: A Batch Query Workbench
Catalog Archive Server Jobs (CasJobs) is an asynchronous query workbench service that lets users run unrestricted SQL queries against scientific catalog archives. After running queries in batch mode, users can save their results to a personal database called MyDB before downloading them, letting users manage their query workloads, results, and histories without causing network overloads.
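The batch workflow (submit a query, let it run later, keep the result in a personal database, download it afterward) can be sketched in a few dozen lines. This is not the CasJobs API: the BatchWorkbench class, its method names, and the use of an attached in-memory SQLite database as MyDB are assumptions made for illustration. The worker step, which a real service would run in the background, is called explicitly here so the behavior is easy to follow.

```python
from __future__ import annotations

import sqlite3
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    sql: str
    target: str  # table to create in the user's MyDB
    status: str = "queued"
    error: str = ""


class BatchWorkbench:
    """Toy batch-query service (hypothetical, not CasJobs): queue, run, save."""

    def __init__(self, catalog: sqlite3.Connection):
        # The catalog connection must not have an open transaction here,
        # because SQLite refuses to ATTACH inside a transaction.
        self.conn = catalog
        # A separate attached database plays the role of the user's MyDB.
        self.conn.execute("ATTACH DATABASE ':memory:' AS mydb")
        self.jobs: dict[int, Job] = {}
        self.queue: deque[int] = deque()

    def submit(self, sql: str, target: str) -> int:
        """Queue a query and return immediately with a job id."""
        if not target.isidentifier():
            raise ValueError(f"invalid table name: {target!r}")
        job = Job(len(self.jobs) + 1, sql, target)
        self.jobs[job.job_id] = job
        self.queue.append(job.job_id)
        return job.job_id

    def run_next(self) -> None:
        """Worker step: run the oldest queued job and store its output in MyDB."""
        if not self.queue:
            return
        job = self.jobs[self.queue.popleft()]
        job.status = "running"
        try:
            with self.conn:
                self.conn.execute(f"CREATE TABLE mydb.{job.target} AS {job.sql}")
            job.status = "finished"
        except sqlite3.Error as exc:
            job.status, job.error = "failed", str(exc)

    def status(self, job_id: int) -> str:
        return self.jobs[job_id].status

    def download(self, target: str) -> list[tuple]:
        """Fetch a saved result table from MyDB (e.g. before exporting it)."""
        if not target.isidentifier():
            raise ValueError(f"invalid table name: {target!r}")
        return self.conn.execute(f"SELECT * FROM mydb.{target}").fetchall()
```

Because results land in MyDB first, a user can run several queries, inspect or join the saved tables on the server, and download only the final product, which is the workload-management benefit the abstract describes.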
The SDSS is one of the first very large archives in astronomy and other sciences, as we enter the era of data-intensive science. Here the authors summarize some of the important and generally applicable insights they have gained (often the hard way!) over the past decade of SDSS development.