SDSS Science Archive
Sloan Digital Sky Survey III
SkyServer DR9
The SDSS Science Archive

Special Issue of Computing in Science and Engineering Dedicated to the SDSS Science Archive

The January/February 2008 issue of the journal Computing in Science and Engineering (CiSE), a joint publication of the American Institute of Physics and the IEEE Computer Society, was dedicated to the SDSS Science Archive. The issue featured several in-depth, peer-reviewed articles on various components of the SDSS-II Science Archive. For SDSS-III, the Data Archive Server (DAS) has been replaced with the Science Archive Server (SAS). The Catalog Archive Server (CAS) continues, with significant enhancements and schema changes, to provide access to the catalog data via the SkyServer Web interface and the CasJobs batch query service.

The November/December 2008 issue of CiSE also had a follow-up article on lessons learned from the SDSS-II CAS deployment.

These articles are described below with links to the PDF for each article.


The Sloan Digital Sky Survey - Drinking From the Fire Hose
Ani Thakar

Get text as PDF document (opens in new window)
The Sloan Digital Sky Survey Science Archive represents a thousand-fold increase in the total amount of data that astronomers have collected to date. The pioneering instrumentation technology that made this possible is matched by groundbreaking tools that let anyone in the world access terabytes of SDSS data online.

The Sloan Digital Sky Survey Data Archive Server
Eric H. Neilsen Jr.

Get text as PDF document (opens in new window)
The Sloan Digital Sky Survey's Data Archive Server (DAS) provides public access to data files produced by the SDSS data reduction pipeline. This article discusses challenges in public distribution of data of this complexity and how the project addressed them.

The Catalog Archive Server Database Management System
Ani Thakar, Alex Szalay, George Fekete, and Jim Gray

Get text as PDF document (opens in new window)
The multiterabyte Sloan Digital Sky Survey's (SDSS's) catalog data is stored in a commercial relational database management system with SQL query access and a built-in query optimizer. The SDSS Catalog Archive Server adds advanced data mining features to the DBMS to provide fast online access to the data.

The sqlLoader Data-Loading Pipeline
Alex Szalay, Ani Thakar, and Jim Gray

Get text as PDF document (opens in new window)
Using a database management system (DBMS) is essential to ensure the data integrity and reliability of large, multidimensional data sets. However, loading multiterabyte data into a DBMS is a time-consuming and error-prone task that the authors have tried to automate by developing the sqlLoader pipeline, a distributed workflow system for data loading.

CasJobs and MyDB: A Batch Query Workbench
Nolan Li and Ani Thakar

Get text as PDF document (opens in new window)
Catalog Archive Server Jobs (CasJobs) is an asynchronous query workbench service that lets users run unrestricted SQL queries against scientific catalog archives. After running queries in batch mode, users can save their results to a personal database called MyDB before downloading them, letting users manage their query workloads, results, and histories without causing network overloads.
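
The batch workflow described in this abstract (submit a query, let it run asynchronously, keep the result in a personal database) can be illustrated with a small sketch. This is not the CasJobs API or implementation: the `BatchWorkbench` class, its methods, the user name, and the three-row toy `PhotoObj` table are all hypothetical, and SQLite stands in for the real DBMS purely so the example is self-contained.

```python
import sqlite3
from collections import deque


class BatchWorkbench:
    """Toy asynchronous query workbench (hypothetical; not the CasJobs API)."""

    def __init__(self, catalog):
        self.catalog = catalog   # shared catalog database
        self.mydbs = {}          # user name -> that user's personal database
        self.queue = deque()     # pending jobs, first in, first out
        self.history = []        # (job_id, user, status) for every job run
        self._next_id = 1

    def submit(self, user, sql, into):
        """Queue a query and return its job id without running it."""
        job_id = self._next_id
        self._next_id += 1
        self.queue.append((job_id, user, sql, into))
        return job_id

    def run_pending(self):
        """Run queued jobs; store each result as a table in the user's MyDB."""
        while self.queue:
            job_id, user, sql, into = self.queue.popleft()
            if user not in self.mydbs:
                self.mydbs[user] = sqlite3.connect(":memory:")
            mydb = self.mydbs[user]
            try:
                cur = self.catalog.execute(sql)
                cols = ", ".join('"%s"' % d[0] for d in cur.description)
                marks = ", ".join("?" for _ in cur.description)
                rows = cur.fetchall()
                mydb.execute('CREATE TABLE "%s" (%s)' % (into, cols))
                mydb.executemany('INSERT INTO "%s" VALUES (%s)' % (into, marks), rows)
                status = "finished"
            except sqlite3.Error:
                status = "failed"
            self.history.append((job_id, user, status))


# Toy catalog: made-up rows, not real SDSS data.
catalog = sqlite3.connect(":memory:")
catalog.execute("CREATE TABLE PhotoObj (objID INTEGER, ra REAL, dec REAL)")
catalog.executemany(
    "INSERT INTO PhotoObj VALUES (?, ?, ?)",
    [(1, 10.0, -1.0), (2, 185.0, 2.5), (3, 186.0, 3.0)],
)

wb = BatchWorkbench(catalog)
job = wb.submit("alice", "SELECT objID, ra FROM PhotoObj WHERE ra > 180", into="selection")
wb.run_pending()

# The result now lives in the user's personal database and can be queried again.
rows = wb.mydbs["alice"].execute("SELECT objID, ra FROM selection ORDER BY objID").fetchall()
```

In this sketch the result set never travels back to the client at submission time; it stays in the per-user store until the user asks for it, which is the property the abstract credits with avoiding network overloads.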

Lessons Learned from the SDSS Catalog Archive
Ani Thakar

Get text as PDF document (opens in new window)
The SDSS is one of the first very large archives in astronomy and other sciences, as we enter the era of data-intensive science. Here the authors summarize some of the important and generally applicable insights they have gained (often the hard way!) over the past decade of SDSS development.