Bart Scheers (University of Amsterdam)
The next generation of astronomical observatories is designed to carry out unique science: high-speed all-sky surveys, searches for rapid transient and variable sources, and the cataloguing of millions of sources and their millions of measurements. As a consequence, these facilities will produce tens of terabytes per day. High-cadence data rates of tens of gigabits per second are not exceptional either. No database system yet exists that can keep pace with and store these huge volumes of scientific data, or that can query the data scientifically with acceptable response times.
One of the first of these telescopes to become operational is the International LOFAR Telescope, currently in its commissioning phase. The Transients Key Science Project aims to study all transient and variable sources detected by LOFAR. One of its products is an up-to-date catalogue of all sources detected by LOFAR, i.e. a spectral light-curve database with real-time capabilities, which is expected to grow gradually by 50-100 TB/yr, making it the largest astronomical catalogue. The response time to transient and variable events depends strongly on the query execution plan of the algorithms that search the (LOFAR and non-LOFAR) source catalogues for previous detections in the spatial, spectral and temporal domains.
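As an illustration of the spatial part of such a search, the following minimal sketch looks up previous detections within a given angular radius of a new position. It is a brute-force Python stand-in, not the MonetDB query used for LOFAR; the function names, the `(source_id, ra, dec)` tuple layout and the search radius are assumptions made for this example.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula).

    All arguments are in decimal degrees.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    # Clamp to guard against rounding slightly above 1.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def find_counterparts(catalogue, ra, dec, radius_deg):
    """Return (separation, source_id) pairs within radius_deg, nearest first.

    `catalogue` is an iterable of hypothetical (source_id, ra, dec) tuples.
    This scans every row; a production database would restrict the
    candidates with a spatial index or declination zones first.
    """
    matches = []
    for source_id, cat_ra, cat_dec in catalogue:
        sep = angular_separation_deg(ra, dec, cat_ra, cat_dec)
        if sep <= radius_deg:
            matches.append((sep, source_id))
    return sorted(matches)
```

The full scan is linear in the catalogue size, which is exactly why the query execution plan matters at the data volumes described above.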
The open-source column-store database system MonetDB serves as the basic platform to address the data-intensive research challenges for LOFAR. In this talk I will describe the experimental infrastructure of the SciLens platform (scilens.org), a 330-node, 4-tier locally distributed infrastructure focussed on massive I/O rather than raw computing power. The SciLens infrastructure is envisaged to be the prime choice for a scalable LOFAR light-curve database.
Furthermore, I will discuss the crucial queries and aggregate functions that give the best summarised statistical representation of the source properties during LOFAR observations, when millions of measurements are taken. These results serve as quick reference points for keeping the all-sky model available in real time.
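One way to keep such summary statistics current is to store running sums per source and update them with each new measurement, so that no light curve has to be rescanned. The sketch below assumes this approach; the class name, the choice of statistics (mean, error-weighted mean, relative scatter and reduced chi-squared against a constant flux) and the inverse-variance weighting are illustrative assumptions, not necessarily the aggregates presented in the talk.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningLightCurveStats:
    """Running sums for one source; each update is O(1)."""
    n: int = 0
    sum_f: float = 0.0     # sum of fluxes
    sum_f2: float = 0.0    # sum of squared fluxes
    sum_w: float = 0.0     # sum of weights, w = 1 / error^2
    sum_wf: float = 0.0    # sum of w * flux
    sum_wf2: float = 0.0   # sum of w * flux^2

    def add(self, flux, flux_err):
        """Fold one flux measurement and its 1-sigma error into the sums."""
        w = 1.0 / (flux_err * flux_err)
        self.n += 1
        self.sum_f += flux
        self.sum_f2 += flux * flux
        self.sum_w += w
        self.sum_wf += w * flux
        self.sum_wf2 += w * flux * flux

    def mean(self):
        return self.sum_f / self.n

    def weighted_mean(self):
        return self.sum_wf / self.sum_w

    def relative_scatter(self):
        """Sample standard deviation divided by the mean flux."""
        if self.n < 2:
            return 0.0
        mean = self.mean()
        var = (self.sum_f2 - self.n * mean * mean) / (self.n - 1)
        return math.sqrt(max(var, 0.0)) / mean

    def reduced_chi2(self):
        """Reduced chi-squared of the fluxes against their weighted mean."""
        if self.n < 2:
            return 0.0
        chi2 = self.sum_wf2 - self.sum_wf ** 2 / self.sum_w
        return chi2 / (self.n - 1)
```

Because only six numbers are stored per source, the same sums map directly onto SQL aggregates such as COUNT and SUM. For long light curves with small scatter, the sum-of-squares form can lose numerical precision, so a production version might prefer a numerically stabler update.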
I will highlight the new array-based query language SciQL, an extension of SQL:2003, which is currently under active development at CWI and will be fully integrated in MonetDB. SciQL eases the scientifically highly relevant cross-correlation of multiple (i.e. more than two) catalogues, as well as the light-curve analysis of the sources, such as the search for a periodic component in the variability, which requires computing the Fourier transform of each stored light curve.
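To make the periodicity search concrete, the sketch below computes a discrete Fourier transform of a single light curve and reports the period of the strongest component. It is plain Python rather than SciQL, and it assumes evenly sampled fluxes with a fixed time step `dt`; real light curves are often irregularly sampled and would need a different estimator. The function names are hypothetical.

```python
import cmath
import math


def power_spectrum(fluxes):
    """Power at DFT frequencies k = 1 .. n // 2 of a mean-subtracted series.

    Naive O(n^2) transform, adequate for a short illustrative light curve.
    """
    n = len(fluxes)
    mean = sum(fluxes) / n
    centred = [f - mean for f in fluxes]  # remove the constant (k = 0) term
    powers = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centred))
        powers.append(abs(coeff) ** 2)
    return powers


def dominant_period(fluxes, dt):
    """Period (in the units of dt) of the strongest Fourier component."""
    powers = power_spectrum(fluxes)
    # powers[0] belongs to k = 1, hence the + 1.
    k_best = max(range(len(powers)), key=powers.__getitem__) + 1
    return len(fluxes) * dt / k_best


# Usage: a constant source with a sinusoidal modulation of period 8 samples.
light_curve = [10.0 + math.sin(2.0 * math.pi * t / 8.0) for t in range(64)]
period = dominant_period(light_curve, dt=1.0)
```

Running this per source is a row-by-row operation over array-valued data, which is the kind of workload an array-based query language is meant to express inside the database.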
Paper ID: P136