VLT Pipeline operation and Quality Control: FORS1 and ISAAC

Reinhard Hanuschik and Paola Amico
Quality Control, European Southern Observatory
(The Messenger No. 99, March 2000, p.6)
 

1. Introduction

It is well known in the community that April 1, 1999, marked the beginning of operations for ANTU, the first VLT telescope. It is less well known, however, that this date also marked the beginning of real life for VLT data flow operations (DFO) at ESO headquarters in Garching. Forming the back end of the Data Flow life cycle, DFO has to act as a data production, data distribution and data storage machine. All these functions form what is called Quality Control (QC). To achieve data products of the highest possible quality, all components have to perform well and collaborate closely.

At the time of writing, two VLT instruments are operational: FORS1 and ISAAC, with the next two (FORS2 and UVES) waiting at the front door. In the following we briefly describe the different QC tasks for the first two VLT instruments.

2. General functions

The most important tasks for QC Garching are:

- pipeline processing of calibration data into master calibration products,
- pipeline processing of Service Mode science data,
- quality assessment of the raw data and of the pipeline products,
- trending of instrument performance and feedback to the Observatory,
- assembly and distribution of the Service Mode data packages.

The present ESO strategy for processing and distributing data is as follows: calibration data are processed irrespective of the observing mode (both Visitor and Service Mode); science data are processed for Service Mode (SM) observing only. SM programmes (of supported instrument modes) receive a full set of raw, reduced and calibration data. Processed calibration data will become generally available as soon as the Archive storage project has been realized.

Hence, from the QC point of view, a Visitor Mode (VM) night requires only the processing of calibration data, while an SM night needs the full machinery producing master calibration and reduced science files. For a typical 50:50 mix of SM/VM nights and an average QC fish1 on duty 4 days per week, there is presently about one QC working day available per ANTU operational night. Anything more than that produces a backlog.

1 Since this process bears some resemblance to trout kept in the basins of purification plants to indicate water quality, we have dubbed ourselves the 'QC fishes'.
 

3. FORS1

Data. Being a complex instrument with many different modes, FORS1 has produced data in huge amount and variety since the very beginning of operations. Period 63 produced about 24,000 raw FORS1 files (about 200 GB), about half of them in Service Mode. 68.0% of all raw files were calibration data, 17.9% science data, and the rest (12.7%) TEST data (acquisition, slit view, etc.). The vast majority of all FORS1 files (70.9%) was obtained in imaging mode (IMG), the second largest fraction (18.4%) in multi-object spectroscopy mode (MOS), 5.2% in long-slit spectroscopy (LSS), and 3.9% in polarization imaging (IPOL) or polarization MOS (PMOS). Typically 100-200 files are produced per SM night, corresponding to 1-1.5 GB of raw data and resulting in another 1-1.5 GB of reduced data.

Pipeline operations. Due to the complexity of the task, we decided to start pipeline operations with the simplest modes, IMG and LSS. Together these already cover 76% of all FORS1 data. Master calibration files routinely created are:

- master BIAS frames, per CCD readout mode,
- master SKY_FLAT frames from twilight exposures, per filter,
- master SCREEN_FLAT frames, per filter,
- master SCREEN_FLAT_LSS frames and dispersion solutions for the LSS mode.

Hence for a typical IMG mode night about 30-40 master calibration files have to be created by the pipeline.

Master files are median averages of input sets of typically 3-5 raw files. The master creation recipe uses a kappa-sigma clipping routine to reduce random noise and to suppress cosmics and stellar sources.
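To illustrate the idea, here is a minimal Python sketch of a kappa-sigma clipped median stack. It is not the actual MIDAS recipe: the MAD-based scatter estimate and the synthetic frames are choices made for this example.

    import numpy as np

    def make_master(frames, kappa=3.0, iterations=3):
        """Median-combine raw frames, iteratively masking pixels that
        deviate by more than kappa robust sigmas from the stack median."""
        stack = np.array(frames, dtype=float)        # shape (n_frames, ny, nx)
        mask = np.zeros(stack.shape, dtype=bool)
        for _ in range(iterations):
            data = np.ma.masked_array(stack, mask)
            med = np.ma.median(data, axis=0).filled(0.0)
            # robust per-pixel scatter from the median absolute deviation
            sigma = 1.4826 * np.ma.median(np.abs(data - med), axis=0).filled(0.0)
            mask |= np.abs(stack - med) > kappa * np.maximum(sigma, 1e-6)
        return np.ma.median(np.ma.masked_array(stack, mask), axis=0).filled(np.nan)

    # five synthetic bias frames, one carrying a simulated cosmic-ray hit
    rng = np.random.default_rng(1)
    biases = [200.0 + rng.normal(0.0, 5.0, (64, 64)) for _ in range(5)]
    biases[2][10, 10] += 5000.0                      # the hit to be clipped
    master_bias = make_master(biases)
    print(round(float(master_bias[10, 10])))         # ~200: outlier rejected

The robust sigma keeps a single strong outlier (a cosmic or a star on one frame) from inflating the clipping threshold, which a plain standard deviation over 3-5 frames would do.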

Science data are reduced using these calibration products. SCIENCE_IMG files are de-biased and flattened. The pipeline uses twilight SKY_FLATs taken at dusk or dawn. These flats remove all small-scale CCD structure ('fixed-pattern noise'), the four-port pattern, and all large scales except for the largest ones, of order 1000 pixels. This residual is due to illumination gradients differing between night and twilight sky, and amounts to 1-2%. A NIGHT_FLAT would remove even this gradient perfectly, but is not routinely available. If possible, the pipeline extracts such flats from jittered science images. Success depends on the offset chosen for jittering, the nature of the sources and their density. If available, these NIGHT_FLATs are delivered as part of the SM data package, but they are not used for pipeline science reductions. Master SCREEN_FLATs are available, but their illumination pattern is very different from sky conditions. They are primarily used for monitoring the CCD performance.
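The basic de-bias and flatten step can be sketched as follows (a toy Python illustration with synthetic numbers; the actual pipeline recipe is more involved):

    import numpy as np

    def reduce_science_img(raw, master_bias, master_sky_flat):
        # normalize the bias-subtracted flat to unit median, so that
        # the science frame keeps its original count scale
        flat = master_sky_flat - master_bias
        flat = flat / np.median(flat)
        return (raw - master_bias) / flat

    # toy data: a detector with 2% pixel-to-pixel sensitivity structure
    rng = np.random.default_rng(0)
    response = 1.0 + 0.02 * rng.normal(size=(64, 64))
    bias = np.full((64, 64), 200.0)
    sky_flat = bias + 20000.0 * response
    raw = bias + 1000.0 * response       # uniform sky seen through the same pixels
    science = reduce_science_img(raw, bias, sky_flat)
    print(round(float(science.std() / science.mean()), 4))   # ~0: pattern removed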

Photometric standard stars are reduced in the same way as science data, with the added steps of source identification and extraction. At present they yield photometric solutions (zeropoints) for the night. This information is currently used to assess the quality of the night and to trace telescope efficiency. SM data packages receive the zeropoint tables and the reduced standard-star files.
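In terms of a formula, a nightly zeropoint follows from a standard-star measurement roughly as sketched below. This is a hedged illustration: the magnitude convention and the extinction coefficient are assumptions made here, not the FORS1 pipeline's actual calibration model.

    import math

    def zeropoint(std_mag, counts, exptime, airmass, k_ext):
        """Zeropoint from one standard star, assuming the convention
        m = ZP - 2.5 log10(counts/exptime) - k_ext * airmass."""
        m_inst = -2.5 * math.log10(counts / exptime)
        return std_mag - m_inst + k_ext * airmass

    # a V = 12.0 standard giving 1.6e7 counts in 10 s at airmass 1.2,
    # with an assumed extinction coefficient of 0.12 mag per airmass
    print(round(zeropoint(12.0, 1.6e7, 10.0, 1.2, 0.12), 2))   # 27.65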

LSS data are de-biased and flattened using master SCREEN_FLAT_LSS files, which contain high spatial frequencies only. The data are rebinned to wavelength space but not extracted. Hence fixed-pattern noise, slit noise and the slit function are removed, as is slit curvature.

Planned next steps are the photometric calibration of the IMG files and the removal of the instrumental efficiency curve from LSS data. Finally, the MOS pipeline will become operational in Period 65.

A total of about 3000 master calibration files, and about the same number of reduced science files, was created in Period 63; 93% of these files are IMG files.

Distribution. The proper distribution of files in the Service Mode packages is not trivial where calibration files are concerned. As the simplest example, consider science files taken in IMG mode and assume that the OB was executed with just one filter and in one CCD mode. This would make a minimum set of one master BIAS and one master SKY_FLAT taken with these parameters (plus, of course, the corresponding raw and reduced science files). Since it is not known, however, whether the programme requires photometric data, we always add all available STD_IMG files (raw, reduced, photometry table). To further enable the PI to reprocess all reduction steps, all applicable SKY_FLATs and BIASes are required as well. Hence we actually blow up the amount of data delivered by a factor which can reach 5 or more, but only this approach guarantees completeness. To slim down the CD-ROM package a little, we usually do not include those raw calibration files which successfully produced master files. These raw calibration files can be retrieved by the user from the ESO Archive (see the article by Leibundgut et al. in this issue).

For the LSS mode, things become more complicated since spectrophotometric standard stars are taken in MOS mode. Hence LSS programmes receive full sets of LSS and MOS calibration files, and the blow-up factor is even larger than for IMG files.

Quality Control. Post-pipeline operations involve quality checks of the raw and the produced data. A simple but time-consuming fundamental check is scanning the nightlogs. This is presently done in the old-fashioned way, i.e. by reading and, if needed, editing by hand. In the near future, tools will make nightlog information accessible for automatic processing, distribution and storage.

Several checks are done on raw and produced calibration data. From the BIAS frames, the median bias level (both across the whole CCD and per port), the value of the large-scale structure, and the read noise are determined for raw and master files (Figure 1). On the SCREEN_FLATs and the SKY_FLATs, the mean values (across the whole CCD and per port), the random photon noise, the fixed-pattern noise, and the large-scale structure are measured (Figure 2). SCREEN_FLATs are also used to measure actual gain values. It is also checked how random the 'random' noise actually is.
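Two of these checks can be sketched compactly: the read noise follows from the difference of two bias frames, and the gain from a flat pair via the photon-transfer relation. This is a minimal Python illustration with simulated frames; the actual procedures operate per readout port.

    import numpy as np

    def read_noise_adu(bias1, bias2):
        # differencing two biases removes fixed structure; the std of
        # the difference is sqrt(2) times the single-frame read noise
        return np.std(bias1 - bias2) / np.sqrt(2.0)

    def gain_e_per_adu(flat1, flat2, bias1, bias2):
        # photon transfer: gain = 2 * signal / shot-noise variance of the pair
        signal = 0.5 * (flat1.mean() + flat2.mean() - bias1.mean() - bias2.mean())
        var_shot = np.var(flat1 - flat2) - np.var(bias1 - bias2)
        return 2.0 * signal / var_shot

    # simulated truth: gain 2 e-/ADU, read noise 5 ADU, flat level 20000 ADU
    rng = np.random.default_rng(2)

    def sim_bias():
        return 200.0 + rng.normal(0.0, 5.0, (256, 256))

    def sim_flat():
        shot = rng.poisson(2.0 * 20000.0, (256, 256)) / 2.0   # Poisson in e-, back to ADU
        return 200.0 + shot + rng.normal(0.0, 5.0, (256, 256))

    b1, b2, f1, f2 = sim_bias(), sim_bias(), sim_flat(), sim_flat()
    print(round(float(read_noise_adu(b1, b2)), 2))            # ~5.0 ADU
    print(round(float(gain_e_per_adu(f1, f2, b1, b2)), 2))    # ~2.0 e-/ADU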

Photometric zeropoints are determined. The quality of the LSS dispersion solutions and the effective resolution are measured.

All these parameters are stored in tables and their trends are monitored. There are also checks of whether random noise and fixed-pattern noise scale with signal as expected.
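One such scaling check, sketched with toy numbers (assuming a gain of 2 e-/ADU for the illustration): photon noise should grow as the square root of the signal, i.e. with a log-log slope of 0.5, while fixed-pattern noise grows linearly with it (slope 1).

    import numpy as np

    signal = np.array([1000.0, 4000.0, 16000.0, 64000.0])   # mean flat levels (ADU)
    photon_noise = np.sqrt(signal / 2.0)                    # toy values, gain 2 e-/ADU
    slope = np.polyfit(np.log10(signal), np.log10(photon_noise), 1)[0]
    print(round(float(slope), 2))   # 0.5: consistent with pure photon noise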

In reduced science IMG files, the quality of the flattening process is controlled.
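One way to quantify this (a sketch under simplifying assumptions, not the actual procedure): smooth the flattened image heavily, then measure the residual peak-to-peak large-scale structure relative to the sky level.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def large_scale_residual(img, box=101):
        """Peak-to-peak large-scale residual of a flattened image, as a
        fraction of the sky level; smoothing suppresses noise and sources."""
        smooth = uniform_filter(img, size=box)
        core = smooth[box:-box, box:-box]   # drop borders with incomplete boxes
        return (core.max() - core.min()) / np.median(core)

    # toy image: flat sky plus a 1% illumination gradient and noise
    rng = np.random.default_rng(3)
    yy, xx = np.mgrid[0:512, 0:512]
    img = 1000.0 * (1.0 + 0.01 * xx / 512) + rng.normal(0.0, 30.0, (512, 512))
    print(round(float(large_scale_residual(img)) * 100, 2), '%')   # ~0.6% in the core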

Feedback. Since part of the results of the QC process is a direct health check for the CCD and the instrument, a natural task for QC Garching is providing feedback to the CCD group and to Paranal Science Operations. This is mostly channelled through the Instrument Operations Teams, which combine expertise about the instrument.

Generally it is important to store and provide QC results in a centralized way, open to anyone interested. Options are putting results onto the web, storing quality control parameters in a QC database, and ingesting QC information into the Archive. As a first step, check the QC page at http://www.eso.org/observing/dfo/quality/index.html

Software for operations. For FORS1 and ISAAC, software for the lower-level functions existed when operations started, provided by ESO or by the instrument consortia: the data reduction pipeline and tools for organizing the data and processing them.

All higher-level tasks, such as distributing raw and product files to the final SM data packages, pre-selecting data for processing, assessing data quality, storing QC parameters, etc., started during Period 63 without software support. Tools had to be developed during operations. Such 'hot development' offers the advantage of being extremely efficient, since any new script could be tested and improved under real-life conditions. Evolutionary cycles were short. The price to pay, however, was a very tough schedule, since certain elementary tasks had to be carried out whether tools existed or not.

It soon became clear that there is only one option for keeping one's head above water: create (UNIX shell) scripts and (MIDAS) procedures for automatic processing. The survival strategy is: clearly identify the jobs which can be done routinely and those which cannot. Then leave the routine work to the machine, preferably for overnight processing, and do the non-routine work during the daytime. The latter primarily involves decision making, i.e. quality assessment of master calibration and reduced files, commissioning of pipeline recipes, and keeping control of the whole process.

The backbone of the FORS1 QC job is formed by about 30 shell scripts which translate the basic steps of data flow operations into well-defined functionalities. This package is called 'SMORS' (Service Mode Optimized Reduction Scheme). It produces results which are repeatable and predictable, and its operation is safe. With this package, we do the full data processing from the very beginning (provide listings for newly arrived raw data) up to the end of the life cycle (delete all data for an SM programme once the CD-ROMs have been distributed).

While SMORS forms the backbone, a second package is the 'brain' of FORS1 quality control: 'qc_dec' (QC decision), a set of MIDAS procedures developed for post-pipeline assessment of data quality, measurement of QC parameters and trending. These tools enable decision making, e.g. accepting or rejecting a master calibration file, measuring the fixed-pattern noise in a master_screen_flat, checking the removal of stellar sources in a master_night_flat, checking the degree to which a SCIENCE_IMG file has been flattened, or creating nightly averages of photometric zeropoints.

As a by-product of the tools developed for pipeline operations, a script package 'Pipe' has recently been installed on Paranal to facilitate the operation of the FORS1 quick-look pipeline. This tool can be used by staff astronomers to create their own master calibration files and obtain photometric zeropoints during the night, so that real-time assessment of the quality of the night becomes possible.
 

4. ISAAC

Quality control operations for ISAAC resemble, in a broad sense, those described in the previous section for FORS1. The differences between the two instruments, especially in terms of operations, lead to a different approach from the QC point of view.

P63 statistics. During Period 63 there were 92 nights with ISAAC data (science and calibration frames), 59 of them service mode nights (with science data), which produced a total of 24087 files (including commissioning and science verification data): 13349 science frames, 9985 calibration frames and 753 test frames. The total number of SM programs was 35, 4 of which required quick releases (that is, release of the data soon, typically 1 day, after the observations). At the end of the Period, 29 programs had been shipped to the users, while the remaining 6 were put on hold by the User Support Group (USG) and Paranal Science Operations (PSO) to allow for follow-up observations during Period 64. A total of 69 CDs were prepared and cut; the biggest program received 12 CDs, the smallest only 1. The longest program spanned 5 months, and the densest was observed on 12 different nights.

CD packing. Each PI of an SM program receives a set of CDs containing data subdivided by night of observation. Each "night" contains the following data:

- the raw science frames and the corresponding raw calibration frames,
- the master calibration frames and the coadded (reduced) science frames produced by the pipeline,
- the file listings and logs described below.

During Period 63 the pipeline produced a total of 268 master calibration frames (SWI1 mode only, master flats and darks) and 55 coadded images (in jitter mode). The average number of frames used to produce a single master frame was 18 for calibration and 17 for science frames.

In addition to the data, for each night of observation and each set of data (raw science and calibration, reduced science and calibration), a table with the list of files and their most relevant keywords (e.g. RA, DEC, DIT, filters, central wavelength, etc.) is included. Each CD contains the nightlogs for the relevant night of observation, with all pertinent log entries written during the observations by the operations staff astronomers on Paranal, and a data reduction log, which contains information on the data package, the reduced frames, the OB list, etc.
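Such a listing can be compiled directly from the FITS headers. Here is a minimal sketch using astropy (a modern convenience rather than the original shell/MIDAS tooling; the keyword names follow the ESO hierarchical convention and the file pattern is a placeholder):

    from glob import glob
    from astropy.io import fits

    KEYWORDS = ['RA', 'DEC', 'HIERARCH ESO DET DIT', 'HIERARCH ESO INS FILT1 NAME']

    for path in sorted(glob('raw/ISAAC*.fits')):      # placeholder file pattern
        header = fits.getheader(path)
        values = [str(header.get(key, 'N/A')) for key in KEYWORDS]
        print(path, '  '.join(values))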

The packing process for a specific program starts upon receipt of a "completion" signal from USG and finishes when the package is sent to the PI. In the worst case in Period 63 the delivery time was 40 days after the signal, while in the best case it was less than 1 day. It should be noted that the delivery time decreased steadily during the Period; now, in Period 64, the average is 1 day after receiving the completion signal.

When operations for ISAAC started in April 1999, it was decided to concentrate on the SWI1 mode only and to progressively increase the number of products delivered to the users. This choice was driven mainly by the specific need for further testing of the recipe set for the short-wavelength spectroscopic (SWS) modes, and by the more general fact that running operations for the first time ever was a task full of unknowns. Now that the whole data-flow process is better understood, we are progressively adding tasks to quality control and services for the user community.

ISAAC Pipeline Recipes. Each VLT instrument has its own set of pipeline recipes, which support all or part of the instrument modes. In the case of ISAAC, the pipeline presently supports the short wavelength modes, imaging and spectroscopy, and will probably be extended in the future to include the long wavelength modes. In Period 63, the ISAAC set of pipeline recipes was split into SWI recipes, all developed as part of the Eclipse software (N. Devillard, "The eclipse software", The Messenger No. 87, March 1997, and http://www.eso.org/projects/aot/eclipse/ ), and SWS recipes, developed in MIDAS by Y. Jung. During Period 64, and starting with operations in Period 65, all SWS recipes have also been included in the Eclipse software, thanks to the work of Y. Jung and T. Rogon. Table 1 lists the entire set of templates supported by the pipeline.

Table 1: List of templates supported by the ISAAC pipeline recipe set. The recipes' products are made available to the user community during the indicated observing period. The description explains briefly what each recipe does. For more detailed explanations, please refer to the ISAAC web page and references therein (http://www.eso.org/instruments/isaac/).
 

Supported imaging templates | Period | Description
Jitter | 63 | Reduces images taken in jitter and jitter + offset modes. The jitter data reduction process is divided into flat-fielding / dark subtraction / bad-pixel correction, sky estimation and subtraction, frame offset detection, frame re-centering, frame stacking into a single frame, and optional post-processing tasks. (A simplified sketch of these steps follows the table.)
Jitter+Offset | 64 | Reduced by the same recipe as Jitter (see above).
Darks (including spectroscopy) | 63 | Creation of master dark frames. The process sorts out frames with an identical DIT and produces the averaged frames. It also computes the read-out noise.
Zero-points | 64 | Calculation of zero points. It computes the number of counts and relates that measurement to a standard star database. The infrared standard star database at present contains about 800 star positions with magnitudes in the bands J, H, K, Ks, L and M.
Twilight flats | 63 | Creation of master flat frames. It takes as input a list of files taken at twilight and derives the flat-field of the detector from this rapidly increasing or decreasing signal. Since it computes a characteristic curve per pixel, it also creates a bad pixel map.
Illumination frame | 64 | Creation of master illumination frames. It subtracts the dark, divides by the flat-field and corrects bad pixels if the adequate calibration files are available. The final product is a 2-d polynomial surface normalized to a value of 1.

Supported spectroscopy templates | Period | Description
NodOnSlit | 64 (partly) | Reduces images taken in jitter spectroscopic mode. The process is divided into classification of the input files, correction of the distortion, shifting and averaging of the frames, wavelength calibration, creation of a combined image, and detection and extraction of a spectrum. The wavelength-calibrated spectrum is provided for all calibration standard stars observed in this mode; it is not provided for science frames.
Spectroscopic flatfield | 64 | Creation of master spectroscopic flat frames. The algorithm is applied to each pair of frames (lamp on and off). The difference 'on'-'off' is computed and the resulting frame is divided by its mean.
Arcs | 65 | Detects vertical or horizontal arcs in a spectral image, models the corresponding deformation (in the x or y direction only) and corrects the detected deformation along that direction. Finally it computes a wavelength calibration using a lamp spectrum catalogue.
Star trace | 65 | Performs the startrace analysis. It takes an image as input and produces two output tables: a line position table (containing the fitted coordinates of the curved lines) and a polynomial coefficient table (describing the detected deformation).
Slit position | 65 | Finds the exact position of a slit.
Response function | 65 | Determination of the spectroscopic response function, by means of the extraction and wavelength calibration of a standard star spectrum.
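As referenced in the Jitter row above, the core of that recipe can be sketched in a few lines. This is a strongly simplified Python illustration: dark, flat and bad-pixel corrections are omitted, the offsets are taken as known integers (as if read from the headers), and np.roll stands in for proper offset refinement and sub-pixel re-centering.

    import numpy as np

    def reduce_jitter(frames, offsets):
        """frames: list of 2-d arrays; offsets: integer (dy, dx) of each
        frame relative to the first."""
        stack = np.array(frames, dtype=float)
        sky = np.median(stack, axis=0)      # sources move, sky does not:
        stack -= sky                        # the stack median estimates the sky
        shifted = [np.roll(np.roll(f, -dy, axis=0), -dx, axis=1)
                   for f, (dy, dx) in zip(stack, offsets)]
        return np.mean(shifted, axis=0)     # re-centred frames averaged

    # three synthetic frames of a faint star, observed at jittered positions
    rng = np.random.default_rng(4)
    offsets = [(0, 0), (5, -3), (-4, 7)]
    frames = []
    for dy, dx in offsets:
        img = 100.0 + rng.normal(0.0, 2.0, (64, 64))
        img[32 + dy, 32 + dx] += 50.0       # the star, displaced by the jitter
        frames.append(img)
    combined = reduce_jitter(frames, offsets)
    print(round(float(combined[32, 32]), 1))   # ~50: star recovered at the centre

The sky estimate works because the jitter moves the sources between frames while the sky stays put, so the per-pixel median over the stack is source-free.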
 

Among the responsibilities of QC, testing the pipeline recipes is one of the most important, since QC relies upon this set of recipes for the greater part of its work. The quality control scientist produces master calibration frames and certifies their quality before shipment to the users. In the near future, all frames will be inserted in the calibration database and made available to the user community. As of Period 64/65, the following by-products of the pipeline are calculated and their trends monitored: the read noise of the detector and the zero points for each night. Zero-point values are also made available to the users with SM programs; soon they will be published on the web for the entire user community. In addition, the goodness of the sky subtraction and coaddition, as well as the image quality, are monitored in all coadded images produced for SM programs. Further checks will be introduced for all products produced by the pipeline (e.g. spectroscopic jitter mode images).

For additional information on the supported templates and operating modes see the ISAAC home page online and the ISAAC manual (http://www.eso.org/instruments/isaac/ and references therein) and the PSO web pages for ISAAC (http://www.eso.org/paranal/sciops/ISAAC_Info.html).

Software for Operations. When operations started in April 1999, we clearly understood the general picture of the data flow, but we lacked first-hand experience of the actual amount and type of work, of the most efficient way to do it and, of course, of all those unknowns which are to be expected every time a new enterprise is started.

To perform the first four tasks listed in Section 2, QC could from the beginning make use of the data-flow system, which includes, among other components, the Data Organizer (DO), a software tool which classifies the raw data and creates reduction blocks; these are in turn used by the Reduction Block Scheduler (RBS) to fire the proper recipe and run it on the list of previously classified raw frames. Both DO and RBS work in a completely automated way. In the particular case of ISAAC, the different science operations needs, which change according to the particular observing program being executed, and the wish to keep operations as flexible and efficient as possible, require a greater level of "human" interaction than automated software allows. In the majority of cases it is necessary to classify the files, to select the frame set used as input to a data reduction recipe, and to tailor the configuration parameters of the recipe itself manually. Given the great amount of data that reaches Quality Control and has to be processed and distributed, a new software tool had to be foreseen. The main requirements for it were: flexibility and interactivity of operations; compatibility with the data-flow model and with the needs of QC work; speed (in a typical ISAAC night a minimum of 300 files can be produced, and in a typical QC working session many nights of data must be loaded, classified and reduced at the same time); and configurability. The Software Engineering Group (SEG), namely N. Kornweibel and M. Zamparelli, developed a tool named Gasgano (see Figure 3), which provides these and many other functionalities and has become the tool routinely used by QC for ISAAC. The tool has recently been officially released to PSO, but had been tested and used by QC since its very first "unofficial" release.
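To make the DO's role concrete, here is a toy sketch of the classification step; the frame metadata, grouping keys and recipe names are invented for illustration.

    from collections import defaultdict

    frames = [
        {'name': 'ISAAC.0001', 'type': 'DARK', 'dit': 30.0, 'filter': None},
        {'name': 'ISAAC.0002', 'type': 'DARK', 'dit': 30.0, 'filter': None},
        {'name': 'ISAAC.0003', 'type': 'FLAT', 'dit': 1.8, 'filter': 'Ks'},
        {'name': 'ISAAC.0004', 'type': 'FLAT', 'dit': 1.8, 'filter': 'Ks'},
    ]

    # group frames sharing type and setup into one reduction block each
    blocks = defaultdict(list)
    for f in frames:
        blocks[(f['type'], f['dit'], f['filter'])].append(f['name'])

    RECIPES = {'DARK': 'dark', 'FLAT': 'twflat'}   # recipe fired per frame type
    for (ftype, dit, filt), names in blocks.items():
        print(f"run '{RECIPES[ftype]}' on {names} (DIT={dit}, filter={filt})")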

Since the creation of a CD package for an SM program is not yet feasible by means of Gasgano, a set of UNIX shell scripts, similar to those created for FORS1, was developed for ISAAC. These scripts make it possible to quickly and (semi-)automatically assemble all science and calibration raw data, their corresponding reduced frames and the reduction logs; in addition, the scripts produce a reduction and packing log, file listings, useful statistics (number of frames divided by type - science or calibration -, Program ID and OB ID, and a list of rejected frames, typically files with incorrect or missing keywords) and check logs. The latter are compared with information retrieved by the scripts themselves from various database tables of the ESO Archive (OB repository, Observations, etc.). The difficulty of the packing process lies almost entirely in the non-uniform distribution in time of the calibration files and in the vastness of the science data parameter space: the scripts must be able to "intelligently" choose a proper set of calibration frames, observed as a rule under a different program ID and in general on different days than the science frames. The science data may also vary, between SM programs and within a single program, across all possible modes allowed by the instrument.
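The selection rule at the heart of this can be sketched simply (an illustration only, with invented data structures and file names): for each science setup, pick the applicable master calibration closest in time.

    from datetime import date

    masters = [
        {'file': 'master_flat_Ks_a.fits', 'filter': 'Ks', 'night': date(1999, 7, 2)},
        {'file': 'master_flat_Ks_b.fits', 'filter': 'Ks', 'night': date(1999, 7, 20)},
        {'file': 'master_flat_J.fits',    'filter': 'J',  'night': date(1999, 7, 5)},
    ]

    def match_calibration(science_filter, science_night):
        """Return the master frame with the same setup, nearest in time."""
        candidates = [m for m in masters if m['filter'] == science_filter]
        if not candidates:
            return None   # no applicable master: flag for manual follow-up
        return min(candidates, key=lambda m: abs((m['night'] - science_night).days))

    print(match_calibration('Ks', date(1999, 7, 18))['file'])   # master_flat_Ks_b.fits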

We have already started to develop an extra set of scripts/tools to check the quality of processed data (up to now performed manually on each frame), to provide instrument performance checks and to perform trend analyses of quality parameters. They are foreseen to fully support QC operations within Period 65.

5. Instrument Operations Teams

From the user's point of view, Quality Control activities represent the last element of the data-flow life-cycle chain, in the sense that the final delivery of the data, which ends the cycle, relies upon them. Less evident to outside observers is that the entire operations process relies upon the work of a fairly large group of people with different responsibilities within the chain, and owes its success to the interactions among them. This group forms the so-called Instrument Operations Team (IOT), and every VLT instrument has such a team assigned to it to ensure operations.

Table 2: Members and their respective roles within the teams for ISAAC and FORS1
 

Role | ISAAC | FORS1
Instrument Scientist | Jean-Gabriel Cuby | Gero Rupprecht
Operations Staff Astronomers | Chris Lidman, Gianni Marconi | Hermann Boehnhardt, Thomas Szeifert
User Support Astronomer | Almudena Prieto, Fernando Comeron | Palle Møller
Pipeline Development | Nicolas Devillard, Yves Jung, Thomas Rogon | Stefan Bogun
Quality Control Scientist | Paola Amico | Reinhard Hanuschik (future: Ferdinando Patat)
 

6. Lessons learnt

After more than one period of operations, it is clear that the basic concepts of Quality Control work routinely. Data are processed and their quality checked; PIs receive their SM data packages. These packages add value compared to the traditional simple collections of raw files.

Some important issues could be identified during operations. One is: keep the instruments simple if you want simple operations. There is a close correlation between, e.g., the many modes offered by FORS1 and the complexity of its data-flow operations.

Operationally, do as much as possible in automatic mode. This is reliable, reproducible, and can be done in batch mode. Scripts are preferable over interactive tools if you go for mass production. The evolutionary development approach, though dictated by circumstances, proved to be efficient.

Data integrity is very important. We cannot afford to manually correct errors introduced upstream: either files arriving in Garching are syntactically (DICB-conformant) and logically intact (e.g. have the proper programme and OB IDs), or they are useless. DICB conformity is also extremely important for Archive integrity. Data with wrong keyword contents will never be properly retrieved; they just waste disk space.
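A minimal sketch of such an integrity gate (the required-keyword list is an assumption made for illustration; real DICB conformity covers far more than keyword presence):

    # reject files missing the keywords that identify them in the data flow,
    # rather than repairing them by hand downstream
    REQUIRED = ['ORIGFILE', 'HIERARCH ESO OBS PROG ID', 'HIERARCH ESO OBS ID']

    def check_integrity(header):
        missing = [key for key in REQUIRED if key not in header]
        return len(missing) == 0, missing

    header = {'ORIGFILE': 'FORS1.1999-10-01T00:00:00.fits',          # invented
              'HIERARCH ESO OBS PROG ID': '63.H-0001(A)'}            # example
    ok, missing = check_integrity(header)
    print(ok, missing)   # False ['HIERARCH ESO OBS ID'] -> file rejected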

Relevant information has to be kept central. Facilities like web pages, relational databases and the Archive are crucial.

All in all the tasks of QC Garching combine astronomical with information technology challenges. They close the loop for VLT data production, provide added value to the community and create a sound database for assessment of instrument performance.



 

Figure 1: FORS1 trend plot of the BIAS QC parameters median value (diagram 1) and read noise (diagram 2) for the four CCD modes (low and high gain, 1-port and 4-port readout). The period covered is 1999-10-01 to 2000-01-01. These plots are used to assess the stability of the CCD system and to identify outliers.

Figure 2: Trend plot for QC parameters of SCR_FLAT_IMG calibration files. Diagram 1 shows mean values, diagram 2 photon noise (in raw frames) and fixed-pattern noise, diagram 3 large-scale structure (both in master files). Data are for the five Bessell UBVRI filters. The last plot clearly shows the CCD contamination slowly increasing with time, mainly affecting the U filter.

Figure 3: Snapshot of the Gasgano GUI in use with ISAAC data. The tool allows interactive selection of classified frames and their input to the corresponding pipeline recipes.


Reinhard Hanuschik (rhanusch@eso.org) and Paola Amico