Workflow information system for QC: Quality of IDP production



IDP quality

The IDPs (internal data products) have a quality control process which is tuned towards mass production. Because of the large number of products and the standardized pipeline process, it is neither feasible nor necessary to check each individual product. Instead, statistical monitoring has been implemented.

The key parameter for data quality of spectral products is the signal-to-noise ratio (SNR). It is delivered by the pipelines for each product. There are two fundamental ways of assessing the measured SNR:

method 1 is to compare the measured SNR (or the noise) to the mean signal of the extracted spectrum;
method 2 is to compare it to the mean signal of the raw data.

The first method assesses the noise properties of the products. Ideally the noise budget of the products is dominated by shot noise (the unavoidable quantum noise due to the input signal), and the corresponding relation between SNR and signal is a square-root law, since the photon noise scales with the square root of the signal. Any other noise source would increase the total noise and therefore decrease the SNR. On the other hand, a partly saturated spectrum would become insensitive to the signal and to photon noise, thereby mimicking a higher SNR.
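As a minimal sketch of how the square-root law could be checked against a set of products, the following fits a single scale factor k in SNR = k * sqrt(signal) by least squares and returns the relative residuals. The function names and the input lists are hypothetical; the factor k is left free because it absorbs the detector gain and the number of pixels entering the mean signal, neither of which is specified here.

```python
import math

def fit_sqrt_law(signals, snrs):
    """Least-squares estimate of k in SNR = k * sqrt(signal).

    Minimising sum((snr - k*sqrt(s))**2) gives
    k = sum(snr*sqrt(s)) / sum(s).
    """
    numerator = sum(snr * math.sqrt(s) for s, snr in zip(signals, snrs))
    denominator = sum(signals)
    return numerator / denominator

def relative_residuals(signals, snrs, k):
    """Relative deviation of each measured SNR from the fitted law."""
    return [(snr - k * math.sqrt(s)) / (k * math.sqrt(s))
            for s, snr in zip(signals, snrs)]
```

Negative residuals that grow with signal would hint at an additional noise source; positive residuals at high signal could indicate saturation.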

While this method relates properties of the extracted spectra only and is therefore insensitive to systematic effects like suboptimal extraction, the second method combines properties of the raw data and the processed data and is therefore able to monitor the extraction quality. It compares the expected signal (or SNR) with the measured signal (or SNR).

Both methods are statistical and can only work with correlation plots combining values from many different spectra. This is the ideal quality control for large datasets.

Here is an example for method 1:

Fig. 1: Plotted are data points for SNR vs. mean_signal for GIRAFFE IDPs, brightest fibre only (i.e. one data point per input raw file), setting Medusa_H548.8 (red data) and all Medusa HR settings, for a certain time range of 3 months. The upper broken line is a pure square-root law fit, assuming pure photon noise. The lower broken line contains the same photon noise, plus a fixed-pattern component.

Every data point represents the brightest fibre of a dataset. No matter how many counts the spectra have collected, and which setting has been used, the data fall nicely on the square-root curve (the upper broken line, which assumes pure shot noise).

This is a remarkable result since any additional noise source would reduce the SNR. For instance, imperfect flat-fielding would probably not fully remove the fixed-pattern (gain) noise, thereby leaving a noise component that scales with the signal. Such a component is simulated in the lower curve, where the total noise is assumed to be composed of the same photon noise as before, plus a small-scale (1%) residual fixed-pattern noise. This would flatten the SNR curve at high signal values, since the SNR is then dominated by the second noise source. We see, however, a square-root law out to very high values of SNR (up to 700, beyond the scale of Fig. 1). A similar effect would be seen if fringing were not perfectly cancelled out (since it also scales with counts). The observed shape of the SNR curve gives us confidence in the quality of the calibration files used, and in the algorithms. The square-root law is also followed in the inner part (low SNR, low signal), see Fig. 2.
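The two curves described above can be sketched as follows. This is an illustration only, assuming the signal is expressed in detected electrons, that the two noise terms add in quadrature, and that the fixed-pattern term is a constant fraction of the signal; the function names and the default fraction of 0.01 (the 1% mentioned in the text) are chosen for this example.

```python
import math

def snr_photon_only(signal):
    """SNR when shot noise is the only noise source: signal / sqrt(signal)."""
    return signal / math.sqrt(signal)

def snr_with_fixed_pattern(signal, fpn_fraction=0.01):
    """SNR with shot noise plus a residual fixed-pattern term.

    The fixed-pattern noise is modelled as fpn_fraction * signal and is
    added in quadrature to the shot noise (an assumption of this sketch).
    """
    noise = math.sqrt(signal + (fpn_fraction * signal) ** 2)
    return signal / noise

# Compare both models over a range of signal levels.
for signal in (1e2, 1e4, 1e6):
    print(signal, snr_photon_only(signal), snr_with_fixed_pattern(signal))
```

In this model the SNR can never exceed 1 / fpn_fraction, i.e. 100 for a 1% residual, which is why observed SNR values of several hundred argue against a fixed-pattern component of that size.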

Fig. 2: The inner part of Fig. 1. The broken line is the upper curve in Fig. 1.

An even closer zoom is visible in Figure 3. Here, all data points are shown from all fibres in the H548.8 (Medusa1/2) settings, no other setting. Again, the theoretical curve is nicely filled.

Fig. 3: Only fibres from the H548.8 settings are displayed here. In red the brightest fibres (the ones also visible in Fig. 1 and 2), plus in blue all other fibres (SCIENCE only).

A related approach is possible for the MUSE IDPs, where we have the limiting magnitude ABMAGLIM, a noise measurement for the background. This parameter can be correlated with the exposure time since ABMAGLIM does NOT relate to source properties but to the (SKY) background. This approach has the limitation that it assumes that the background is SKY-limited, an assumption which is generally true but is wrong in the case of crowded fields (globular clusters) and of extended sources larger than the field-of-view. Also, since it is derived from a Gaussian fit to the overall field-of-view statistics, it might include residual gradients and other artefacts which all contribute to broadening the noise peak, and hence to an over-estimation of the noise. Nevertheless, the parameter is useful for statistical monitoring.
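One way such a correlation could be quantified is a straight-line fit of ABMAGLIM against log10 of the exposure time. The sketch below is not the monitoring code used for the MUSE IDPs; it assumes a purely background-limited regime, in which the noise on the count rate falls as 1/sqrt(t) and the limiting magnitude should therefore deepen by 1.25 mag per decade of exposure time. The function name and inputs are hypothetical.

```python
import math

# Expected slope in the background-limited case: 2.5 * log10(sqrt(10)) = 1.25
EXPECTED_SLOPE = 1.25

def fit_maglim_vs_exptime(exptimes, maglims):
    """Ordinary least-squares fit of maglim = intercept + slope * log10(exptime)."""
    xs = [math.log10(t) for t in exptimes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(maglims) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, maglims))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

A fitted slope well below EXPECTED_SLOPE, or products lying far from the fitted line, would point to the cases named above (crowded fields, extended sources, residual gradients) rather than to the sky background.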

Method 2: For some IDPs, flux information from the raw files is available which can be correlated with the SNR in the products. This relation is more powerful than method 1. Consider a systematic error in the localization of the echelle orders in a pipeline. In method 1 this error would never be visible since we compare two properties which both suffer from this error. With method 2 this error would become detectable, at least if it occurs sporadically, or from a certain time (pipeline version) onwards.

For the HARPS IDPs we are able to correlate their (pipeline-measured) SNR with the flux of the target measured with the exposure meter.
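A method 2 check of this kind could be sketched as below. The sketch assumes that the expected SNR scales with the square root of the raw flux (shot-noise limit) and flags products whose SNR falls clearly below the typical ratio of the sample. The function name, the use of the median as reference, and the 30% tolerance are all assumptions made for illustration, not the criteria actually applied to the IDPs.

```python
import math
import statistics

def flag_low_snr(raw_fluxes, product_snrs, tolerance=0.3):
    """Return indices of products whose SNR is low relative to the raw flux.

    Each product gets the ratio snr / sqrt(raw_flux). The median ratio of
    the sample serves as reference; a product is flagged when its ratio
    is more than `tolerance` (fractional) below that reference.
    """
    ratios = [snr / math.sqrt(flux)
              for flux, snr in zip(raw_fluxes, product_snrs)]
    reference = statistics.median(ratios)
    return [i for i, ratio in enumerate(ratios)
            if ratio < (1.0 - tolerance) * reference]
```

Because the reference is taken from the sample itself, this only catches sporadic failures; an error affecting all products from a certain pipeline version onwards would need a comparison against an earlier time range.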

Find more information about the UVES IDPs here, and about the XSHOOTER IDPs here. The archive interface for spectroscopic data products can be found here.