Common DFOS tools:
Documentation

dfos = Data Flow Operations System, the common tool set for DFO
New:
v2.3:

rollback of change in v2.2 (only RAWFILEs from ABs with status != CREATED are counted): we count executed AND created ABs, to include the science associations

v3.0:
- new option -I for quarterly report
- mode dependent statistics terminated, only 'ALL' statistics done
- configuration file: simplified

[ used databases ] ngas..ngas_files for raw file sizes; obs_metadata..data_products for file names; qc1..daily_stat and monthly_stat for ingest and selects
[ used dfos tools ] finishNight
[ output used by ] updated daily and monthly QC statistics; aggregate numbers for option -i
[ upload/download ] none

extractStat

Note: before v3.0, the 'ALL' content in daily_stat was calculated from the modes. If the mode configuration was incomplete, files from the unconfigured modes were never counted. This systematic error is eliminated with v3.0, where the mode distinction has been dropped and all files are properly counted.

Description

This tool is used to extract statistics for the daily DFO workflow and to create monthly overviews of the statistics. In its standard mode (extractStat -d), it is usually called from finishNight. There is also an interactive mode (extractStat -i) which can be invoked for quick queries, and the mode -I (extractStat -I) which gives a statistics overview for three-month intervals.

The tool extracts statistical data for the following metrics:

With version 3.0, file-related parameters are provided only for the whole instrument, and no longer per user-specified instrument mode (e.g. DET, IMG, LSS, IFU etc.) as before.

The tool is set up such that it overwrites existing entries for a given date, so it can be invoked multiple times for the same date. Entries are usually filled in the background by finishNight, which wraps extractStat.
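Since entries for a date are overwritten, re-running the tool over a range of dates should be harmless. The following is a minimal sketch of how the documented call `extractStat -d <DATE>` could be prepared for several days. The helper name `build_commands` is hypothetical and not part of DFOS; the sketch only builds the argument lists and does not execute anything.

```python
from datetime import date, timedelta


def build_commands(first: date, last: date) -> list[list[str]]:
    """Return one 'extractStat -d <DATE>' argument list per day, both ends included.

    Hypothetical helper for illustration; only the '-d <DATE>' call itself
    is taken from the tool documentation.
    """
    if last < first:
        raise ValueError("last must not be earlier than first")
    n_days = (last - first).days + 1
    return [
        ["extractStat", "-d", (first + timedelta(days=i)).isoformat()]
        for i in range(n_days)
    ]


commands = build_commands(date(2005, 2, 8), date(2005, 2, 10))
for argv in commands:
    print(" ".join(argv))
```

Each argument list could then be handed to `subprocess.run` on a machine where extractStat is installed. With TOOL_MODE set to INTER the tool asks for confirmation, so unattended loops of this kind presumably need TOOL_MODE AUTO.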

All entries are written into the QC1 database tables daily_stat and monthly_stat. The QC1 database tables are visible under http://www.eso.org/qc/WISQ/QC1_DB_wisq.html .

The report mode is called interactively by 'extractStat -i'. There are 2 different kinds of reports:

option | report | content
1 | monthly report | all data aggregated for the specified month
2 | trimonthly report | selected parameters for three months

The results are read from the database and are identical to the ones you would see in the DFO database access interface, http://www.eso.org/qc/WISQ/QC1_DB_wisq.html. These options are offered for convenience only.

File size and numbers.
Raw files: The number of raw frames is extracted from executed ABs, which effectively means that only raw frames which have been processed are counted. Their size is extracted from the ngas database. This mechanism does not require raw fits files to be physically stored under $DFO_RAW_DIR.

Product files: If products have already been ingested (the tool evaluates $DFO_MON_DIR/DFO_STATUS file for flags cal_Ingested and sci_Ingested), their number is read from ABs, and their size is read from the ngas database. In that case, no fits file is required to be physically stored under $DFO_CAL_DIR.

If products have not been ingested, their number and size are extracted from the content of $DFO_CAL_DIR. In that case, any file that is not present there is not counted.
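The two cases for product files can be sketched as follows. This is an illustration only, not the tool's code: the function names are hypothetical, the layout of the DFO_STATUS file (a flag appearing as a whitespace-separated word on some line) is an assumption, and so are the recursive directory scan and the conversion 1 MB = 1024*1024 bytes. The sketch covers only the fallback branch (counting local files); the ingested branch would need the ABs and the ngas database.

```python
import tempfile
from pathlib import Path


def is_ingested(status_text: str, flag: str) -> bool:
    """True if the flag (e.g. 'cal_Ingested') appears as a word in the status text.

    Assumed file layout; the real DFO_STATUS format may differ.
    """
    return any(flag in line.split() for line in status_text.splitlines())


def count_local_products(cal_dir: Path) -> tuple[int, float]:
    """Count all files below cal_dir; return (number, total size in MB)."""
    files = [p for p in cal_dir.rglob("*") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    return len(files), total_bytes / 1024**2  # assumption: 1 MB = 1024*1024 bytes


# Demo with a temporary directory standing in for $DFO_CAL_DIR.
with tempfile.TemporaryDirectory() as tmp:
    cal_dir = Path(tmp)
    (cal_dir / "product_1.fits").write_bytes(b"\0" * 2048)
    (cal_dir / "product_1.hdr").write_bytes(b"\0" * 1024)
    status_text = "some_other_flag\n"  # made-up content: cal_Ingested is not set

    if is_ingested(status_text, "cal_Ingested"):
        print("ingested: take the number from ABs and the size from ngas")
    else:
        n_pro, mb_pro = count_local_products(cal_dir)
        print(f"not ingested: {n_pro} files, {mb_pro:.4f} MB")
```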

ABs: AB numbers are counted in the $DFO_LOG_DIR tree (the final storage, after executing moveProducts). Two parameters are measured: the EXECUTED ABs (the ones successfully processed), and the created ABs. Their difference effectively counts the number of science ABs (plus a small bias introduced by unsuccessful ABs). Although not processed currently, the science ABs have a quality in themselves since they are stored in the calSelector database. (This was different before period 88, 2011-10, when CALIB and SCIENCE ABs were both executed and accounted.)

Note that we effectively count AB_detector jobs, i.e. one AB per detector. This is trivially the case for instruments like FORS2 or VIMOS (having one raw file per detector and therefore always one AB per detector). For instruments like CRIRES, HAWKI or VIRCAM, there is the configuration key MEF_FACTOR. It either takes into account that the AB is split during execution time into MEF_FACTOR jobs (this is the case for VIRCAM, MEF_FACTOR=16, and OMEGACAM, 32), or artificially accounts for a proper normalization of the N_AB parameter if the AB is executed sequentially for all detectors (like for CRIRES, MEF_FACTOR=3, and HAWKI, MEF_FACTOR=4).
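As a worked example of the AB counting, the sketch below applies MEF_FACTOR to the number of created and executed ABs. The function `count_abs` is hypothetical. Only the status value CREATED is taken from this documentation; the value "EXECUTED" in the demo is a placeholder for any other status. Scaling N_EXEC by MEF_FACTOR is an assumption, based on N_EXEC being described as a number of det.ABs.

```python
def count_abs(statuses: list[str], mef_factor: int = 1) -> dict[str, int]:
    """Derive detector-AB counts from one status string per AB.

    All ABs count as created; every AB whose status is not 'CREATED'
    counts as executed. Both numbers are scaled by mef_factor.
    """
    n_all = len(statuses)
    n_executed = sum(1 for status in statuses if status != "CREATED")
    return {
        "N_AB": n_all * mef_factor,
        "N_EXEC": n_executed * mef_factor,
        # Mostly science ABs, plus any unsuccessful ABs.
        "N_NOT_EXECUTED": (n_all - n_executed) * mef_factor,
    }


# Five ABs for an instrument with MEF_FACTOR=4 (the HAWKI value quoted above).
# "EXECUTED" is a placeholder for any status other than CREATED.
demo = count_abs(["EXECUTED", "EXECUTED", "EXECUTED", "CREATED", "CREATED"], mef_factor=4)
print(demo)
```

With these inputs, three executed and two merely created ABs become 20 det.ABs in N_AB and 12 in N_EXEC.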

The execution time for ABs is measured by adding up the TEXEC values from the ABs. The execution time for QC reports is calculated in a similar way, using TQCEXEC.
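Adding up TEXEC and TQCEXEC could look like the sketch below. It assumes that an AB is a text file in which a key and its value are separated by whitespace on one line; that layout, the sample content, and the function name `sum_key` are assumptions made for illustration. No unit conversion is attempted, because the unit of the stored values is not stated here.

```python
def sum_key(ab_texts: list[str], key: str) -> float:
    """Add up the numeric value following `key` over all AB texts.

    Assumed AB layout: '<KEY> <value>' on one line. ABs without the key,
    or with a non-numeric value, contribute nothing.
    """
    total = 0.0
    for text in ab_texts:
        for line in text.splitlines():
            fields = line.split()
            if len(fields) >= 2 and fields[0] == key:
                try:
                    total += float(fields[1])
                except ValueError:
                    pass  # ignore values that are not numbers
    return total


# Made-up AB contents; the third AB was never executed and has no timing keys.
sample_abs = [
    "PROCESS_STATUS EXECUTED\nTEXEC 12.5\nTQCEXEC 3.0\n",
    "PROCESS_STATUS EXECUTED\nTEXEC 7.5\nTQCEXEC 1.5\n",
    "PROCESS_STATUS CREATED\n",
]
t_exec = sum_key(sample_abs, "TEXEC")
t_qc = sum_key(sample_abs, "TQCEXEC")
print(t_exec, t_qc)
```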

Output

Statistics are written into the QC1 database tables daily_stat and monthly_stat. The local table $DFO_MON_DIR/STATISTICS_DAILY acts as a backup repository.

How to use

Type extractStat -h for on-line help, extractStat -v for the version number,

extractStat -d <DATE>

to extract statistics for <DATE> (with update of graphical reports),

extractStat -i

to generate reports for the monthly overview per instrument,

extractStat -I

for the three-month report for ALL instruments.

Configuration file

config.extractStat defines:

Section 1: general
TOOL_MODE | AUTO or INTER | INTER: ask for confirmation before writing statistics; AUTO: do not ask for confirmation
MEF_FACTOR | e.g. 1 or 4 | for MEF instruments: multiplex factor, to provide correct N_AB; default: 1
Section 2: instrument modes
Obsolete with v3.0
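To make the two keys concrete, here is a small sketch that reads them from a sample configuration text. The file layout (one key and value per line, separated by whitespace, with '#' starting a comment) is an assumption about config.extractStat, and `parse_config` is a hypothetical helper. Only the key names, the allowed TOOL_MODE values and the MEF_FACTOR default of 1 come from the table above.

```python
SAMPLE_CONFIG = """\
# Section 1: general (layout assumed for illustration)
TOOL_MODE   AUTO
MEF_FACTOR  4
"""


def parse_config(text: str) -> dict:
    """Parse TOOL_MODE and MEF_FACTOR from an assumed key/value layout."""
    config = {"MEF_FACTOR": 1}  # documented default
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        fields = line.split(None, 1)
        if len(fields) < 2:
            continue
        key, value = fields[0], fields[1].strip()
        if key == "TOOL_MODE":
            if value not in ("AUTO", "INTER"):
                raise ValueError(f"TOOL_MODE must be AUTO or INTER, got {value!r}")
            config[key] = value
        elif key == "MEF_FACTOR":
            config[key] = int(value)
    return config


config = parse_config(SAMPLE_CONFIG)
print(config)
```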

Parameter definitions

The following parameters are derived by extractStat and inserted into daily_stat, per instrument mode:

Column name | Description | Format | Example entry
civil_date | DFO date (year-month-day) | YYYY-MM-DD | 2005-02-09
instrument | Instrument name | char | UVES
instr_mode | always ALL (was: instrument mode) | char | DET
N_ACQ_RAW | Number of (raw) acquisition frames processed | integer | 23
MB_ACQ_RAW | Total size (in Mbytes) of acquisition frames processed | float | 0.21
N_CAL_RAW | Number of raw calibration frames processed | integer | 103
MB_CAL_RAW | Total size (in Mbytes) of raw calibration frames processed | float | 660.1
N_CAL_PRO | Number of calibration products created | integer | 55
MB_CAL_PRO | Total size (in Mbytes) of calibration products | float | 105.3
N_SCI_RAW | Number of raw science frames processed (= 0 with QC XXLight) | integer | 23
MB_SCI_RAW | Total size (in Mbytes) of raw science frames processed (= 0 with QC XXLight) | float | 276.4
N_SCI_PRO | Number of science products created (= 0 with QC XXLight) | integer | 46
MB_SCI_PRO | Total size (in Mbytes) of science products created (= 0 with QC XXLight) | float | 55.9
N_AB | Number of det.ABs created (CAL and SCI); det.ABs are detector ABs, i.e. the number of ABs times MEF_FACTOR as configured | integer | 26
N_EXEC | Number of successfully (pipeline-) processed det.ABs | integer | 22
T_AB_EXE | Total pipeline execution time (in minutes) for processed det.ABs | float | 8.5
T_QC_EXE | Total QC report execution time (in minutes) for processed det.ABs | float | 5.6

The database table daily_stat has one row per instrument mode (only ALL after 2013-03) and day. One row with mode = ALL_INS_SUM is added with sums over all modes.

The table monthly_stat has the same column names and formats as daily_stat, plus summary values N_ALL_RAW (=ACQ+CAL+SCI), N_ALL_PRO (=CAL+SCI), GB_ALL_RAW (=CAL+SCI), GB_ALL_PRO (=CAL+SCI). Total file sizes are in GBytes (instead of MBytes), AB and QC execution times are in hours.
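The summary values and unit changes described above can be illustrated with a short sketch. It assumes that monthly values are plain sums over the daily rows and that 1 GByte = 1024 MBytes; neither is stated explicitly here. The function `monthly_summary` and the input numbers are made up for illustration, and the GB sums use exactly the combinations quoted above.

```python
MB_PER_GB = 1024      # assumption: binary conversion
MINUTES_PER_HOUR = 60


def monthly_summary(daily_rows: list[dict]) -> dict:
    """Aggregate daily_stat-like rows into the monthly summary values."""

    def total(column: str) -> float:
        return sum(row.get(column, 0) for row in daily_rows)

    return {
        "N_ALL_RAW": total("N_ACQ_RAW") + total("N_CAL_RAW") + total("N_SCI_RAW"),
        "N_ALL_PRO": total("N_CAL_PRO") + total("N_SCI_PRO"),
        "GB_ALL_RAW": (total("MB_CAL_RAW") + total("MB_SCI_RAW")) / MB_PER_GB,
        "GB_ALL_PRO": (total("MB_CAL_PRO") + total("MB_SCI_PRO")) / MB_PER_GB,
        "T_AB_EXE": total("T_AB_EXE") / MINUTES_PER_HOUR,  # minutes -> hours
        "T_QC_EXE": total("T_QC_EXE") / MINUTES_PER_HOUR,  # minutes -> hours
    }


# Two made-up daily rows.
rows = [
    {"N_ACQ_RAW": 0, "N_CAL_RAW": 100, "N_SCI_RAW": 20, "N_CAL_PRO": 50, "N_SCI_PRO": 0,
     "MB_CAL_RAW": 512.0, "MB_SCI_RAW": 256.0, "MB_CAL_PRO": 128.0, "MB_SCI_PRO": 0.0,
     "T_AB_EXE": 45.0, "T_QC_EXE": 15.0},
    {"N_ACQ_RAW": 0, "N_CAL_RAW": 60, "N_SCI_RAW": 10, "N_CAL_PRO": 30, "N_SCI_PRO": 0,
     "MB_CAL_RAW": 256.0, "MB_SCI_RAW": 0.0, "MB_CAL_PRO": 128.0, "MB_SCI_PRO": 0.0,
     "T_AB_EXE": 75.0, "T_QC_EXE": 15.0},
]
summary = monthly_summary(rows)
print(summary)
```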

Operational aspects

Implementation of statistics

The primary source of the statistics is the ABs within the $DFO_LOG_DIR/<DATE> directories. As a general rule, each raw or product file is counted only once. The following table summarises the implemented rules for counting files and ABs:

Type | Rule
Acquisition data | was counted before v2.2, not anymore
Raw calibrations | Every raw CALIB frame from RAWFILE section in an executed calibration AB is counted (before 2011-10-01: every associated and packed raw file)
Calibration products | Every ingested CALIB product is counted; if not yet ingested, every product file under $DFO_CAL_DIR (no matter if fits or hdr) is counted
Raw science data | Every raw SCIENCE frame from RAWFILE section in an executed AB is counted (before 2011-10-01: every associated and packed raw file)
Science data products | none (if existing, every ingested SCIENCE product would be counted)
Association blocks | All ABs in $DFO_LOG_DIR/<DATE> are counted and multiplied by MEF_FACTOR; ABs with PROCESS_STATUS!=CREATED are counted as pipeline executed; execution times are collected from the TEXEC key in the ABs

Last update: April 26, 2021 by rhanusch