Common DFOS tools:
Documentation

dfos = Data Flow Operations System, the common tool set for DFO
new:
  - v1.3: FILE_MODE always HDR
see also:
  - workflow description to correct headers (see below)
  - OCA syntax
   
[ used databases ]   none
[ used dfos tools ]  uses ABbuilder; call embedded in createAB
[ output used by ]   cleaned $DFO_HDR_DIR and $DFO_RAW_DIR
[ upload/download ]  email to configured address with file report

 

filterRaw

Description

This tool analyses the headers of raw files for unusual entries. The purpose is to automatically detect anomalies in the meta-data of the data stream (FITS header), find files which go unclassified with the current OCA configuration, and filter data with unwanted properties.

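Three different kinds of file attributes can be detected by the tool:

- UNCLASSIFIED: files which are not classified by the current OCA configuration
- EMPTY: files with empty FITS keys which should not be empty
- FILTER: files with unwanted properties, to be removed from the standard processing workflow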

The tool uses the DFS tools ABbuilder for classification, and fitsreport for reporting. It can be run in three different modes:

SCAN. In this mode, all headers for the specified date are scanned. The tool creates ABs for them, using the standard OCA classification configuration for the daily workflow. However, the association and organisation configuration files are replaced by dummies, so that the created ABs cannot be used operationally (they are stored in $DFO_AB_DIR/FILTER only temporarily and are deleted after runtime). The RAWFILE section of all created ABs is then scanned, to obtain the set of classified raw files. This is checked against the list of all raw files. The difference is the set of UNCLASSIFIED raw files. This set, if not empty, is displayed and fitsreported. No specific configuration is required for this first step.
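The comparison at the heart of this step is a set difference. The following Python sketch only illustrates that logic; it assumes that the RAWFILE entries of the dummy ABs and the list of all raw files have already been read into lists of file names. The function name and the sample file names are hypothetical and not part of filterRaw.

```python
def find_unclassified(all_raw_files, ab_rawfiles):
    """Return the raw files that appear in no AB RAWFILE section, sorted by name."""
    classified = set(ab_rawfiles)
    return sorted(f for f in set(all_raw_files) if f not in classified)


# Hypothetical file names, for illustration only.
all_raw = ["raw_0001.hdr", "raw_0002.hdr", "raw_0003.hdr"]
in_abs = ["raw_0001.hdr", "raw_0003.hdr"]

unclassified = find_unclassified(all_raw, in_abs)
print(unclassified)
```

An empty result corresponds to the case where nothing is displayed or reported.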

Then, the header data are analyzed for empty FITS keys which should not be empty. The user defines in a fitsreport configuration file which keys should not be empty. Then fitsreport is executed against the input files. All files returning an empty key are reported as EMPTY files.
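In filterRaw this check is done by fitsreport. The Python sketch below only illustrates the idea, assuming each header is available as a dictionary of key names and values. Treating a missing key like an empty one is an assumption of this sketch, and all names and sample values are hypothetical.

```python
def find_empty_keys(header, required_keys):
    """Return the required keys that are missing or blank in one header."""
    empty = []
    for key in required_keys:
        value = header.get(key)
        # Assumption: a missing key counts as empty, like a blank value.
        if value is None or str(value).strip() == "":
            empty.append(key)
    return empty


def find_empty_files(headers, required_keys):
    """Map file name -> list of empty keys, for files with at least one."""
    report = {}
    for name, header in headers.items():
        empty = find_empty_keys(header, required_keys)
        if empty:
            report[name] = empty
    return report


# Hypothetical headers and key list, for illustration only.
headers = {
    "raw_0001.hdr": {"DPR.CATG": "CALIB", "DPR.TYPE": "LAMP,FLAT"},
    "raw_0002.hdr": {"DPR.CATG": "CALIB", "DPR.TYPE": ""},
}
empty_files = find_empty_files(headers, ["DPR.CATG", "DPR.TYPE"])
print(empty_files)
```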

These actions are performed on headers, so they can normally be executed as soon as new headers have been downloaded. Any anomaly found can then be investigated and reported to Paranal or to dbcm. This gives an opportunity to fix the database entries before the fits files are downloaded, so that the files downloaded later already carry the corrected headers.

FILTER. In this mode the tool scans the headers of the files. It is intended to detect files with certain properties and remove them from the standard processing workflow. The ${FILE_MODE} parameter is read from config.createAB; depending on its value, the headers (if FILE_MODE=HDR) or the headers and fits files (if FILE_MODE=FITS) are hidden in ${HIDE_DIR}/$DATE. The filter mode can handle two kinds of properties:

RAW_MINNUM (minimum number of raw files) is an optional classification criterion which is, strictly speaking, not OCA-compatible: it is not based on FITS key content, and cannot be expressed in OCA syntax. Nevertheless it is operationally important. It is therefore configured in the standard ${DFO_INSTRUMENT}_organisation.h as a line of the form

//RAW_MINNUM <raw_type> <value>

This line can be put anywhere in the configuration file. For good reasons it is usually configured in the same <raw_type> section as the OCA rules.

If found, the <value> is checked against the actual number of raw files in the (dummy) AB. If that number is lower than the threshold, the headers, or headers and raw files, are hidden. Hence, when createAB creates the real ABs, these files are no longer visible and do not disturb the workflow (no ABs, no VCALs).
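The threshold check can be sketched in Python as follows. This is not the tool's own code: it assumes that <raw_type> contains no blanks and that <value> is a non-negative integer, and the raw types and numbers in the sample are hypothetical.

```python
import re

# Matches lines of the form: //RAW_MINNUM <raw_type> <value>
RAW_MINNUM_RE = re.compile(r"^\s*//RAW_MINNUM\s+(\S+)\s+(\d+)\s*$")


def parse_raw_minnum(config_text):
    """Return {raw_type: minimum number of raw files} found in the config text."""
    thresholds = {}
    for line in config_text.splitlines():
        match = RAW_MINNUM_RE.match(line)
        if match:
            thresholds[match.group(1)] = int(match.group(2))
    return thresholds


def violates_minnum(raw_type, n_raw, thresholds):
    """True if an AB of this raw_type has fewer raw files than required."""
    minimum = thresholds.get(raw_type)
    return minimum is not None and n_raw < minimum


# Hypothetical raw types and values, for illustration only.
sample = """
//RAW_MINNUM FLAT 3
// an ordinary comment
//RAW_MINNUM BIAS 5
"""
thresholds = parse_raw_minnum(sample)
print(thresholds)
print(violates_minnum("FLAT", 2, thresholds))
```

Raw types without a RAW_MINNUM line are never flagged, which matches the optional nature of the criterion.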

${RLS_CONFIG} is configured in the tool configuration (see below). It contains, in OCA syntax (to be ready for ABbuilder execution, no macros, no gcc pre-compilation!), a set of OCA 'if' statements, like

if ( DPR.CATG=="CALIB" and DPR.TYPE=="LAMP,FLAT" and TPL.NEXP==1)

or

if (rule set #2) or if (rule set #3) etc.

and some dummy entries to make this an OCA-valid statement. The user may list arbitrary rules (based on fits header keys), connected by 'or'. All files matching these rules will either be listed (by default), or listed and hidden (option -H). 'Listed' means a short fitsreport is created, and data are linked to a user-configured $HIDE_DIR/<date>. 'Hidden' means they are moved to $HIDE_DIR/<date>, thus removing them from the daily workflow. These data are finally deleted by finishNight.

Note that since version 1.1, both fits and header files are moved to $HIDE_DIR/<date>, to provide consistent handling for both the FITS and HDR FILE_MODE of createAB. If the filterRaw call is embedded in createAB (which is the normal case), createAB takes care of moving the headers back into $DFO_HDR_DIR/$DATE at the very end, so that the content of the header directory is always complete and consistent.
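The rule logic described above (conditions joined by 'and' within a rule, rules joined by 'or') can be illustrated with a small Python sketch. It does not reproduce how ABbuilder evaluates OCA rules; it assumes headers are dictionaries and supports equality tests only. The first rule mirrors the example in the text, the second rule and the sample headers are hypothetical.

```python
def matches_rule(header, rule):
    """A rule maps key -> required value; all conditions must hold ('and')."""
    return all(header.get(key) == value for key, value in rule.items())


def matches_any(header, rules):
    """Rules are connected by 'or': one matching rule is enough."""
    return any(matches_rule(header, rule) for rule in rules)


# The first rule mirrors the example in the text; the second is hypothetical.
rules = [
    {"DPR.CATG": "CALIB", "DPR.TYPE": "LAMP,FLAT", "TPL.NEXP": 1},
    {"DPR.CATG": "TEST"},
]

single_flat = {"DPR.CATG": "CALIB", "DPR.TYPE": "LAMP,FLAT", "TPL.NEXP": 1}
flat_series = {"DPR.CATG": "CALIB", "DPR.TYPE": "LAMP,FLAT", "TPL.NEXP": 5}

print(matches_any(single_flat, rules))
print(matches_any(flat_series, rules))
```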

If FILTER_MODE is configured as "INV_FILTER", the rules are actually used in the reverse sense: all configured rules are applied to find files which are *not* hidden, all others are hidden. This makes sense when e.g. for spectroscopic modes the number of standard setups is small and well-defined while the total possible number of setups may be too high to configure.
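The difference between the normal and the inverse sense can be written down in a few lines. The Python sketch below assumes that the list of all files and the list of files matched by the rules are already known; the function and parameter names are hypothetical, only the values FILTER and INV_FILTER come from the tool.

```python
def files_to_hide(all_files, matched_files, switch="FILTER"):
    """Select the files to hide for the two documented switch values."""
    matched = set(matched_files)
    if switch == "FILTER":
        # Normal sense: hide what the rules match.
        return [f for f in all_files if f in matched]
    if switch == "INV_FILTER":
        # Inverse sense: keep what the rules match, hide everything else.
        return [f for f in all_files if f not in matched]
    raise ValueError(f"unknown switch value: {switch!r}")


# Hypothetical file names, for illustration only.
all_files = ["a.fits", "b.fits", "c.fits"]
matched = ["b.fits"]

print(files_to_hide(all_files, matched, "FILTER"))
print(files_to_hide(all_files, matched, "INV_FILTER"))
```

With INV_FILTER, a file that matches no rule is hidden, so an incomplete rule set removes more data than intended; this is worth checking in listing mode (without -H) first.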

Report. The execution report is written as "filter_${DFO_INSTRUMENT}_<date>.txt" into $DFO_LST_DIR.

Hiding. If files have been hidden, a file SOME_FILES_HAVE_BEEN_HIDDEN is written into the original directory to indicate that the directory content is no longer complete.
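As an illustration of the hiding step and the marker file, here is a minimal Python sketch that runs in a temporary directory. Only the marker name SOME_FILES_HAVE_BEEN_HIDDEN is taken from the tool; the function, the directory layout and the file names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

MARKER = "SOME_FILES_HAVE_BEEN_HIDDEN"


def hide_files(src_dir, hide_dir, names):
    """Move the named files from src_dir to hide_dir and leave a marker behind."""
    src_dir, hide_dir = Path(src_dir), Path(hide_dir)
    moved = []
    for name in names:
        source = src_dir / name
        if source.is_file():
            hide_dir.mkdir(parents=True, exist_ok=True)
            shutil.move(str(source), str(hide_dir / name))
            moved.append(name)
    if moved:
        # The marker signals that the directory content is no longer complete.
        (src_dir / MARKER).touch()
    return moved


# Demonstration in a temporary directory; all paths here are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    hdr_dir = Path(tmp) / "hdr" / "2025-12-30"
    hide_dir = Path(tmp) / "HIDE" / "2025-12-30"
    hdr_dir.mkdir(parents=True)
    for name in ("raw_0001.hdr", "raw_0002.hdr"):
        (hdr_dir / name).write_text("dummy header\n")

    moved = hide_files(hdr_dir, hide_dir, ["raw_0002.hdr"])
    remaining = sorted(p.name for p in hdr_dir.iterdir())
    hidden = sorted(p.name for p in hide_dir.iterdir())

print(moved, remaining, hidden)
```

The marker is written only when at least one file was actually moved, so an untouched directory stays free of it.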

Workflow for fixing header problems. One of the main purposes of the tool is to detect header problems. If you know the fix (the correct key values), it is extremely useful to get this correction into the header database (SAFIQ). The workflow is:

step                                   tool
1. detect header problem               filterRaw
2. suggest correction to dbcm@eso.org  hideFrame
3. wait for confirmation email
4. update header                       ngasClient -H <hdr>

How to use. Type filterRaw -h | -v for on-line help and version;

filterRaw -d 2025-12-30 -m SCAN

to scan the header data for date 2025-12-30, and

filterRaw -d 2025-12-30 -m FILTER -H

to filter the fits files and hide them if any matching files are found.

Below is an overview of the different possible modes.
mode    data source                        properties checked  action                                         place in workflow
SCAN    hdr files in $DFO_HDR_DIR/<date>   UNCLASSIFIED        list; email if configured                      after header download, together with createReport and checkConsist
        same                               EMPTY               list; email if configured                      same
FILTER  fits files in $DFO_RAW_DIR/<date>  FILTER              list, hide if configured; email if configured  after fits file download and checkDownload, as part of createAB
ALL     SCAN plus FILTER
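When filterRaw is called from a script, it can help to validate the arguments before the call. The Python sketch below builds the argument list from the options shown above (-d, -m, -H) and the three modes in the overview; the function itself is hypothetical and it does not execute the tool.

```python
import re

VALID_MODES = ("SCAN", "FILTER", "ALL")


def build_filterraw_command(date, mode, hide=False):
    """Return the argument list for a filterRaw call, after basic checks."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", date):
        raise ValueError(f"date must look like YYYY-MM-DD, got {date!r}")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    command = ["filterRaw", "-d", date, "-m", mode]
    if hide:
        command.append("-H")
    return command


scan_cmd = build_filterraw_command("2025-12-30", "SCAN")
filter_cmd = build_filterraw_command("2025-12-30", "FILTER", hide=True)
print(" ".join(scan_cmd))
print(" ".join(filter_cmd))
```

On a machine where filterRaw is installed, such a list could be handed to subprocess.run; this is left out here so that the sketch runs anywhere.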

Installation

Use dfosExplorer, or type dfosInstall -t filterRaw .

The tool requires the DFS tools ABbuilder and fitsreport.

Configuration files

All configuration goes to $DFO_CONFIG_DIR/OCA: config.filterRaw (tool configuration), filter_$DFO_INSTRUMENT.RLS (the OCA configuration), and the fitsreport cfg file(s) (up to three are possible).

config.filterRaw

The file config.filterRaw is the tool configuration:

Section 1
TOOL_MODE      INTER | AUTO         INTER: interactive; AUTO: automatic mode
HIDE_DIR       /data03/data/HIDE    root for directories with hidden files
EMAIL_NOTIF    YES | NO             YES: reports are sent to $OP_ADDRESS
FILTER_SWITCH  FILTER | INV_FILTER  FILTER: configured filtering rules are used to select and hide files;
                                    INV_FILTER: inverse behaviour (configured rules are used to find files *not* to be hidden, all others are hidden)

Section 2: OCA configuration (for FILTERing only)
RLS_CONFIG  filter_<ins>.RLS  name of the OCA config file for filtering, in compiled OCA syntax

Section 3: fitsreport configuration
FITS_REPORT_UNCLASS  e.g. fitsreport_unclassified.cfg  names of fitsreport config files for the three reports (could be the same)
FITS_REPORT_EMPTY    e.g. fitsreport_empty.cfg
FITS_REPORT_FILTER   e.g. fitsreport_filter.cfg
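A quick sanity check of the configured values might look like the following Python sketch. The on-disk layout of config.filterRaw is not specified here, so the parser assumes one 'KEY VALUE' pair per line with '#' starting a comment; that layout, the function names and the sample text are assumptions. Only the keys and their allowed values are taken from the table above.

```python
def parse_tool_config(text):
    """Parse 'KEY VALUE' lines into a dict; blank lines and '#' comments are skipped."""
    config = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        config[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return config


def check_tool_config(config):
    """Return a list of problems for the keys with a fixed set of values."""
    allowed = {
        "TOOL_MODE": ("INTER", "AUTO"),
        "EMAIL_NOTIF": ("YES", "NO"),
        "FILTER_SWITCH": ("FILTER", "INV_FILTER"),
    }
    problems = []
    for key, values in allowed.items():
        if config.get(key) not in values:
            problems.append(f"{key} must be one of {values}, got {config.get(key)!r}")
    return problems


# Hypothetical excerpt; the real file layout may differ.
sample = """
# tool configuration
TOOL_MODE       AUTO
HIDE_DIR        /data03/data/HIDE
EMAIL_NOTIF     YES
FILTER_SWITCH   INV_FILTER
"""
config = parse_tool_config(sample)
problems = check_tool_config(config)
print(config)
print(problems)
```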

fitsreport*.cfg

These files have the structure of fitsreport config files; see the fitsreport documentation for details.

filter_<ins>.RLS

See the example being delivered with the package.

Operational aspects

Workflow description

Option SCAN:

1. Check if data have already been hidden; warning message if yes

2. scan header files for UNCLASSIFIED files:

2.1 call ABbuilder with stripped-off standard configuration to classify files, create dummy ABs
2.2 extract RAWFILEs from created ABs
2.3 check against list of all raw files, difference is list of UNCLASSIFIED files
2.4 run fitsreport on these files (configured under FITS_REPORT_UNCLASS) to display their properties

3. scan header files for files with empty keys (EMPTY)

3.1 use fitsreport to check for empty keys (configured under FITS_REPORT_EMPTY)
3.2 report the ones found

4. Exit

Option FILTER:

1. scan headers to find RAW_MINNUM violations:

1.1 call ABbuilder with stripped-off standard configuration to classify files, create dummy ABs
1.2 check ABs against configured RAW_MINNUM values
1.3 extract RAWFILE names from ABs which violate the threshold, move these files to configured $HIDE_DIR/<date>

2. filter headers, or headers and fits files, with configured properties (FILTER)

2.1 call ABbuilder with OCA configuration filter_<ins>.RLS to classify
2.2 extract the classified files from the ABs
2.3 if option -H has been used, move these files to configured $HIDE_DIR/<date>, and the corresponding headers as well
2.4 create a report for filtered files

or (INV_FILTER)

2.1 call ABbuilder with OCA configuration filter_<ins>.RLS to classify
2.2 extract the classified files from the ABs
2.3 move all other files to configured $HIDE_DIR/<date>, and the corresponding headers as well
2.4 create a report for filtered files

3. write report $DFO_LST_DIR/filt_${DFO_INSTRUMENT}_<date>.txt

How to embed in regular workflow

Call 'filterRaw -d <date> -m SCAN' as soon as headers are available.

Set FILTER_RAW to YES in config.createAB; this will call 'filterRaw -d <date> -m FILTER -H' just before the operational ABs are created.


Last update: April 26, 2021 by rhanusch