Common DFOS tools:
Documentation

dfos = Data Flow Operations System, the common tool set for DFO


New:

v1.2.4:
- option -M introduced; if set, M. names are not replaced with ORIGFILE in the calSelector output.

See also:

tools related to calSelector:
calselManager | createCalibMap | verifyAB | writeBreakpoint

general overview of calSelector here

OCA: rules | syntax

verifyAB calls are managed by dfoMonitor

databases	none
dfos tools	called by dfoMonitor
output	xml file, txt file per AB
upload/download	xml|txt exported to qcweb: <instr>/logs/CALSELECTOR/<date>

verifyAB

Description

The tool verifyAB compares the content of a SCIENCE AB, as created with the OCA rules in the DFOS workflow, to the results of the dynamic association of calSelector (Raw2Master mode, R2M).

The tool has two main modes: DATE and FILE.

In DATE mode, the tool is used in routine dfos operations for science files to check their associations for completeness. In that mode, there is no detailed comparison to the AB results (for performance reasons), but an automatic evaluation for completeness which is turned by getStatusAB into a colour scheme (red or green) for SCIENCE ABs on the AB monitor. The tool auto-detects missing associations in a way similar to the old (de-commissioned) harvestAB tool.
In FILE mode, the tool is used for detailed checks. It is fed with a dp_id, and the calSelector associations are compared to the AB. Effectively this is a comparison of the OCA rules and the CALSELECTOR rules and can also be used for testing or debugging of new OCA rules.

The core command in both modes is 'verifyAB -f <dp_id>': a file name is offered to calSelector which then uses the appropriate OCA rule to find all associations. Then, the list of raw files (RAWFILE and RASSOC sections of the AB) and master calibrations (MCALIB, MASSOC) is extracted from the calSelector result and from the AB, and compared in a DIFF file, for file names and validity.

If SUPPRESS_EXT is configured, then all ABs with the specified extensions are not used for the DIFF comparison. This might be useful if a complex science cascade exists that goes beyond what calSelector can define as the (starting) set of ABs (like e.g. step2 ABs).

There is also the option to call calSelector in Raw2Raw mode. Use the flag -R for verifyAB to test results for this mode. See here for more.

Note that nothing is stored in a database. The result files (xml and txt) are only stored locally for later reference. All calSelector results are dynamic.

The DATE mode is typically called from the dfoMonitor as part of the SCIENCE workflow, but can also be called interactively. If called by the dfoMonitor, then the workflow step for AB creation is executed first (createAB), then the corresponding call for calSelector, then getStatusAB with the color-coded results. At that stage the user is asked for review. If acknowledged, the tool continues in the usual manner.

The FILE mode is called only from the command-line.

For more information regarding calSelector check out here.

FILE mode: testing the rules

The FILE mode of verifyAB (meaning any call 'verifyAB -f <...>') is effectively a comparison of the operational OCA rule (DFOS_OPS) and the CALSELECTOR rule. A SCIENCE raw file is exposed to the CALSELECTOR rule in the database, or to a local OCA rule that you want to test. The calSelector xml file is then compared to the AB as created by createAB.

There are several ways of running the FILE mode. Each of them can be used either with the local OCA rules you want to test (option -o), or without; in the latter case the applicable OCA rule set stored in the database is used.

1. Call

verifyAB -f <dp_id> [-o <pathname to local OCA rule>]

if you want just that dp_id tested. The dp_id is the ARCFILE name with or without extension. The tool returns the DIFF file for that dp_id (see below).

2. Call

verifyAB -f CONFIG [-o <pathname to local OCA rule>]

and enter a dialog in which you select, from the displayed filelist, the dp_id you want to test. The config file is always $DFO_CONFIG_DIR/CALSELECTOR/config.verifyAB; there is no need to specify it. The tool returns the DIFF file for the selected dp_id (see below).

3. Call

verifyAB -f LIST -l <filelist> [-o <pathname to local OCA rule>]

which you can use for performance or regression tests (comparison of a new to an old set of OCA rules). All files from the specified filelist are exposed to calSelector. The tool returns one txt and one xml file per dp_id. All un-commented lines in the filelist are interpreted as dp_ids, first column only. You need to specify the full pathname of the filelist.
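The filelist convention described above (un-commented lines, first column only) can be sketched as follows. This is a minimal illustration, not the tool's own parser: the function name read_dp_ids is hypothetical, and the use of '#' as comment marker is an assumption, since the text only speaks of "un-commented lines".

```python
def read_dp_ids(text, comment_char="#"):
    """Return the dp_ids found in the first column of each un-commented line."""
    dp_ids = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and commented lines ('#' is an assumed comment marker).
        if not stripped or stripped.startswith(comment_char):
            continue
        first_column = stripped.split()[0]
        # A dp_id is the ARCFILE name with or without extension; drop ".fits".
        if first_column.endswith(".fits"):
            first_column = first_column[: -len(".fits")]
        dp_ids.append(first_column)
    return dp_ids


example = """\
# reference list for regression tests
GIRAF.2011-11-26T00:26:40.269   Medusa, old flat
GIRAF.2011-12-04T05:20:18.441.fits
"""
print(read_dp_ids(example))
```

Anything after the first column is ignored, so a filelist can carry free-text notes per line without affecting the test set.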

The difference between CONFIG and LIST is that for CONFIG you would like to have the file config.verifyAB filled with typical examples for all possible cases, modes, raw_types etc. For LIST, you might want to have a longer filelist, maybe with redundant test cases if you want to monitor performance, or you could design a reference filelist in each of your $DFO_CONFIG_DIR/CALSELECTOR/<date> directories which you use for regression tests in case of modifications.

The LIST mode is also implicitly used by the tool running in DATE mode.

DATE mode: verify associations

The DATE (or VERIFY) mode is a generalization of the FILE mode. For a given date, all science dp_ids are collected in a list, and that list is handed over to the tool.

In this mode, the tool does not create DIFF files but automatically evaluates whether the result xml files are complete or not. This flag is written into a text file in $DFO_MON_DIR and interpreted by getStatusAB for the automatic flagging.

The DATE mode is designed for the science part of the daily workflow and is called within the corresponding workflow on the dfoMonitor. You can also call it on the command line:

Call

verifyAB -d <date>

for verifying all science ABs in $DFO_LOG_DIR/<date>, and

verifyAB -d <date> -P

for all science ABs in $DFO_AB_DIR for that date (this is how the tool is called in the daily workflow).

The output txt and xml files are stored on the QC web server, together with the ABs, for further reference, e.g. for the IDPs or for the OPSHUB.

Special flags.

-R --> Raw2Raw. You can run the tool with the special flag -R to force it to go Raw2Raw. This is supported in both modes (DATE or FILE). This flag is useful to check calSelector results in a controlled way (remember that by default the tool tries to go Raw2Master but switches to Raw2Raw in case of incomplete results, or in case of the Raw2Raw mode forced in the OCA rules).

-i --> ignore-certified. Remember that in Raw2Raw, calSelector by default gives preference to the closest-in-time certified raw calibrations. For very recent science data this may mean that those certified calibrations are already 10 days old, while younger ones exist from today which are not yet certified (meaning they do not yet have master calibrations). With the additional flag -i you can force the tool to ignore this 'certified' flag and search for the closest-in-time calibrations, no matter if certified or uncertified. This flag is therefore relevant for "PI-like" searches, looking for calibrations for the latest science files.

-D --> debug mode. If you need the calSelector log for debugging purposes (for instance because you want to create a JIRA ticket), call the tool with option -D, which calls calSelector with the debug log.

-M --> no ORIGFILE replacement. If set, M. names are not replaced with ORIGFILE in the calSelector output. This is useful for master calibrations which have been ingested using qcFlow.
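The effect of the -i flag described above can be illustrated with a small selection sketch. It is not calSelector's implementation: the RawCalib class, the function pick_calibration and the dp_ids are hypothetical, and the fallback to uncertified candidates when no certified one exists is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RawCalib:
    dp_id: str           # hypothetical identifier
    obs_time: datetime   # time of the raw calibration
    certified: bool      # True if master calibrations already exist


def pick_calibration(science_time, candidates, ignore_certified=False):
    """Pick the closest-in-time candidate, preferring certified ones by default."""
    if not candidates:
        return None
    pool = candidates
    if not ignore_certified:
        certified = [c for c in candidates if c.certified]
        # Assumption: fall back to all candidates if nothing is certified.
        pool = certified or candidates
    return min(pool, key=lambda c: abs((c.obs_time - science_time).total_seconds()))


science_time = datetime(2021, 4, 26, 3, 0)
old = RawCalib("CALIB_OLD", datetime(2021, 4, 16, 12, 0), certified=True)
new = RawCalib("CALIB_NEW", datetime(2021, 4, 26, 12, 0), certified=False)

print(pick_calibration(science_time, [old, new]).dp_id)
print(pick_calibration(science_time, [old, new], ignore_certified=True).dp_id)
```

With the default behaviour the ten-day-old certified calibration wins; with the flag ignored, the uncertified calibration from the same day is closer in time and is selected.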

Output

The output of the test is written into $DFO_AB_DIR/CALSELECTOR. All files there have the dp_id as root, with the exception of the file SDIFF.html which offers a convenient browser refresh: once your browser is on SDIFF.html and you test another dp_id, you can see the result by just refreshing the browser.

There are the following <dp_id> files:

<dp_id>.txt and <dp_id>_stat.txt: the calSelector result file and the AB result txt file.
<dp_id>.sdiff.html: the output of the comparison (called sdiff for historical reasons: the UNIX command sdiff was once used to create it)
<dp_id>.xml: the xml file created by the calSelector association
<dp_id>.log.html: the log file of the calSelector association, in html format
the AB and the ALOG as copied from $DFO_LOG_DIR/CALSELECTOR/<date>.

DIFF file (SDIFF.html)

The main output file for investigation is <dp_id>.sdiff.html, also copied as SDIFF.html and linked from the dfoMonitor. It comes in two tables side by side: the AB table (left), and the calSelector table (right). Green lines mean "identical content", they are listed for the left table only. Lines with dp_ids missing in either the right or the left table display in red. Lines with the same dp_id but with different content (flag true vs. false, for instance) display in light-red. 'Main' marks direct associations (the MCALIB ones). The DIFF file is an enhanced table, with intuitive sorting and filtering features.
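The colour logic of the DIFF file can be sketched as a comparison of two lookup tables keyed by dp_id. This is only an illustration of the rules stated above; the function name classify, the dp_ids and the tuple layout (category, completeness flag) are hypothetical.

```python
def classify(ab_entries, cs_entries):
    """Map each dp_id to the colour it would get in the DIFF table."""
    colours = {}
    for dp_id in sorted(set(ab_entries) | set(cs_entries)):
        if dp_id not in ab_entries or dp_id not in cs_entries:
            colours[dp_id] = "red"        # missing in the left or right table
        elif ab_entries[dp_id] == cs_entries[dp_id]:
            colours[dp_id] = "green"      # identical content
        else:
            colours[dp_id] = "light-red"  # same dp_id, different content
    return colours


# Hypothetical entries: dp_id -> (category, completeness flag)
ab = {"A": ("FLAT", "true"), "B": ("BIAS", "true"), "C": ("DARK", "true")}
cs = {"A": ("FLAT", "true"), "B": ("BIAS", "false"), "D": ("WAVE", "true")}

for dp_id, colour in classify(ab, cs).items():
    print(dp_id, colour)
```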

In DATE mode, the txt and html output is stored in $DFO_LOG_DIR/CALSELECTOR/<date> and also copied to qcweb, into the tree logs/CALSELECTOR/<date>. In FILE mode the output is only offered for inspection but not stored.

output file	content	comment
xml	Full association content: all required mcalibs and gencalibs for science AB if 'Raw2Master' mode was successful; if not, then all certified raw files and gencalibs in 'Raw2Raw' mode, including dependent ABs.	In the XML file, click on the '-' symbol to close (and on '+' to open) a section like 'associatedFiles' or 'mainFiles'.
txt	same content	Same content as the xml file, but in a more readable multi-column format:

col#1: SCIENCE or CALIB
col#2: dp_id of the raw file or the gencalib file
col#3: DO.CLASS or PRO.CATG
col#4: completeness: 'true' for 'complete', or 'false' for 'incomplete'
col#5: type ('main' or 'auxiliary'; if required for processing: main)
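The five-column txt layout can be read with a few lines of Python. This is a sketch under assumptions: columns are taken to be whitespace-separated, the names Association, parse_txt and is_complete are hypothetical, and the sample rows (including the calibration dp_id and the category values) are invented for illustration.

```python
from typing import NamedTuple


class Association(NamedTuple):
    kind: str       # col#1: SCIENCE or CALIB
    dp_id: str      # col#2: dp_id of the raw file or the gencalib file
    category: str   # col#3: DO.CLASS or PRO.CATG
    complete: bool  # col#4: 'true' -> True, 'false' -> False
    role: str       # col#5: 'main' or 'auxiliary'


def parse_txt(text):
    """Parse the multi-column txt content into Association records."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        kind, dp_id, category, complete, role = fields[:5]
        rows.append(Association(kind, dp_id, category, complete.lower() == "true", role))
    return rows


def is_complete(rows):
    """True only if there is at least one row and every row is flagged complete."""
    return bool(rows) and all(row.complete for row in rows)


# Invented sample content; real files may differ in detail.
sample = """\
SCIENCE GIRAF.2011-12-04T05:20:18.441 SCI_EXAMPLE true main
CALIB   GIRAF.2011-12-03T12:00:00.000 FLAT_EXAMPLE false main
"""
rows = parse_txt(sample)
print([row.dp_id for row in rows if not row.complete])
print(is_complete(rows))
```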

Configuration files

The following config files are related to verifyAB, all reside under $DFO_CONFIG_DIR/CALSELECTOR:

CSConfiguration.properties: calselector configuration, mandatory
config.verifyAB: optional tool configuration, hosting dp_ids for testing the CALSELECTOR OCA rules.

If config.verifyAB exists, you can use it in the dialog 'verifyAB -f CONFIG'. If not, you can still call 'verifyAB -f <dp_id>' or 'verifyAB -f LIST -l <filelist>'.

The configuration file config.verifyAB has the following content:

1. General parameters
ENABLE_60	NO|YES	If YES, accept science ABs with run_ID 60./060.x/0060.x
SUPPRESS_EXT	tpl|ext|cpr	optional key; ABs with one of the configured extensions are suppressed (separate them by |, no spaces)
2. FILE IDs for testing
tag FILE_NAME	dp_id (arcfile name without .fits)	&&comment&&
e.g.:
FILE_NAME	GIRAF.2011-11-26T00:26:40.269	&&Medusa with FF 3days old (dynamic is better)&&
FILE_NAME	GIRAF.2011-12-04T05:20:18.441	&&Medusa: OK&&
Any line starting with FILE_NAME is displayed.
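As an illustration of the layout above, the following sketch reads the three kinds of keys from the text of a config.verifyAB file. It is not the tool's own reader: the function name parse_config is hypothetical, '#' as comment marker and whitespace as column separator are assumptions.

```python
def parse_config(text):
    """Collect ENABLE_60, SUPPRESS_EXT and FILE_NAME entries from the config text."""
    config = {"ENABLE_60": None, "SUPPRESS_EXT": [], "FILE_NAME": []}
    for line in text.splitlines():
        parts = line.split(None, 2)  # key, value, optional rest (the comment)
        if len(parts) < 2 or parts[0].startswith("#"):
            continue
        key, value = parts[0], parts[1]
        if key == "ENABLE_60":
            config[key] = value
        elif key == "SUPPRESS_EXT":
            # Extensions are separated by '|' without spaces.
            config[key] = value.split("|")
        elif key == "FILE_NAME":
            # Comments are wrapped in '&&' markers.
            comment = parts[2].strip().strip("&").strip() if len(parts) > 2 else ""
            config[key].append((value, comment))
    return config


example = """\
ENABLE_60\tNO
SUPPRESS_EXT\ttpl|ext
FILE_NAME\tGIRAF.2011-11-26T00:26:40.269\t&&Medusa with FF 3days old (dynamic is better)&&
FILE_NAME\tGIRAF.2011-12-04T05:20:18.441\t&&Medusa: OK&&
"""
cfg = parse_config(example)
print(cfg["SUPPRESS_EXT"])
for dp_id, comment in cfg["FILE_NAME"]:
    print(dp_id, "|", comment)
```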

A few hints on how to fill config.verifyAB properly:

have at least one dp_id for each of the raw_types defined in config.createCalibMap
add more dp_ids if they relate to typical situations (e.g. one with an optional match, one without)
if you have different versions valid for different periods, cover all of them.

The file CSConfiguration.properties has the following keys:

$DFO_CONFIG_DIR/CALSELECTOR/CSConfiguration.properties

#This file has some database configuration:

spring.datasource.url=jdbc:sybase:Tds:acdb.hq.eso.org:2025

spring.datasource.username=qc

spring.datasource.password=...

spring.datasource.driver-class-name=com.sybase.jdbc4.jdbc.SybDriver

How to use

Type

verifyAB -h | -v

for help and version of verifyAB;

verifyAB -H | -V

for help and version of calSelector.

In DATE mode (usually used by the dfoMonitor wrapper only), call

verifyAB -d 2014-12-30

to verify the SCIENCE ABs for the specified date (if the ABs are taken from $DFO_LOG_DIR, i.e. are already distributed), or

verifyAB -d 2014-12-30 -P

to do the same but the ABs are still in $DFO_AB_DIR.

In FILE mode, call

verifyAB -f <dp_id>

for testing the associations for dp_id, using the database rules (for the other related options see the section on FILE mode above).

If you want to force the tool to go Raw2Raw, call it like

verifyAB -f <dp_id> -R

or

verifyAB -d 2015-06-30 -R

You can also force calSelector to ignore the certification flag (makes sense in connection with -R only):

verifyAB -f <dp_id> -R -i

Call

verifyAB -f <dp_id> -o <pathname to local OCA rules>

for doing the same with a local OCA rule, for testing or debugging; call

verifyAB -f <dp_id> -D

for calling calSelector in DEBUG mode (with the output useful to debug errors).
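When verifyAB is driven from a script, the option combinations listed above can be assembled and checked before the call. The helper below is a hypothetical sketch: it only builds the argument list and does not run the tool. The flags are those documented here; restricting -o to FILE mode is a conservative assumption, based on the fact that all examples show it together with -f.

```python
def build_verifyab_command(dp_id=None, date=None, from_ab_dir=False, raw2raw=False,
                           ignore_certified=False, oca_rules=None, debug=False,
                           keep_master_names=False):
    """Return the verifyAB argument list for FILE mode (dp_id) or DATE mode (date)."""
    if (dp_id is None) == (date is None):
        raise ValueError("specify exactly one of dp_id (FILE mode) or date (DATE mode)")
    if from_ab_dir and date is None:
        raise ValueError("-P belongs to DATE mode")
    if ignore_certified and not raw2raw:
        raise ValueError("-i only makes sense together with -R")
    if oca_rules is not None and dp_id is None:
        # Assumption: local OCA rules are only documented for FILE mode calls.
        raise ValueError("-o is only used together with -f here")

    cmd = ["verifyAB"]
    cmd += ["-f", dp_id] if dp_id is not None else ["-d", date]
    if from_ab_dir:
        cmd.append("-P")   # take the ABs from $DFO_AB_DIR
    if raw2raw:
        cmd.append("-R")   # force Raw2Raw
    if ignore_certified:
        cmd.append("-i")   # ignore the 'certified' flag
    if oca_rules is not None:
        cmd += ["-o", oca_rules]
    if debug:
        cmd.append("-D")   # call calSelector with the debug log
    if keep_master_names:
        cmd.append("-M")   # do not replace M. names with ORIGFILE
    return cmd


print(" ".join(build_verifyab_command(date="2014-12-30", from_ab_dir=True)))
print(" ".join(build_verifyab_command(dp_id="GIRAF.2011-12-04T05:20:18.441",
                                      raw2raw=True, ignore_certified=True)))
```

The returned list can be handed to subprocess.run on a machine where verifyAB is installed; keeping the construction separate from the execution makes the option checks easy to test.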

The DFO flag 'sci_Verified' is set by the tool dfoMonitor (not by verifyAB itself) as part of the workflow wrapper for science ABs.

Installation

The tool is installed with dfosInstall.

Operational hints.

Gencalib not archived. You can find all instances of a static (general) master calibration in the archive by calling 'productExplorer -d <very old date> -D <current date> -p <pro_catg>'. The result dataset may include the same file twice or several times, with different names. This would be caused by the transition from a naming scheme defined by the pipeline developer to the general QC naming scheme. This is not an issue.

No instance of a requested gencalib exists in the archive. Then the calSelector request will be incomplete and go Raw2Raw. This should not happen since by now such cases would already be known from the R2R version of calSelector. If you find such a case, the usual procedure is to identify the file in your $DFO_CAL_DIR/gen and ingest it. You can then confirm the successful solution by calling verifyAB again.

Files with no tpl.start. For the earliest phase of VLT operations, there exist files without tpl.start information. OCA rules using the TPL.START key will fail for those files. The best strategy is to design the validity of the calSelector rules such that this very early phase is excluded.

Last update: April 26, 2021 by rhanusch
