Common DFOS tools: Documentation

dfos = Data Flow Operations System, the common tool set for DFO

ingestProducts version history:

v3.0.4:
- new optional CLEANUP_PLUGIN for cleanupProducts

v3.1:
- enabled for DEEP mode

v3.2:
- enabled for IDP updates
- trying up to 3 times to ingest the same file if there are NGAS errors

call_IT v1.4.2:
- updated syntax and config file for ingestiontool

Make sure that during IDP ingestion, no other major processes run on muc08/muc10/muc11, at least for XSHOOTER and MUSE IDPs. Otherwise the ingestion for some of them might fail because of NGAS resource bottlenecks.
[ used databases ] phase3 for IDP ingestion; qc_products and qc1 for master calibrations
[ used dfos tools ] for IDPs: idpConvert, uves2p3, call_IT, IngestionTool, qc1Ingest; for master calibrations: dpIngest
[ output used by ] log file list_ingest in $DFO_LST_DIR
[ output used by ] upload/download: ingestion into NGAS
topics: description | IDP ingestion: process | conversion | ingestionTool | errors | configuration | MCALIB ingestion under PHOENIX: process | deletion | ingestion | configuration | how to call | statistics & logs | operational aspects

Note:
- In this documentation, IDPs means Internal Data Products and stands for science data products as created by the phoenix process.
- MCALIBs is short for master calibrations, as created by the phoenix process.
- If nothing is mentioned in particular, the documentation applies to all kinds of products.

In some parts this documentation splits into sections applicable for the ingestion of IDPs, and others for MCALIBs. The IDP part then is shaded light blue (like this cell),

while the MCALIB part is shaded light-yellow (like here). You can then ignore the respective other part.


PHOENIX

ingestProducts for phoenix

[ top ] Description

The tool ingestProducts is enabled for standard DFOS and PHOENIX environments. The environment is recognized from the key THIS_IS_PHOENIX in .dfosrc which is set to YES for PHOENIX environments, and to NO for DFOS environments (NO by default).
Furthermore, if THIS_IS_PHOENIX=YES, the tool can recognize the MCALIB mode (ingestion of phoenix-generated master calibrations) via the key MCAL_CONFIG in config.phoenix.
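The environment and mode detection described above can be sketched as follows (a minimal sketch, not the actual tool code; the inline variable assignments stand in for values normally sourced from .dfosrc and config.phoenix):

```shell
#!/bin/sh
# Sketch: how ingestProducts might derive its operating mode.
THIS_IS_PHOENIX=YES         # normally sourced from $HOME/.dfosrc
MCAL_CONFIG=config.mcal     # normally read from config.phoenix; empty if unset

if [ "$THIS_IS_PHOENIX" = "YES" ]; then
    if [ -n "$MCAL_CONFIG" ]; then
        MODE=PHOENIX_MCALIB     # master-calibration ingestion via dpIngest
    else
        MODE=PHOENIX_IDP        # IDP ingestion via call_IT/IngestionTool
    fi
else
    MODE=DFOS_OPS               # standard DFOS operations
fi
echo "$MODE"
```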

Supported modes for ingestProducts
environment / mode    enabled?  using ...                         storage
DFOS_OPS
  CALIB               YES       dpIngest                          NGAS
  SCIENCE             NO
PHOENIX
  CALIB (MCALIBs)     YES**     dpIngest                          NGAS
  SCIENCE (IDPs)      YES*      IngestionTool wrapped in call_IT  phase3
* enabled via THIS_IS_PHOENIX=YES in .dfosrc
** enabled via THIS_IS_PHOENIX=YES in .dfosrc and MCAL_CONFIG set in config.phoenix

In the following the details for the PHOENIX environment are described. The behaviour in the DFOS_OPS environment is documented here.


[ top ] IDP ingestion: general process

IDP ingestion means to ingest the science data products and their ancillary files into the phase3 database (for the header information) and into NGAS (for the files).

Before starting ingesting an IDP stream, the phase3 environment needs to be defined in config.ingestProducts:

Special configuration for IDP ingestion
phase 3 parameter            UVES    XSHOOTER  GIRAFFE  MUSE  MUSE-DEEP  etc.
Phase 3 programme            UVES    XSHOOTER  GIRAFFE  MUSE  MUSE-DEEP  ...
Phase 3 data release/stream  1 or 2  1 or 2    1        1     1

These configuration keys are defined together with ASG.

The tool ingestProducts is effectively a wrapper that first calls a preparation tool (converter) and then the phase3 ingestion tool. The converter is either a DFS-provided tool (for UVES: it adds header information for phase3 compliance and modifies the FITS file structure) or a shell script customized to the instrument (adding header information for phase3 compliance). They are described in the following.

Current installation of phase 3 tools on muc08
tool            function   instrument  tool name
UVES            converter  UVES        /opt/dfs/bin/uves2p3
XSHOOTER        converter  XSHOOTER    $DFO_BIN_DIR/idpConvert_xs (for header tasks only)
GIRAFFE         converter  GIRAFFE     $DFO_BIN_DIR/idpConvert_gi (for header tasks only)
MUSE            converter  MUSE        $DFO_BIN_DIR/idpConvert_mu
MUSE-DEEP       converter  MUSE        $DFO_BIN_DIR/idpConvert_mu_deep
HAWKI           converter  HAWKI       $DFO_BIN_DIR/idpConvert_hi
ingestion tool             any         /opt/dfs/share/IngestionTool.jar

[ top ] Conversion tool: idpConvert and uves2p3

UVES. For the UVES IDPs, there is the wrapper idpConvert around the special DFI-provided converter tool uves2p3, which transforms the pipeline-delivered output files into the SDP (science data products) standard format: a binary table for spectroscopic data. The converter is instrument-specific and is provided by DFI. In the current installation it is called

uves2p3

For convenience it is wrapped in the helper tool idpConvert.

Other IDPs. For the other IDPs, all structural conversion is done by the pipelines, and only header keys need to be added. This is done by customized header-conversion tools like idpConvert_xs or idpConvert_gi, which are created and maintained by QC. Find their description below.

Installation (UVES only!)

uves2p3 comes as part of the phase3 software installation. idpConvert is installed on sciproc@muc08:$HOME/UVES/bin.

Config file (UVES only!)

The uves2p3 tool has a config file in $DFO_CONFIG_DIR, uves2p3.cfg (note its special syntax, due to its non-DFOS nature):

Section 1: matching of pro.catg and role
The 'role' is the label of the corresponding column in the binary table. The first column is always the wavelength.
#pro.catg (note the colon at the end): label comment
[prodCatgToRole]   this has to be the first line, required by uves2p3
FLUX_CAL_ERRORBAR_BLUE: ERR  
...    
FLUXCAL_SCIENCE_BLUE: FLUX  
...    
etc.    
(all possible values of pro.catg need to be listed here, with the 'label' having reserved values determined by the IDP standard)

How to call (UVES only!)

uves2p3 is called by idpConvert, and idpConvert is called by ingestProducts. You can call idpConvert from the command-line:

Type idpConvert -h for on-line help, and idpConvert -v for the version number.

Type idpConvert -H and -V for on-line help, or version number, of the uves2p3 tool.

Call

  • idpConvert -d <date>
to convert all SCIENCE products for the specified date into IDPs
  • idpConvert -m <month>
to convert all SCIENCE products for a specified month into IDPs
  • idpConvert ... -D
run uves2p3 in DEBUG mode

IDP conversion (other instruments)

The corresponding wrapper scripts follow the naming scheme idpConvert_xs, idpConvert_gi, etc. Their name is configured in config.ingestProducts under CONVERTER.

IDP output (all instruments)

The converted products are found in the subdirectory $DFO_SCI_DIR/<date>/conv. The converter log file (from uves2p3 or from the conversion scripts) is found in $DFO_SCI_DIR/<date>/CONVERTED. This log file is also exported to the qc@qcweb site and can be found in http://qcweb/~qc/<RELEASE>/logs/<date>/CONVERTED.


[ top ] Ingestion: call_IT and IngestionTool

The ingestion tool (call_IT as a wrapper, IngestionTool as the main component) provides the same kind of functionality as the dpIngest tool for DFOS master calibrations. It is a java package provided by DFI. It takes all files from a specified directory, ingests them into NGAS, and extracts the header keys into the data repository and from there into the phase3v2 database. Get tool information and process documentation by typing 'call_IT -H', or access the process documentation directly in your local file /opt/dfs/doc/ingestiontool_for_idp.txt.

For convenience the ingestion tool is wrapped in the helper tool call_IT. This helper tool and the ingestion tool itself are used in the same way for all IDP projects.

Installation

The IngestionTool comes as part of the phase3 software installation. The wrapper tool call_IT is installed on the local $DFO_BIN_DIR.

Config file

The IngestionTool has a config file in $DFO_CONFIG_DIR, ingestiontool.properties. It is filled and maintained by the developer.

How to call

IngestionTool is called by call_IT, and call_IT is called by ingestProducts. There is no operational need to call it from the command-line, but you could:

Type call_IT -h for on-line help, and call_IT -v for the version number.

Type call_IT -H and -V for on-line help, or version number, of the IngestionTool.

Call

  • call_IT -d <date>
to ingest all SCIENCE IDPs for the specified date
  • call_IT -m <month>
to ingest all SCIENCE IDPs for a specified month
  • call_IT ... -D
run IngestionTool in DEBUG mode
  • call_IT ... -F
call validation only
  • call_IT ... -U
enable updates: this will ingest a new version of an already existing file; see 'Operational aspects'

The tool expects the converted IDPs in the subdirectory $DFO_SCI_DIR/<date>/conv. The IngestionTool log is found in $DFO_SCI_DIR/<date>/INGESTED. This log file is also exported to the qc@qcweb site and can be found in http://qcweb/~qc/<RELEASE>/logs/<date>/INGESTED.

call_IT adds some information to the tool log file: a bit of statistics (number of new files ingested, already existing files) and of performance (time needed for ingestion). The tool performance is about 1 sec per (UVES) IDP.
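The retry behaviour introduced in v3.2 (up to 3 ingestion attempts on NGAS errors) might look like this in outline; ingest_one is a stand-in for the real IngestionTool call, here mocked to fail twice before succeeding:

```shell
#!/bin/sh
# Sketch of the v3.2 retry logic: re-attempt an ingestion up to 3 times.
ingest_one() {
    # mock: fail twice with an (assumed) NGAS error, succeed on the 3rd attempt
    ATTEMPTS=$((ATTEMPTS + 1))
    [ "$ATTEMPTS" -ge 3 ]
}

ATTEMPTS=0
TRY=1
while [ "$TRY" -le 3 ]; do
    if ingest_one; then
        STATUS=OK
        break
    fi
    STATUS=FAILED
    TRY=$((TRY + 1))
done
echo "$STATUS after $ATTEMPTS attempt(s)"
```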

[ top ] Errors and warnings

The log file of the ingestion tool records the success of the three main steps:

  • file validation (header content etc.)
  • file ingestion into NGAS
  • keyword extraction (from the repository into the phase3 database).

The IngestionTool log also has entries for each single IDP ingestion.

Note that the tool log lists every single entry in the database as a "file", which is misleading. If an ingested FITS file is registered in 5 other FITS files as ANCILLARY file, each such record is counted as a "file" by the tool. This inflates the statistics in the log file. Don't get confused!

With ingestion tool version 4.0.3, some WARNINGs appear that look serious but can be ignored. You can check for them by typing call_IT -V:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.codehaus.groovy.reflection.CachedClass (jar:file:/opt/dfs/bin/IngestionTool!/BOOT-INF/lib/groovy-2.5.13.jar!/) to method java.lang.Object.finalize()
WARNING: Please consider reporting this to the maintainers of org.codehaus.groovy.reflection.CachedClass
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release

[ top ] Configuration of ingestProducts for IDP ingestion: special config keys for phoenix

The tool uses the standard DFOS config.ingestProducts. For the PHOENIX environment, there exist the following special keys:

Section 1: general
# special config keys not needed for DFOS, only for phoenix:
PATH_TO_IT $HOME/bin/call_IT  
CONVERTER $HOME/bin/idpConvert_xs  
ENABLE_UPDATES NO YES|NO: optional key to enable updates for IDPs (default is NO).
JAVA_HOME /opt/java11 points to the current java installation, might change in the future!
PROGRAM_NAME MUSE-DEEP phase3 program name
CLEANUP_PLUGIN pgi_phoenix_MUSE_cleanup optional plugin for special tasks of cleanupProducts
MAX_SIZE 100 optional key to set maximum size for products (to avoid NGAS issues)

In PHOENIX mode the tool reads a few configuration keys from config.phoenix:

config.phoenix
key         value                                                     purpose
RELEASE     <RELEASE>, e.g. UVESR_2 for <PROC_INSTRUMENT>=UVES        read for updating the statistics in daily_idpstat
INSTR_MODE  <INSTR_MODE>, e.g. UVES_ECH for <PROC_INSTRUMENT>=UVES    read for updating the statistics in daily_idpstat

[ top ] MCALIB ingestion under PHOENIX: general process

phoenix supports not only the creation of IDPs but also of master calibrations. This process has many similarities with the IDP production: it is project-driven (neither bound to nor triggered by daily operations) and comes as a batch (many processing jobs). The main difference to the IDP production is that the (selected) pipeline products are ingested without modifications, do not constitute a phase 3 project (i.e. do not require coordination with ASG), and do not constitute a stream. Otherwise many aspects of their production and ingestion are very similar to the production and ingestion of operational master calibrations. In particular, the underlying ingestion tool (dpIngest) and ingestion storage (NGAS) are exactly the same. Once ingested, phoenix-created master calibrations are identical to the ones created by the daily workflow. Their main motivation comes from reprocessing after pipeline changes or improvements.

Master calibrations are ingested as they are created by the pipelines. Hence no conversion is needed. Nevertheless the ingestion process has two steps, in formal analogy to the IDP ingestion:

  • the deletion of previous instances
  • the ingestion.

Before calling the ingestion, the phoenix tool already does a check for the proper file names to be used upon ingestion (see there).


[ top ] Deletion of previous instances

Depending on the configuration, the tool ingestProducts will decide before the ingestion if any pre-existing master calibrations should be deleted.

By default, only those master calibrations get deleted and overwritten which have a new instance (by name). This might however result in an unwanted mix of old and new master calibrations. In the operational environment, many if not all calibrations are processed and ingested, no matter whether they are actually used for science reduction:

  • the calibration stream contains a mix of HC calibrations and the ones needed for science;
  • some data types are needed for maintenance only;
  • in the early days calibrations were processed for SM data only but not for VM.

Therefore it might be reasonable not only to overwrite older instances but also to delete the ones which get no new version.

The tool supports this by configuration. The user may decide to always delete (hide) certain master calibrations, no matter whether they get replaced or not.

Several cases could occur:

  • The reprocessing covers a particular ins.mode but not others. The tool then needs to know what to do with those modes which won't be automatically replaced by new versions of master calibrations: hide them anyway, or leave them. The correct strategy depends on the circumstances of the reprocessing: are the other (not replaced) master calibrations compatible with the pipeline? Is their quality still acceptable?
  • The reprocessing is motivated by the science reduction strategy, the goal is to deliver correct master calibrations with calSelector to reduced science data. Therefore it might be reasonable to focus the reprocessing on those calibrations that are needed for science reduction, and ignore/delete the others. For that purpose, the tool can be configured by PRO.CATG to be deleted, without a replacement.
  • Static calibrations should never be deleted since they cannot be reprocessed. They can be protected if their PRO.CATG is not configured for deletion.

In general, only those pre-existing master calibrations get deleted which are configured as PHX_DELETE.

The deletion of pre-existing master calibrations is a critical step and can be fine-tuned by calling ingestProducts in DEBUG mode, which is interactive and offers file lists for review before actual hiding. The listings are done for the following cases:

  • hidden by configuration (pro.catg and ins.mode) but not replaced
  • NEW master calibrations without previous instance
  • unchanged files (not hidden and not replaced)
  • replaced files.

[ top ] Ingestion of master calibrations

The tool first creates these lists and executes the file DELETEs, calling dpDelete in -force mode. Then, it calls dpIngest in the usual way (as for daily operations). All actions (deletion and ingestion) are listed in the standard list_ingest_CALIB file. If configured, the qc1_update part is executed, using the QC1 database tables names if configured in the section QC1_TABLE of the configuration file.

[ top ] Configuration of ingestProducts for MCALIB ingestion: special config keys for phoenix

The tool uses the standard DFOS config.ingestProducts. For the PHOENIX environment, there exist the following special keys:

Section 3: PHOENIX deletion configuration for MCALIBs
# special config keys not needed for DFOS, only for PHOENIX (multiple lines supported; comma-separated list for INS_MODE supported):
PHX_DELETE PRO_CATG INS_MODE # comment
PHX_DELETE FF_EXTERRORS MED,IFU # any master calibration with that PRO_CATG and INS_MODE=MED or IFU gets deleted; one for another INS_MODE does not get deleted
PHX_DELETE FF_EXTERRORS ANY # any master calibration with that PRO_CATG gets deleted, no matter what INS_MODE it has (also includes NULL values!)
PHX_DELETE FF_EXTERRORS MED,IFU,ARG # any master calibration with that PRO_CATG and INS_MODE gets deleted (excludes NULL values!)
PHX_DELETE FF_EXTERRORS NULL # only master calibration with that PRO_CATG and INS_MODE=NULL gets deleted
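The PHX_DELETE matching semantics above can be sketched as a small shell helper (hypothetical, not part of the tool): ANY matches every INS_MODE including NULL, an explicit list matches only the listed modes (NULL only if listed).

```shell
#!/bin/sh
# Hypothetical helper: does a file's INS_MODE match a PHX_DELETE rule?
# $1 = configured mode list ("MED,IFU", "ANY", or "NULL")
# $2 = file's INS_MODE (pass "NULL" when the header value is missing)
phx_delete_match() {
    case ",$1," in
        *,ANY,*)  return 0 ;;    # ANY matches every mode, incl. NULL
        *",$2,"*) return 0 ;;    # explicit list match (NULL only if listed)
    esac
    return 1
}

phx_delete_match "MED,IFU" "MED"  && R1=del || R1=keep
phx_delete_match "MED,IFU" "NULL" && R2=del || R2=keep
phx_delete_match "ANY"     "NULL" && R3=del || R3=keep
phx_delete_match "NULL"    "NULL" && R4=del || R4=keep
echo "$R1 $R2 $R3 $R4"
```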
       
Section 4: PHOENIX definition of QC1 tables with content from reprocessing (if any): list of QC1 tables affected by this PHOENIX project, to be updated with the origfile name; in case of doubt call 'qc1Ingest -instrume $QC1_INSTRUMENT'
QC1_TABLE qc1_giraffe_wave_reproc #name of table (multiple lines supported)

[ top ] How to call

To call the tool in PHOENIX mode for IDPs, make sure to call it in the PHOENIX environment: $THIS_IS_PHOENIX must be YES. This is controlled in $HOME/.dfosrc.

The ingestProducts tool is called in the usual way, specifying -d <date> and -m <mode>. In addition, you can use the flags

-f: call fitsverify only and exit;

-U: ingest updated IDPs with call_IT.


To call the tool in PHOENIX mode for MCALIBs, you must

  • make sure to call it in the PHOENIX environment: $THIS_IS_PHOENIX must be YES;
  • enable the key MCAL_CONFIG in config.phoenix, to point to the additional config file used to define the specific PHOENIX MCALIB project and distinguish it from an IDP project (see phoenix);
  • define a resource file $HOME/.dfosrc_X to contain the environment for that project; this is important if you have an IDP project under the same account and you need to distinguish e.g. $DFO_MON_DIR for the two PHOENIX projects; otherwise you can just copy it from the existing $HOME/.dfosrc.

You can call the DEBUG mode of ingestProducts in the PHOENIX MCALIB environment:

ingestProducts -m CALIB -d <date> -D

The tool will then ask you for confirmation about the important steps of instance deletion (used only for MCALIB environment).

Any other call mode is documented in the main page.


[ top ] IDP statistics and logging

The tool writes (both for IDP and MCALIB mode) into the statistics file $DFO_MON_DIR/PHOENIX_DAILY_<RELEASE>, updating the columns for number and size of ingested IDPs. It also calls qc1Ingest to insert those entries into the DFO database tables daily_idpstat and monthly_idpstat (see also the WISQ workflow statistics). For MCALIBs, the corresponding parameters need to be interpreted as applicable for MCALIB products. Since the focus of the QC1 tables is to monitor the creation and ingestion process (in terms of performance, disk space etc.), this mixing of IDPs and MCALIBs seems justified.

The last execution of the tool is written into the log file $DFO_LST_DIR/list_ingest_SCIENCE_$DATE.txt. All executions of the tool are logged into $DFO_SCI_DIR/<date>/INGESTED which is also exported to qcweb as http://qcweb/~qc/<RELEASE>/logs/<date>/INGESTED.

The monitor tool phoenixMonitor displays whether or not a certain night with IDPs has already been converted and ingested, by checking for the files $DFO_SCI_DIR/<date>/CONVERTED and INGESTED. For MCALIBs, the tool checks if the files have been properly renamed and ingested.
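The phoenixMonitor check for IDP nights can be illustrated with a minimal sketch; a temporary directory stands in for $DFO_SCI_DIR, and the marker files are created by hand to simulate a night that has been converted but not yet ingested:

```shell
#!/bin/sh
# Sketch: check a night's status by testing for the marker files,
# the way phoenixMonitor does.
DFO_SCI_DIR=$(mktemp -d)
DATE=2021-04-01                          # hypothetical night
mkdir -p "$DFO_SCI_DIR/$DATE"
touch "$DFO_SCI_DIR/$DATE/CONVERTED"     # simulate: converted, not yet ingested

STATUS=""
for marker in CONVERTED INGESTED; do
    if [ -e "$DFO_SCI_DIR/$DATE/$marker" ]; then
        STATUS="$STATUS$marker "
    fi
done
echo "$DATE: ${STATUS:-nothing done}"
```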

[ top ] Operational aspects

For IDPs:

  • Make sure that during ingestion, no other major processes run on muc08/muc10/muc11: at least for XSHOOTER and MUSE IDPs, the ingestion might otherwise fail because of NGAS resource bottlenecks.
  • The ingestion tool has the following main steps:
    • "release validation" - meaning a consistency check that in the directory to ingest ($DFO_SCI_DIR/$DATE/conv) *all files* exist that are listed in the ASSON<n> keys of the IDPs, and that *no files* exist in that directory that are *not* listed there. The architectur of the ingestion tool is such that it always works on a full directory and not on individual files;
    • "preparing for archive ingestion" - the ARCFILE key of the IDPs and the ancillary fits files (if any) are added, the CHECKSUM updated;
    • "file archival" - all files (fits and non-fits, IDP and ancillary) are ingested into the archive;
    • "keyword extraction" - header keys are extracted into the keyword repository.
  • Upon ingestion, the tool creates the files .batch.id and .bookkeeping.db. These files are auto-managed by the ingestion tool (don't touch them). They contain the ingestion batch ID and the entire list of files (to be) ingested. In particular, these two files are used if ingestion is resumed.
  • Ingestion errors:
    • Always check the ingestion log. In many cases the reason for the error is obvious from the log file, so just fix it and retry.
    • The ingestion tool knows from its bookkeeping files which files are already ingested and which are not, so resuming ingestion is usually straightforward.
    • Find below some common situations and how to remedy them.
  • What to do in case of incomplete ingestion (files not ingested that should be ingested):
    • calling 'ingestProducts' again does not help since the ingestion tool assumes that all fits files in the ingestion directory have to be part of the same ingestion batch (for reasons related to the original phase3 EDP version);
    • therefore, already ingested files have to be identified (from the INGESTED log file), moved to some other place;
    • then call 'ingestProducts' for that date, with the new file(s) being ingested successfully;
    • then move the already ingested files back, to have a complete product directory.
  • What to do if file(s) have to be added (and ingested) to an already ingested directory:
    • rename the already ingested directory (like e.g. "<name>_old");
    • then call 'phoenix -r ... -M' to distribute the new files to the new directory with the proper name under $DFO_SCI_DIR;
    • ingest them from there;
    • if all is fine, then merge this folder and the old one for complete bookkeeping, meaning move all files from the old folder to the new one; also move the .batch.id file with a different name (like .batch.id.old), to keep a record of the ingestion of the files in the old directory.
  • What to do in case of incomplete renaming:
    • if config.renameProducts is incomplete, it might occur that not all fits files are renamed; files still named like 'r.<INSTR>' would then be ingested
    • fix config.renameProducts; then call 'renameProducts', execute rn_files, and call 'idpConvert'
    • then proceed as before ("incomplete ingestion").
  • IDP ingestion is always a 2-stage process: first there is the call of the instrument-specific conversion tool (configured as CONVERTER), then there is the call of the IngestionTool (configured as PATH_TO_IT). If you have to handle an ingestion error, you very likely need to repeat the second step but not the first one. You can handle this by switching off and on the CONVERTER key or the PATH_TO_IT key, but don't forget to configure back the normal situation eventually.
  • Call 'call_IT -h' to get some extended help about the ingestion tool.
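The "incomplete ingestion" recovery above can be sketched as follows; the directory layout mirrors $DFO_SCI_DIR/<date>/conv, but the file names are hypothetical and the ingestion step itself is mocked (in reality it is the 'ingestProducts' call):

```shell
#!/bin/sh
# Sketch of the incomplete-ingestion recovery, in a scratch directory.
WORK=$(mktemp -d)
CONV=$WORK/2021-04-01/conv                 # hypothetical date
mkdir -p "$CONV" "$WORK/aside"
touch "$CONV/a.fits" "$CONV/b.fits" "$CONV/c.fits"

# 1. move already ingested files (here: a.fits, b.fits, as identified
#    from the INGESTED log) out of the ingestion directory
mv "$CONV/a.fits" "$CONV/b.fits" "$WORK/aside/"

# 2. re-run the ingestion on the remaining files (mocked here)
INGESTED_NOW=$(ls "$CONV")

# 3. move the already ingested files back for a complete product directory
mv "$WORK/aside/"* "$CONV/"

echo "ingested now: $INGESTED_NOW"
```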

More operational hints here.


For MCALIBs only:

  • Once ingested, the MCALIB files should eventually be replaced by their headers, using the standard DFOS tool cleanupProducts. Check the jobs file JOBS_CLEANUP.


Last update: April 26, 2021 by rhanusch

[ top ]