Common DFOS tools:
Documentation

dfos = Data Flow Operations System, the common tool set for DFO
new:
  v3.1:
- support for ANC files terminated
  v3.2:
- phase3 update supported
  v3.3:
- offer CLEANUP_PLUGIN for complex PHOENIX tasks; better checks for ingestion
see also:
- ingestProducts
- productExplorer (to see product content of NGAS)
[ used databases ] qc_metadata, ngas; phase3v2 for IDPs
[ used dfos tools ] ingestProducts
[ output used by ] hdr files stripped off from fits files
[ upload/download ] none

cleanupProducts

Description

DFOS. In DFOS mode, this tool replaces, per date, all archived FITS files in $DFO_CAL_DIR/$DATE and $DFO_SCI_DIR/$DATE with their header file (.hdr).

Once the CALIB product data are successfully ingested, and they are not required in any currently executing AB, there is no reason to store their pixels locally. However, complete deletion may not be wise as long as they are scanned by createAB (calibration memory). When it comes to disk space optimization, an extraction of the header information into a header file is therefore the optimal strategy. This is also done at various other places in the dfo system.

[Historically for DFOS: Although not used for processing of any AB, the disk space considerations apply to SCIENCE products as well, and cleanupProducts cleans them up in a way fully symmetric to CALIB products.]

The tool checks, before execution, that the corresponding date has already been finished (since otherwise the workflow may be disturbed), and that all files are successfully ingested into NGAS and the associated databases. If it denies execution because it does not find a finished flag, you can override this with the option -O.

PHOENIX (IDP mode). In PHOENIX mode, disk space management becomes even more important. The tool replaces, per date, all phase3-archived FITS files (IDP or associated file) in $DFO_SCI_DIR/$DATE/conv with their header file (.hdr). The graphical files are replaced by symlinks to their counterpart in $DFO_PLT_DIR/$DATE. Any text file in $DFO_SCI_DIR/$DATE/conv remains unchanged (since their size is no issue). Finally, any FITS file in $DFO_SCI_DIR/$DATE is deleted without replacement, since for PHOENIX that directory is not the final storage and its products are not the final, archived ones (both are found under $DFO_SCI_DIR/$DATE/conv). The $DFO_CAL_DIR/$DATE directory is not touched at all since it is managed by the phoenix tool and does not contain information of permanent value. For the MASTER/SLAVE model of PHOENIX, an account-specific plugin CLEANUP_PLUGIN (configured in config.ingestProducts) can be called. This is currently being used for MUSE to transfer the headers from the SLAVE account to the MASTER.
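The symlink step for graphical files can be illustrated with a minimal sketch. It is not the tool's actual code: the function name link_graphics is hypothetical, and it assumes that graphical files are *.png and that the counterpart in the plot directory carries the same file name.

```shell
# Sketch only, not the cleanupProducts implementation.
# Assumptions: graphics are *.png; the counterpart has the same file name.
link_graphics() {
    conv_dir=$1    # e.g. $DFO_SCI_DIR/$DATE/conv
    plt_dir=$2     # e.g. $DFO_PLT_DIR/$DATE (absolute path)
    for f in "$conv_dir"/*.png; do
        [ -f "$f" ] || continue      # glob did not match, or not a file
        [ -L "$f" ] && continue      # already replaced by a symlink
        target=$plt_dir/$(basename "$f")
        # Replace the file only if the counterpart really exists.
        if [ -f "$target" ]; then
            ln -sf "$target" "$f"
        fi
    done
}
```

A call would look like: link_graphics "$DFO_SCI_DIR/$DATE/conv" "$DFO_PLT_DIR/$DATE". Files without a counterpart, and all text files, are left untouched.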

The generation of the HDR files in the PHOENIX case serves no particular purpose other than having an easy reference to the product file names.

PHOENIX, MCALIB mode. For a PHOENIX MCALIB project, the tool behaves exactly as in the DFOS case. The applicable rc file .dfosrc_X must contain the key 'export THIS_IS_MCAL=YES' in addition to the key 'export THIS_IS_PHOENIX=YES'. The tool recognizes the exported THIS_IS_MCAL key and then runs in normal DFOS mode, even though it is running under a PHOENIX account.

For a PHOENIX MCALIB project, it is highly recommended to also set the key in the .dfosrc file, there of course as 'export THIS_IS_MCAL=NO'.
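The mode selection described above can be summarized in a short sketch. The function name cleanup_mode is hypothetical, and treating an unset key as 'NO' is an assumption; only the two exported keys and the resulting modes come from the description.

```shell
# Sketch only: derive the operating mode from the exported rc-file keys.
# Assumption: an unset key is treated like 'NO'.
cleanup_mode() {
    if [ "${THIS_IS_PHOENIX:-NO}" = "YES" ] && [ "${THIS_IS_MCAL:-NO}" != "YES" ]; then
        echo PHOENIX    # IDP mode
    else
        echo DFOS       # plain DFOS account, or PHOENIX MCALIB project
    fi
}
```

With both keys set to YES the sketch reports DFOS, which mirrors the MCALIB behaviour.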

General. The extracted product headers contain the primary array only.
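To illustrate what a primary-header extraction involves, here is a minimal sketch in plain shell. It is not how cleanupProducts does it. It relies only on the FITS convention that header cards are 80 bytes long and that the header ends with the END card. The function names fits_primary_header and fits_to_hdr, and the naming <name>.fits -> <name>.hdr, are assumptions.

```shell
# Sketch only, not the cleanupProducts implementation.
# Print the primary header of a FITS file, one 80-byte card per line,
# stopping after the END card.
fits_primary_header() {
    fold -b -w 80 "$1" | awk '{ print } /^END *$/ { exit }'
}

# Hypothetical replacement step: write <name>.hdr next to <name>.fits and
# delete the FITS file only if a non-empty header file was written.
fits_to_hdr() {
    fits=$1
    hdr=${fits%.fits}.hdr
    fits_primary_header "$fits" > "$hdr" && [ -s "$hdr" ] && rm -f "$fits"
}
```

Headers of extensions, if any, are not read by this sketch, in line with the statement that only the primary header is kept.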

The tool is normally called from the ingestion file JOBS_CLEANUP under $DFO_JOB_FILE which is maintained by ingestProducts (this is the usual operational setup for larger data sets). It can also be called from the command line. Its calls are managed in the ToDo list of dfoMonitor:

ToDo:
off-line processing:   JOBS_NIGHT
ingest products:       JOBS_INGEST
cleanup (fits->hdr):   JOBS_CLEANUP

If a product file gets listed as hdr file in an executable AB, it will automatically be replaced by processAB with the downloaded fits file.

MCAL_DOWNLOAD. The tool always checks, after the execution per specified date, the file $DFO_MON_DIR/MCAL_DOWNLOAD. This file is filled by processAB with the pathnames of calibration products which are required for processing and had to be downloaded because their parent $DFO_CAL_DIR/<date> directory had already been cleaned up earlier.

EXPLORE mode. This mode helps to detect and cleanup 'hidden' fits files in any historical date directory of:

DFOS: $DFO_CAL_DIR | $DFO_SCI_DIR | $DFO_LOG_DIR | $DFO_PLT_DIR;

PHOENIX: $DFO_SCI_DIR and $DFO_SCI_DIR/.../conv.

In DFOS, these may occur anytime, e.g. because of reprocessing of a historical AB or date. They tend to be forgotten and slowly pile up in your dfos system. For PHOENIX, the EXPLORE mode is important for cleaning up the initial historical batch.

You can call this mode anytime on the command line. For each of the above listed main directory trees it checks for fits files and lists the affected dates:
DFOS:
a) finished dates (entry 'finished' in DFO_STATUS): it calls 'cleanupProducts -d <date>'
b) "unfinished dates" (no 'finished' entry in DFO_STATUS, but other entries exist): their state is ambiguous. They can be dates that are actually not yet finished, or historical dates that you reprocessed. The tool leaves it to you to decide whether or not you want these dates cleaned.
c) remaining dates (no entry at all in DFO_STATUS): these are most likely historical dates (even though they were finished properly, their 'finished' flag eventually drops out of DFO_STATUS, which holds only a couple of thousand entries). These dates are offered for cleaning; you can edit the date list and decide case by case whether or not to clean up.
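The three DFOS cases can be expressed as a small classification sketch. The layout of DFO_STATUS is assumed here, not taken from the tool: each line is taken to start with the status name followed by the date ("<status> <date> ..."). The function name classify_date is hypothetical as well.

```shell
# Sketch only. Assumed (hypothetical) status file layout: "<status> <date> ..."
classify_date() {
    status_file=$1
    date=$2
    awk -v d="$date" '
        $2 == d { seen = 1; if ($1 == "finished") fin = 1 }
        END {
            if (fin)       print "finished"     # case a)
            else if (seen) print "unfinished"   # case b)
            else           print "remaining"    # case c)
        }' "$status_file"
}
```

Only case a) would be cleaned without asking; for b) and c) the decision stays with the user, as described above.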

PHOENIX:
any date.

For each selected date, the tool is called in the standard mode, which guarantees that only archived fits files are replaced by headers.

All of the directory trees mentioned above are scanned one after the other. If e.g. $DFO_CAL_DIR/2018-01-01 is found to contain a fits file, then also the other three directories are scanned, and a fits file in $DFO_LOG_DIR/2018-01-01 is detected automatically.
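A scan of this kind can be approximated on the command line with the following sketch. It is not the EXPLORE mode itself: the function name dates_with_fits is hypothetical, and it assumes that date directories are named YYYY-MM-DD directly below each tree and that product files end in .fits. Pass the trees without a trailing slash.

```shell
# Sketch only: list all dates for which any of the given directory trees
# still contains a *.fits file, at any depth below the date directory.
dates_with_fits() {
    for tree in "$@"; do
        [ -d "$tree" ] || continue
        find "$tree" -type f -name '*.fits' | while IFS= read -r f; do
            rel=${f#"$tree"/}       # path relative to the tree
            echo "${rel%%/*}"       # first component = date directory
        done
    done | grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}$' | sort -u
}
```

For DFOS, a call could be: dates_with_fits "$DFO_CAL_DIR" "$DFO_SCI_DIR" "$DFO_LOG_DIR" "$DFO_PLT_DIR". Each date is listed once, no matter in how many trees it shows up.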

This interactive cleaning should be done every now and then, in particular if you see your disk space consumption indicator (on the dfoMonitor) growing slowly.

In PHOENIX mode, cleaning the historical batch might become very involved. In that case, and only then, there is the flag '-n' for the EXPLORE mode to run the tool in automatic mode, without acknowledging every single date ('cleanupProducts -E -n'). You may find it useful after having called the tool for a few dates in the usual, interactive mode ('cleanupProducts -E').

Output

DFOS: HDR files of the product files in $DFO_CAL_DIR/$DATE and $DFO_SCI_DIR/$DATE.

PHOENIX: HDR extractions of the archived product FITS files in $DFO_SCI_DIR/$DATE/conv; $DFO_PLT_DIR is not cleaned, to preserve comfortable browsing of previews with the phoenixMonitor.

Recovery

If any of the cleaned fits files is needed again, it can be recovered using 'ngasClient -c <file_id>' (for mcalibs in DFOS mode) or 'ngasClient -i <idp_id>' (for IDPs/associated files in PHOENIX mode).

How to use

Type cleanupProducts -h for on-line help, cleanupProducts -v for the version number, and

cleanupProducts -d 2024-10-12

to replace all products for that date with their headers,

cleanupProducts -E

to explore all directories for left-behind fits files,

cleanupProducts -E -n

(in PHOENIX mode only) to explore all directories in automatic mode.

Use

cleanupProducts -d 2018-10-12 -O

(in DFOS mode) to clean up a date which has already fallen out of DFO_STATUS.

Configuration file

none

Status information and logging

The tool writes the DFO status cal_Cleaned (except for the cases when it is called by itself in the EXPLORE mode).

Installation

The tool comes as part of the utilPack.

Operational aspects

Simplified sequence:

                            CALIB: certifyP, moveP | SCIENCE processing | finishNight | afterwards
ingestProducts possible?    yes                    | yes (PHOENIX)      | yes         | yes
cleanupProducts possible?   no                     | no                 | no          | yes (only after ingestP!)

Last update: April 26, 2021 by rhanusch