Common DFOS tools:
Documentation

dfos = Data Flow Operations System, the common tool set for DFO
 

v5.1:
- no SCIENCE steps offered anymore (DFOS_OPS); config keys PHOENIX_ENABLED and PHOENIX_ACCOUNT disabled

The tool manages the daily workflow. Check the 'Operations' section (e.g. here) for information about that workflow.

The tool also supports the PHOENIX and OPSHUB environments.

 





[ used databases ] sara..transfer_requests (transfer status); observations..data_products; ngas..ngas_files (NGAS access)
[ used dfos tools ] qcdate, ngasClient; checks output from autoDaily, createReport, ingestProducts
[ output ] $DFO_MON_DIR/dfoMonitor.html; tool called by autoDaily | phoenix | distillery
[ upload/download ] upload: XDM / ngasWatcher / transferWatcher files to WISQ; HTML output to qcweb

topics:
description: overview | PHOENIX | OPSHUB
general: installation checks | XDM | AB number | data transfer; checkboxes: autoDaily | HC monitor | calChecker | ToDo lists | last ABs | DATE table; links: service user links
technical: output pages | how to use | configuration | status | operational aspects | decision making | Workflows and tool flavours

dfoMonitor

[ top ] Overview

enabled for parallel execution
OPSHUB: enabled for OPSHUB workflow

This tool provides the central interface for monitoring and managing the QC daily workflow (called DFOS_OPS), the PHOENIX workflow, and the OPSHUB workflow.

It serves as the standard interface to the workflows, offering all required functions and interactivity. For the daily workflow, it scans the active DFO dates, reads and displays their process status, and offers the next workflow steps. For the PHOENIX workflow, it connects to the currently open dates, or to pseudo-dates in the DEEP mode. For the OPSHUB workflow, it connects to the currently open PROJECTS.

For PHOENIX and OPSHUB workflows, the dfoMonitor is mainly a passive monitor. For the QC daily workflow, it has active buttons.

This documentation has some parts shaded in grey (if applicable to DFOS_OPS and/or PHOENIX only), or in light blue (if applicable to OPSHUB only).


[ top ] PHOENIX workflow


phoenix is the workflow tool for automatic science processing. It is used by the IDP accounts on muc08+. Find more information here.

The dfoMonitor in the PHOENIX environment supports the CALIB and SCIENCE jobs as required for the PHOENIX project, either organized in DATEs or in PSEUDO-DATEs (for DEEP projects).

Find more information on the phoenix page.


[ top ] OPSHUB workflow

For the OPSHUB, distillery is the workflow tool. It is supported by the dfoMonitor. Find more information here.

The OPSHUB version of the dfoMonitor has no active buttons. It is organized by PROJECTs and displays all corresponding dates. Depending on the PROJECT configuration, it might have CALIB and SCIENCE jobs at the same time. The dates have two possible states: ABs and processing jobs created; processing jobs executing or executed. It has the XDM monitor as well as an icon bar and the Ganglia system monitors.


Overview of the fields of the dfoMonitor

[ top ] Installation checks. The tool monitors the DRS_TYPE (as configured in config.createAB): condor on (CON) or off (anything else). The configured $DFS_RELEASE is displayed. If configured, $MIDAS_CHECK compares the default MIDAS version to $MIDVERS. Finally, the currently enabled pipeline version is displayed (with the detmon pipeline filtered out).
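For illustration, a minimal shell sketch of these checks, using the variable names quoted above (the actual implementation may differ):

  # sketch: installation checks as described above (logic assumed)
  if [ "$DRS_TYPE" = "CON" ]; then
      echo "condor: on (CON)"
  else
      echo "condor: off"
  fi
  echo "DFS release: $DFS_RELEASE"
  [ "$MIDAS_CHECK" = "YES" ] && echo "configured MIDAS version: $MIDVERS"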

[ top ] Monitoring of AB counts. If the number of ABs in $DFO_AB_DIR grows beyond a certain limit, the AB monitor (getStatusAB) becomes slow, which also slows down autoDaily. To expose potential issues, the total number of ABs in $DFO_AB_DIR is monitored. It scores red once a hard-coded threshold (currently 2500) is reached.

N_ABs:
530
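In shell terms, the check might be sketched as follows (counting all entries of $DFO_AB_DIR is a simplification here):

  # count ABs and compare against the hard-coded threshold (2500)
  N_ABS=$(ls "$DFO_AB_DIR" 2>/dev/null | wc -l)
  if [ "$N_ABS" -ge 2500 ]; then
      echo "N_ABs: $N_ABS (red)"
  else
      echo "N_ABs: $N_ABS"
  fi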

[ top ] Disk space, XDM. The data disk space is monitored since with the data disk full, no automatic processing is possible. A quick overview is provided:

data disk: 120.5 GB (30%)

dfos_ops: the display updates in the background (ash mechanism) when clicked.

dfos_ops and phoenix only: The XDM (eXtended Disk space Monitor) provides detailed feedback about the disk space usage on the data disk. It monitors the following data disk directories:

disk space on $DATA_DISK (total: 870 GB)

  • RAW: $DFO_RAW_DIR (updated each time dfoMonitor is called)
  • CAL: $DFO_CAL_DIR
  • SCI: $DFO_SCI_DIR
  • DFS: $DFS_PRODUCT
  • LST: $DFO_LST_DIR
  • *HDR: $DFO_HDR_DIR
  • *PLT: $DFO_PLT_DIR
  • *LOG: $DFO_LOG_DIR
  • SUM: sum of all of the above
  • OTH: all other data on $DATA_DISK in non-standard folders
  • FREE: remaining free disk space

The values marked with * are normally read from the DFO_STATUS file and are therefore static. They are updated eventually, once they get removed from DFO_STATUS (when 5000 new entries make them outdated so that they are auto-removed). dfos_ops, PHOENIX: they can also be updated on demand, using [refresh], which takes a couple of seconds.

Disk space used by the directories is listed in GB; the bar indicates usage in percent. The disk space score turns red if more than 80% of the disk volume is occupied.

If a quota is defined in the config file (DATA_QUOTA), it is indicated and taken into account.
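A sketch of the scoring, assuming standard 'df' output and the DATA_QUOTA key described in the configuration section:

  # usage of the data disk, relative to an optional quota (in percent)
  USED_PCT=$(df -P "$DATA_DISK" | awk 'NR==2 {sub("%","",$5); print $5}')
  QUOTA=${DATA_QUOTA:-100}                # default: the full disk
  EFF_PCT=$(( USED_PCT * 100 / QUOTA ))   # usage relative to the quota
  if [ "$EFF_PCT" -gt 80 ]; then
      echo "disk score: red (${EFF_PCT}% occupied)"
  else
      echo "disk score: ok (${EFF_PCT}% occupied)"
  fi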

dfos_ops only:

The XDM is exported to http://www.eso.org/observing/dfo/quality/WISQ/XDM/XDM.html and linked to the WISQ monitor on the navigation bar.

OPSHUB only:

The 'du' command used to retrieve the disk usage is extremely slow on the opshub gluster filesystem. Therefore, the XDM is disabled on the OPSHUB.


"CAL" row

dfos_ops only:

The row labelled "CAL" gives an overview of the most recent N dates for the autoDaily workflow. It also provides links to interactively launch the tools productExplorer and refreshVCAL.

The PHOENIX version has only the link to the tool productExplorer.

OPSHUB only:
"Select other instruments" is a jump menue to the dfoMonitors of the other configured instruments.

[ top ] Data transfer links

dfos_ops only:

This checkbox has links related to the data transfer system (DTS), plus two rows for status checks of NGAS access ("ngas") and of the health of the transfer process ("transfer"), plus two buttons to launch queries. The ngas status is checked each time the dfoMonitor tool is launched, by starting an ngas download with ngasClient (the file is hard-coded as $TEST_FILE). If an error occurs, its code is displayed. As a timeout mechanism, the monitor waits at most 60 sec for ngasClient and then aborts. For performance reasons, the DTS test and the ngas download run in the background, and the result from the previous execution is displayed; since dfoMonitor is called by many different tools, that result is usually sufficiently up-to-date.
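The timeout and background mechanism might be sketched like this (the ngasClient invocation and the status file are assumptions; only the 60 sec limit and the background call are from the text):

  # background ngas check; the result is read on the next dfoMonitor call
  ( timeout 60 ngasClient "$TEST_FILE" >/dev/null 2>&1
    echo $? > "$DFO_MON_DIR/ngas_status" ) &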

"Transfer" is checked with a query to the sara database which hosts file names and transfer status values. All CALIB files with transfer status < 6 (meaning not yet in the primary archive) are found, if the delay is more than 1 hr and less than 72 hrs. The one with the longest delay is displayed. If none is found, the "transfer" status is ok, otherwise nok. There is also an indication for delays of files of any type, but this is not used for the nok alert. This is motivated by the fact that for incremental processing, and for the closure of the QC loop with Paranal, CALIB files are by far the most important files. To avoid false alerts, delays by less than 1 hour are not evaluated. Delays by more than 72 hours are disregarded either since it is assumed that these might be due to database inconsistencies. This is not always true but the tool cannot decide this.

The complete query result is displayed upon clicking the red action button (line labelled "longest delay"). The green action button launches the inverse query: all archived files with status 6 and their delay values (time between OLAS archiving on Paranal and archiving in the primary archive in Garching).

The DTS/Evalso monitors are displayed in the bottom monitor panel called "system", right. There are currently monitors for the two PAR-VIT links (#1 | #2), and one monitor for the connection VIT-GAR.

Data
Transfer:
  Monitors: DataTransfer  
ngas
transfer
no CALIB file delayed by >1 hr

In case of problems, flags will turn red, e.g.:

Data
Transfer:
  Monitors: DataTransfer  
ngas
transfer
longest CALIB delay: VISIR.2018-11-08T08:13:11.123.fits CALIB (2.5 hrs)
longest delay (any dpr.catg): VISIR.2018-11-01T08:13:11.123.fits SCIENCE (54.4 hrs)

The ngas and the transfer flags are exported to the web server and embedded in the calChecker and the HC monitor.


Cronjob checkboxes

The operational cronjobs are monitored here (applicable to dfos_ops only).

dfos_ops only:

[ top ] autoDaily checkbox. This checkbox is intended to make the current status of the processing scheme more transparent. It checks for:

  • "enabled as cronjob ": autoDaily being enabled as cronjob (with the pattern displayed)
  • "cleanupRD enabled": once a day the utility tool cleanupRawdisp should be called to maintain the RAWDISP pages both on the web server and on the replication server;
  • "dfosCron monitoring enabled": for the 'forced refresh' button on the HC monitor, the monitoring tool call 'dfosCron -t autoDaily' must be enabled as cronjob, with a cadence once per minute (because of the promise that 'autoDaily' will start within a minute after request); the checkbox checks for the entry in the cronjob file and also for the proper cadence (* * * * *).
autoDaily?
enabled as cronjob
cleanupRD enabled
dfosCron monitoring enabled

This box must be green for dfos installations. The configured cronjob pattern is visible when hovering the mouse.
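For illustration, a hypothetical crontab excerpt matching these checks (only the dfosCron cadence, * * * * *, is mandated by the text; the other patterns are installation-specific):

  */15 * * * *  autoDaily                # enabled as cronjob (pattern varies)
  30 7 * * *    cleanupRawdisp           # once per day
  * * * * *     dfosCron -t autoDaily    # once per minute, for the forced refresh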

The activities of autoDaily are displayed in real-time underneath the XDM. If autoDaily is not running, this box displays:

autoDaily: no dates

If there is autoDaily activity, messages will inform about progress. You can follow the workflow by clicking on the 'log' link:

autoDaily: list_data_dates
log autoDaily running!
calling createAB

[ top ] HC monitor checkbox. This checkbox monitors the proper update pattern of HC reports. It checks for the existence and proper scheduling of the following jobs:

  • JOBS_TREND: the complete set of HC jobs (linked to the job) to be updated at least once per day (with the coded cronjob pattern displayed on mouse over)
  • JOBS_HEALTH: the essential set of HC jobs used for incremental HC monitor updates by autoDaily (linked to the job)
  • JOBS_NAVBAR: the job file for regular updates of the HISTORY navigation bars (to be called twice a month)
HC monitor updates?
JOBS_TREND enabled as cronjob
JOBS_HEALTH existing
JOBS_NAVBAR enabled as cronjob

[ top ] calChecker checkbox. The first checkbox checks for the existence and the proper scheduling of the calChecker cronjob (to be called every half hour). The second one checks if once a day the FULL mode is called, as a safety mechanism.

calChecker?
enabled as cronjob
FULL: enabled as cronjob

[ top ] AB checkboxes. These checkboxes are used to monitor the autoDaily execution. The following information is displayed:

  • the last created AB (its name is written by createAB into $DFO_MON_DIR/LAST_AB) and its age
  • the last processed AB
  • the last autoDaily execution and its age.
Last created AB: GIRAF.2012-09-13T13:07:07.895_tpl.ab (age: 20.4 h)
  Last processed AB: GIRAF.2012-09-13T13:07:07.895_tpl.ab
Last autoDaily: 2012-09-14T09:30:45 (age: 0.0 h)

The last autoDaily execution is written into the file $DFO_MON_DIR/autoDWatcher.html and exported to the HC web site. It is included there in the monitor page http://www.eso.org/observing/dfo/quality/ALL/qc1_info.html, ready to be inspected by the QC shift leader. It will automatically flag red if its age exceeds 6 hours.
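A sketch of the age check (GNU date; the timestamp source is hypothetical, the 6-hour limit is from the text):

  # red flag when the last autoDaily execution is older than 6 hours
  AGE_H=$(( ( $(date +%s) - $(date -d "$LAST_AUTODAILY" +%s) ) / 3600 ))
  [ "$AGE_H" -gt 6 ] && echo "flag: red (age: ${AGE_H} h)" || echo "flag: ok"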


[ top ] ToDo lists

The non-automatic tasks (e.g. ingestion) are listed here:

dfos_ops, PHOENIX:
The tool manages the following off-line jobs (all under $DFO_JOB_DIR):

  • JOBS_NIGHT (filled by createJob)
  • JOBS_INGEST (filled by moveProducts)
  • JOBS_CLEANUP (filled by ingestProducts)

Managing means: check if the file contains valid entries; offer links to watch, edit, and execute (a sketch follows after the box). The open tasks appear under 'ToDo', either in grey (nothing to do) or in yellow (something to do):

ToDo: off-line processing
JOBS_NIGHT
ingest products:
JOBS_INGEST
cleanup (fits->hdr):
JOBS_CLEANUP
watch [edit] [launch] watch [edit] [launch] watch [edit] [launch]
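A minimal sketch of the grey/yellow coding described above (what counts as a valid entry is an assumption here):

  # yellow if a JOBS file contains entries, grey otherwise
  for JOB in JOBS_NIGHT JOBS_INGEST JOBS_CLEANUP; do
      if grep -q '[^[:space:]]' "$DFO_JOB_DIR/$JOB" 2>/dev/null; then
          echo "$JOB: yellow (something to do)"    # offer watch/edit/launch
      else
          echo "$JOB: grey (nothing to do)"
      fi
  done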

The tool also offers links to some log subdirectories ($DFO_MON_DIR/AUTO_DAILY and CRON_LOGS) and to the DFO_STATUS file, with the status flags.

POSTIT. There is the option to post notes, reminders etc. of a temporary nature into a text file and include them in the monitor (POSTIT function). Just click on the 'edit' link to create or edit the file $DFO_MON_DIR/DFO_POSTIT. The text will display in the dfoMonitor after refreshing (a command-line sketch follows below the example).

Notes: [edit] 
2017-01-28: 4 files Medusa2 instead of Argus; hide!
2017-01-27: some STD ~ok for fibre_effic correction, see how 2017-01-28 data look like.
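The same can be done without the 'edit' link, directly on the command line (the file name is from the text):

  echo "2021-04-26: temporary note for the monitor" >> "$DFO_MON_DIR/DFO_POSTIT"
  dfoMonitor    # refresh; the note now displays on the monitor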

[ top ] Service links. They come in the blue row between the header part and the date result part:

dfos_ops only:

  • navigation bar maintenance (links to the config file and a refresh button for the dfos monitor navigation)

  • launch the dialog for rawdisp2reference (the utility tool for defining a RAWDISP set as reference, see here)
  • a link to start the dfoManager (more...).

For editing the monitor navigation bar, a link is offered to the corresponding configuration file (config.gui_navbar) which can be edited in the same way as the tool configuration file config.dfoMonitor. The monitor navigation bar is included in all monitors for the daily workflow.

NOTE: to update the navigation bar, first edit the config.gui_navbar file, then call dfoMonitor. All other monitors will then show the updated navigation bar after execution of the respective tool.
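In command-line terms (the editor choice is arbitrary; the location of config.gui_navbar is installation-specific):

  vi config.gui_navbar    # step 1: edit the navigation bar configuration
  dfoMonitor              # step 2: refresh; other monitors follow on their next run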


[ top ] DATE table

The main table is organized by DATEs. Depending on the workflow environment, there are different columns.

Links are offered to related information:

dfos_ops only:

There is a link to the daily calChecker result pages ("CAL"). They are permanently stored under $DFO_LST_DIR/CALCHECK.

The link 'status' is an extraction from DFO_STATUS for the corresponding date, intended as an overview of the current processing status.

The tool displays filtered files, as detected by filterRaw. If an entry exists in $DFO_LST_DIR/filt_<instr>_<date>.txt, the corresponding box is colored yellow, and a link to the list is offered.
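A sketch of this check ($INSTR and $DATE stand for the instrument and date here; the file pattern is from the text):

  FILT="$DFO_LST_DIR/filt_${INSTR}_${DATE}.txt"
  [ -s "$FILT" ] && echo "$DATE: filtered files exist (box yellow, link to $FILT)"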


dfos_ops and PHOENIX:

The tool displays whether the night had SM or VM (or both) SCIENCE runs. This information is extracted from the data reports.


[ top ] Icon bar, user links.

Below the main table, there are a few more rows.

dfos_ops and PHOENIX:
There is a tool bar with frequently used links. They can be defined by the user in the configuration file.

There is also an icon bar with (hard-coded) standard links:

  • the nautilus (histoMonitor): link to the processing history;
  • the SciOps DYNAMIT tool and web page;
  • the scoreManager;
  • the dfosExplorer;
  • the statistics system WISQ (with specific links for your instrument);
  • links related to system monitoring: the GANGLIA monitor (see below), the MUNIN performance trend system, and the condor system configuration. GANGLIA, MUNIN and condor have tutorial links to Wikipedia pages.

OPSHUB:
The user links are hard-coded.

There is also an icon bar with (hard-coded) standard links:

  • the OCA rules;
  • the OPSHUB storage for the current instrument;
  • the nautilus (histoMonitor): link to the QC processing history;
  • the dfosExplorer (tool maintenance);
  • the SciOps DYNAMIT tool and web page;
  • links related to system monitoring: the GANGLIA monitor (see below) and the condor system configuration. Both have tutorial links to Wikipedia pages.

 

[ top ] System links (dfos_ops and PHOENIX): The monitor page displays the 4 GANGLIA performance reports for your host:

performance: load_report | cpu_report | mem_report | network_report (example)

 

dfos_ops, PHOENIX:

Use the H D w m links for easy switching between hour|day|week|month timescales for the Ganglia reports.

OPSHUB:

The reports are hard-coded as hourly reports.

The server name is read via unix 'hostname'. These reports are produced by SOS under the main URL http://mucmp.hq.eso.org/ganglia/.

For more information about GANGLIA check out the help link on the dfoMonitor in the system monitor "GANGLIA" box.


[ top ] Output

HTML output. The result HTML page is stored locally under $DFO_MON_DIR/dfoMonitor.html.

dfos_ops only:

It is copied, with stripped-off functionalities, to the QC web server (http://qcweb.hq.eso.org/~qc/<instr>/monitor).

The extended disk space monitor XDM is exported as a separate page. To have it included in the WISQ information system, it goes to the overview page http://www.eso.org/observing/dfo/quality/WISQ/XDM/XDM.html.

ngasWatcher.html and transferWatcher.html are exported to the QC web server (to /qc/<instr>/reports) to be included in calChecker and HC monitor.

dfoMonitor is enabled for autoDaily, the wrapper tool for automatic processing of the initial part of the daily workflow. The status table in the top right part of the monitor page displays whether an autoDaily is currently executing, monitors the execution status, and offers a link to the execution log.

The tool has some additional options (-a, -m, -q) which are not required for command-line usage but have been introduced for autoDaily.

The tool displays the ingestion status of calibration products (under 'cdb'); the column for science products ('sci') is not filled for dfos_ops. This is useful as a reminder about data sets not yet ingested, since ingestProducts is called off-line. The tool checks for files list_ingest_CALIB_$DATE.txt in $DFO_LST_DIR.

To support incremental processing, the tool offers a special blue button for preliminary certification of TODAY's CALIB data. There you can provide feedback to SciOps (comments about ABs, certification flags). The workflow calls certifyProducts -L ("certifyP-light"). No data are moved, the AB monitor is updated and exported. See more on the certifyProducts page.
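In command-line terms, the blue button corresponds to (per the text and the decision table further below):

  certifyProducts -m CALIB -L    # 'certifyP-light': preliminary certification of TODAY
                                 # no data moved; AB monitor updated and exported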

It is possible to directly edit the configured values for the calibration memory depth, N_MCAL_LIST and N_VCAL_LIST, in the top 'CAL' section. You can also call the utility tools refreshVCAL and productExplorer there.


dfos_ops, PHOENIX:

The tool uses the standard '.esh' and '.ash' mechanism to make the browser interactive. Find a description of how to implement this here. The '.ash' functionality is used to interactively update the load or disk status in the background.


[ top ] How to use

Type dfoMonitor -h for on-line help (there is extended help available from the html page), and dfoMonitor -v for the version number.

dfos_ops, PHOENIX:
Type

dfoMonitor

to create or refresh the dfoMonitor.html page.

OPSHUB:
Type

dfoMonitor -i <instr>

to create or refresh the dfoMonitor.html page for that instrument.

There are also hidden options -a (switch off check for autoDaily running); -m (to display the status message for autoDaily); -q (quiet mode, no logging). These are used by autoDaily.

The option -N is available for execution without ngas checking, on the command line.


[ top ] Configuration file

dfos_ops, PHOENIX:
The tool reads its own config file plus some others. config.dfoMonitor defines:

Section 1: general parameters

  • XTERM_GEOM (e.g. 120x25+10+500): size and location of the pop-up xterm (used by the .esh functionality)
  • IMG_URL, DFOS_URL: URLs for the images and for the DFOS documentation
  • N_DFO_HDR (e.g. 20): number of latest directories $DFO_HDR_DIR/ to scan (has impact on performance!)
  • DATA_DISK (e.g. /data23/giraffe): name of the data disk hosting the data
  • DATA_QUOTA (e.g. 50): quota on $DATA_DISK in percent (optional, default: 100)
  • CREATEAB_VCAL (YES|NO): call createAB -m SCIENCE with flag -N (optional, default: NO)
  • MIDAS_CHECK (YES|NO): if YES, displays actual versus default MIDAS version (optional, default: YES)
  • GANGLIA_FREQ (HOUR | DAY | WEEK | MONTH): time range of the monitors (default: HOUR); managed by the tool
  • CONDOR_CONFIG: pathname of the (global or special) condor config file (default: /home/condor/condor_config); dfo: condor_config; qc cluster: condor_config.QC; muc: /etc/condor/condor_config
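For illustration, a hypothetical excerpt of config.dfoMonitor using the keys above (the exact key/value syntax of dfos config files may differ):

  XTERM_GEOM     120x25+10+500
  N_DFO_HDR      20
  DATA_DISK      /data23/giraffe
  DATA_QUOTA     50
  CREATEAB_VCAL  NO
  MIDAS_CHECK    YES
  GANGLIA_FREQ   HOUR
  CONDOR_CONFIG  /home/condor/condor_config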

Section 2: URLs for QC1 and trending
These URLs show up under the 'QC' section of the dfoMonitor. You can have two lines of QC-related links. The first one is recommended for the current trending plots. The second one can be anything you think is useful. You can use the following reserved strings to customize your trending bar:

  • SINGLE_VBAR: |
  • DOUBLE_VBAR: ||
  • BREAK: <br>
  • SPACE: &nbsp;

2.1: First line
Syntax:

  • ITEMxx: should be unique
  • LABEL: can be any string w/o blanks, will mark the link (use underscore for blanks)
  • DISPLAY: string to be displayed on the JavaScript 'onMouseOver' event (string w/o blanks; use underscore for blanks)
  • URL: complete URL to trending page
QC1_URL ITEM01 | bias | bias_trending | http://www.eso.org/qc/GIRAFFE/img/CURRENT/trend_bias_current.gif (there can be multiple entries)
2.2: Second line
Other trending links, including links to the QC1_plotter:
QC2_URL ITEM01 | HealthCheck | HealthCheck_Monitor | http://www.eso.org/qc/ALL/daily_qc1.html (there can be multiple entries)

OPSHUB:
The tool configuration file is created and managed by the distillery workflow tool. Don't touch!

 

[ top ] Status information

dfoMonitor reads status file information. The disk occupancies for "HDR", "PLT" and "LOG" are written into DFO_STATUS.

[ top ] Operational aspects

dfos_ops, PHOENIX:
  • You will typically work, per DFO date, from left to right. A new date shows up as soon as data have been discovered by the tool. An old date is removed by 'finishNight'.
  • The tool manages the helper scripts (.esh files under $DFO_GUI_DIR), meaning it creates and deletes them as necessary. Many of these scripts call dfoMonitor, so the tool usually self-updates and the user does not need to care about this.
  • The navigation bar is maintained by the gui option "navigation bar: refresh".

[ top ] Decision making

dfos_ops, PHOENIX:
Find here a description of how dfoMonitor decides about the DFO status of a specific DATE. For each main step of the workflow, three fundamental states can be defined:

  • WAIT: meaning the step has not yet been executed, and is not yet offered
  • OFFER: meaning the step has not yet been executed, but it is offered now
  • DONE: meaning the step has been executed, and it is not offered anymore.

The WAIT and DONE status per workflow step is based on finding the corresponding status flag in DFO_STATUS (no matter when the step was executed).

The OFFER status is based on the last entry per DATE in DFO_STATUS.

Usually these three values are reached sequentially. But there are some cases where the OFFER state is kept although the step has already been executed. This applies to the createAB option, which is offered as long as the certifyProducts/moveProducts step has not been finished. The reason is that you may want to re-execute all or selected ABs when you discover an error or a bad product.

The monitor has three colours to code these states: WAIT is coded grey, OFFER is coded yellow, DONE is coded green. As a special case, the raw_Incomplete status is coded red.
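As an illustration, a minimal shell sketch of the OFFER logic for the dfos_ops CALIB branch, derived from the decision list below (the variable name is hypothetical):

  case "$LAST_STATUS_ENTRY" in
      raw_Complete|cal_AB|cal_Queued)
          echo "offer: createAB -m CALIB" ;;
      cal_QC)
          echo "offer: certifyProducts -m CALIB + moveProducts -m CALIB"
          echo "offer: createAB -m CALIB (kept for re-execution)" ;;
      cal_Updated|sci_Updated)
          echo "offer: finishNight" ;;
  esac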

For each environment and workflow step, the following list gives the OFFER condition(s) and the action offered, plus the DONE condition and action (reconstructed from the monitor's decision table):

  • entry for DATE (dfos_ops): general conditions for entry:
      - $DFO_RAW_DIR/<DATE> exists
      - fits_Requested is set
      - removed is not set
      - DATE is among the latest $N_DFO_HDR dates on the local disk
    The current date is always labelled as "today". DONE: none specific, depends on status flags.
  • entry for DATE or pseudo-DATE (PHOENIX): all <DATES> with JOBS_PHOENIX jobs in $DFO_JOB_DIR.
  • entry for PROJECT_DATE (OPSHUB): all <DATES> with execAB jobs in $DFO_JOB_DIR.
  • complete? (all): green if raw_Complete is set, otherwise yellow; the current date is always yellow.
  • VCAL/MCAL (all): condition for entry: CALIB products for DATE in $DFO_CAL_DIR/MCAL and VCAL, respectively; blue if DATE is contained in MCAL/VCAL; check also the select list on top.
  • createAB (CALIB) (dfos_ops, OPSHUB): OFFER if the last status entry is raw_Complete, cal_AB, cal_Queued or cal_QC; action: launch 'createAB -m CALIB'; DONE when cal_AB is set. PHOENIX: none (unless MCALIB project).
  • CALIB ABs (dfos_ops, OPSHUB): OFFER if the last status entry is cal_AB, cal_Queued or cal_QC; action: link to the AB status page, number of ABs; DONE: n/a (DONE state not offered).
  • certifyProducts and moveProducts (CALIB) (dfos_ops, OPSHUB): OFFER if the last status entry is cal_QC; action: launch 'certifyProducts -m CALIB' plus 'moveProducts -m CALIB'; DONE when cal_Certif is set.
  • certifyProducts -L (dfos_ops): OFFER if the last status entry is cal_QC and DATE = $TODAY; action: launch 'certifyProducts -m CALIB -L' plus an update of getStatusAB; no flag set, no DONE since provisional.
  • SCIENCE ABs (PHOENIX, OPSHUB): OFFER if the last status entry is sci_AB; action: link to the AB status page, number of ABs; if N_AB = 0, 'finishNight' is offered; DONE: n/a (DONE state not offered).
  • finish (all): OFFER if the last status entry is cal_Updated, sci_Updated, or (sci_AB and N_AB = 0); action: launch 'finishNight'; DONE when finished is set; then the 'remove from dfoMonitor' option is offered under DATE.
OPSHUB:
For a given entry (PROJECT, DATE) two states can exist:

  • WAIT for processing: ABs exist (either they have been downloaded or created), processing jobs exist; marked in yellow
  • DONE: ABs have been executed, products are waiting for inspection, a decision is to be taken (-X or -M); marked in green.

[ top ] Workflows and tool flavours

In the DFOS_OPS environment the tool displays the DFOS logo.

In the PHOENIX environment the tool displays the PHOENIX logo.

In the OPSHUB environment the tool displays the OPSHUB logo.


Last update: April 26, 2021 by rhanusch