Common DFOS tools:
Documentation

dfos = Data Flow Operations System, the common tool set for DFO

clMonitor | XportJob

Condor® Project University Wisconsin
"Like a vulture circling the desert, Condor scavenges for processing power that would otherwise be lost." Myron Livny, "Mr. Condor"

The QC grid

Note: with the termination of SCIENCE processing, most of these considerations have lost their relevance. The information is nevertheless kept, since it may become useful again in the future.

A grid is the combination of servers that can work connected or stand-alone. The QC GRID is the combination of the individual dfo blades and the QC cluster.

In the current implementation, pipeline processing jobs created on a dfo blade can be exported to the QC cluster, with the products being transferred back to the originating dfo blade. There is no connection between dfo blades. The QC cluster can be used for processing

Setup

The QC cluster consists of 20 identical dual-core blades. With 2 of them reserved for interactive QC work (hawki and vircam) and one reserved for managing condor ("condor_master"), there remain 17 x 2 = 34 cores for condor execution. This compares to the 2 cores available on a dfo blade, which have to serve condor execution plus all foreground (e.g. certification) and background (e.g. trendPlotter, qc1Parser) jobs for a given instrument.

The QC cluster is currently (2010) saturated by VIRCAM processing and QC reports for roughly 24 hours when new data disks arrive (on Tuesdays). Outside that time window, the cluster is largely "idle" in the sense that all other dfos workflow steps there execute on one node only (neglecting the HAWKI dfos processing). Therefore the QC cluster can then provide CPU cycles equivalent to up to 17 dfo blades.
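The capacity figures above can be checked with a few lines of arithmetic. The following is only an illustrative sketch in Python: the numbers come from the text, while the variable names are made up for this example and are not part of any dfos tool.

```python
# Figures taken from the text; all variable names are illustrative only.
TOTAL_BLADES = 20        # identical dual-core blades in the QC cluster
CORES_PER_BLADE = 2
INTERACTIVE_BLADES = 2   # reserved for interactive QC work (hawki, vircam)
MASTER_BLADES = 1        # reserved for managing condor (condor_master)
DFO_BLADE_CORES = 2      # cores available on a single dfo blade

# Blades and cores left over for condor execution
execution_blades = TOTAL_BLADES - INTERACTIVE_BLADES - MASTER_BLADES
execution_cores = execution_blades * CORES_PER_BLADE

# How many dfo blades the idle cluster is worth, counting cores only
dfo_blade_equivalents = execution_cores // DFO_BLADE_CORES

print(execution_blades, execution_cores, dfo_blade_equivalents)  # 17 34 17
```

The equivalence counts cores only. Since a dfo blade also runs foreground and background jobs on its 2 cores, the practical gain from the cluster may be larger than this simple ratio suggests.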

The current activity on the QC cluster, plus the submitted queue still to execute, is monitored on the cluster monitor (http://qcweb/CLUSTER/monitor/clMonitor.html) and linked to the local dfoMonitor.

The QC grid is particularly attractive for large, self-contained data sets. Self-contained means having dependencies only on archived calibration data, or on non-archived (virtual) calibrations if created by the same cascade. Prime candidate data sets are:

Not so well suited are:

As an example, a huge X-SHOOTER VM night with more than 1000 SCIENCE ABs executes within a few hours on the QC cluster, while a standard dfo blade would be saturated for a whole weekend. Another example is NACO burst mode data.
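The notion of a self-contained data set given above (dependencies only on archived calibrations, or on virtual calibrations created by the same cascade) can be expressed as a simple set check. This is a minimal sketch under stated assumptions: the `Job` class, the function `is_self_contained` and all calibration names are hypothetical and do not reflect how the dfos tools actually represent ABs or cascades.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical stand-in for one processing job in a cascade."""
    name: str
    needs: set = field(default_factory=set)     # calibrations required
    produces: set = field(default_factory=set)  # virtual calibrations created


def is_self_contained(cascade, archived):
    """True if every dependency is archived or created within the cascade."""
    virtual = set().union(*(job.produces for job in cascade))
    return all(job.needs <= (archived | virtual) for job in cascade)


# Hypothetical calibration names, for illustration only
archived = {"MASTER_BIAS_archived"}

cascade_ok = [
    Job("flat", needs={"MASTER_BIAS_archived"}, produces={"MASTER_FLAT_virtual"}),
    Job("science", needs={"MASTER_FLAT_virtual"}),
]
cascade_bad = [
    Job("science", needs={"MASTER_DARK_elsewhere"}),
]

print(is_self_contained(cascade_ok, archived))   # True
print(is_self_contained(cascade_bad, archived))  # False
```

The second cascade fails the check because it needs a calibration that is neither archived nor produced by one of its own jobs, so it could not run stand-alone on the QC cluster.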

Installation

All currently operational QC accounts on the dfo blades have been replicated to qc05 on the QC cluster. Each of them can use the QC cluster as a QC grid.

Software support consists of two tools:

Find more under these URLs.