PHASE 3 RELEASE VALIDATOR

NAME
    validator.jar - validates a Phase 3 data release before uploading to ESO.

SYNOPSIS
    java -jar validator.jar [-h]
        Print help on the available options.
    OR
    java -jar validator.jar -v
        Print the version and exit.
    OR
    java [-Xmx1024m] -jar validator.jar -r release_dir -m modification_type \
         [-f fitsverify_utility] [-t number_of_threads]
        Run a validation of the data contained in release_dir for a release
        of type modification_type [using fitsverify_utility instead of the
        default fitsverify to check the compliance of fits files with the
        standard] [running a multi-threaded validation].

    The java option -Xmx should be used if the default maximum heap size
    turns out to be insufficient and leads to an out of memory error.

OPTIONS
    --conf
    -c filename
        Default value: none.
        Specify a configuration file from which to read the command line
        options. The file is parsed as a properties file, i.e. each line is
        either a comment, starting with a hash character (#), or in the
        format:
            <key>=<value>
        - If <key> is a valid command line option, then its value is set to
          <value>.
        - If <key> is not a valid command line option, then the line is
          ignored.
        - If <key> is specified both in the configuration file and on the
          command line, the value on the command line overrides the value in
          the configuration file.
        - <key> can be either the long name or (when it exists) the short
          name of a command line option, always without the leading
          dash(es).
        The following two examples specify the same configuration:
        Example 1) Configuration value from the command line:
            --fitsverify /home/user/bin/fitsverify
        Example 2) Configuration value from the configuration file:
            fitsverify=/home/user/bin/fitsverify

    --fitsverify
    -f fitsverify_utility
        Default value: fitsverify
        Specifies which command line utility to use for fits format
        validation. If this option is not given, the default value
        "fitsverify" (quotes for clarity) is used.
        The validator tries to locate an executable called
        <fitsverify_utility> first as a pathname, then in the directory
        where validator.jar is located, and then in any directory of the
        executable path. If <fitsverify_utility> is not found, an error is
        reported but the validation continues (because of the missing
        utility the result will be an error in any case, but this way other
        potential errors are reported as well).

    --modification-type
    -m modification_type
        This option is mandatory and does not have a default value.
        Specify the modification type of the release. The modification type
        can be either CREATE or UPDATE (both case insensitive).
        In case of an UPDATE release, a file named "CONTENT.ESO" (quotes for
        clarity) must be present in the release's directory. This file
        contains, in an internal format, the information currently stored at
        ESO for the release being updated. It must be downloaded from ESO
        and must not be edited thereafter.
        A file "CHANGES.USER" (quotes for clarity) can optionally be present
        as well. This file is in text format, it is written by the user, and
        it contains the required update actions for the release. It consists
        of lines in the format:
            {DELETE|REPLACE} filename

    --help
    -h
        Print usage help and exit.

    --reldir
    -r releasedir
        This option is mandatory and does not have a default value.
        Specifies the base directory of the release to validate. This
        directory and all its sub-directories must be readable and
        accessible.

    --threads
    -t number
        Default value: 1. Accepted range for number: 0 or greater.
        Set the number of threads of execution for the validation of files.
        The default value (1) means single-threaded execution. The special
        value 0 means that the number of validation threads is dynamically
        set to the number of processors available to the java virtual
        machine.
        File validation includes the steps: metadata validation, fits
        compliance validation and md5sum check. Note that for these steps
        the limiting factor for performance is usually the disk I/O and not
        the cpu usage.
        Therefore, unless a single-threaded execution does not fully exploit
        the I/O capability of the local disk, the performance gain from a
        multi-threaded execution might be marginal.
        NOTE: activating multi-threaded validation may cause the tool to
        hang in case of big catalogue files. This is a known issue and is
        currently being investigated.

    --version
    -v
        Print the version and exit.

    --verbose
    -V
        Run in verbose mode. This only increases the number of messages in
        the log file; it does not affect the validator output on the
        terminal.

DESCRIPTION
    The validator is a command line tool used to check whether a phase 3
    data release is compliant with the ESO constraints and requirements
    before the release is uploaded to ESO. The validator provides a
    pass/fail result; in case of failure the list of detected errors is
    written to file.

    There are 4 main steps in the validation process:

    1. Validation of the release structure.
       The release structure is extracted from the metadata available in
       the headers of the release's fits files. The fits header keywords
       parsed to reconstruct the release structure are:
         PRODCATG - category of the (science) fits file. The associated
                    value must start with "science." (quotes for clarity).
         CHECKSUM - checksum value (one keyword-value pair for each fits
                    header).
         ASSONn   - name of the n-th component of the dataset.
         ASSOCn   - category of the n-th component of the dataset.
         ASSOMn   - md5sum of the n-th component of the dataset (optional).
       The provenance information can be conveyed in two ways:
         1. PROVn - name or ID (in the ESO archive) of the n-th component
            of the provenance.
         2. Using a dedicated binary table extension. Please refer to the
            "ESO Science Data Products Standard" document for more details
            about the format of the table and how to enable this feature.
       Invalid categories, duplications and inconsistencies in the
       definition of datasets or provenance information are reported as
       errors.

    2. Fits validation.
       For this step an external utility (by default the NASA HEASARC
       fitsverify) is used. The external utility must be present as an
       executable on the system where the validation is performed. If the
       utility is not present and an alternate utility is not specified,
       the release will be considered in error.

    3. Metadata validation.
       This step checks the consistency of the metadata against the rules
       specified by ESO. The rules for this check are dynamically updated
       before each check, therefore an HTTP connection with ESO is required
       during the validation.

    4. Catalogue validation.
       If a release contains a catalogue, additional checks are performed
       by the validator on the catalogue metadata and data.

       METADATA:
       The following format requirements for catalogues are always checked,
       independently of the checks performed in step 3:
       - the main header cannot contain data
       - the extension containing the catalogue (identified by
         EXTNAME=PHASE3CATALOG) must be a BINTABLE
       - the following keywords must be present in all catalogue files:
         DATE, ORIGIN.
       - the following keywords must be present in single file catalogues
         and in the main file of multi-tile catalogues:
         REFERENC, MJD-OBS, MJD-END, TELESCOP, INSTRUME, FILTER.
       - the following checks are performed on column specific keywords:
         TCOMMi:
           - must be defined for every i and should not be empty
         TFORMi:
           - must be defined for every i
           - must be in the rTa format, where type T must be in
             (L,X,A,B,I,J,K,E,D)
         TSCALi:
           - cannot be defined
         TTYPEi:
           - must be defined for every i
           - must be unique
           - cannot be one of the reserved SQL keywords defined in the file
             sql_reserved_words.txt (the match is case insensitive)
           - must match the regular expression [A-Z][A-Z_0-9]*
         TUCDi:
           - there must be only one identifier column
             (i.e. TUCDi=meta.id;meta.main)
           - all ucd atoms must be defined in the UCD1+ list v1.23 from
             April 2007
         TXLNKi:
           - must be either CATALOG, ARCFILE or ORIGFILE (the definition of
             this keyword implies that the keywords TXCTYi, TXP3Ci and
             TXP3Ri must be consistent to define the proper association;
             please refer to the phase 3 documentation for more detailed
             information)
         TZEROi:
           - cannot be defined

       DATA:
       - the identifier column cannot contain null values
       - values must be within TDMINi and TDMAXi (applies only to numeric
         values)

       In case of multi-file catalogues (using tile-by-tile submission) the
       tool also checks that every column specific keyword mentioned above
       has the same value in all catalogue files. If an optional keyword is
       defined only in some files the tool does not complain, but it takes
       the main file as authoritative.

    In case of an update release (instead of a newly created release) the
    release structure described in step 1 is obtained by merging these
    three elements:
    (1) The metadata available in the headers of the local fits files.
    (2) The structure stored remotely at ESO (contained in the file
        CONTENT.ESO).
    (3) The changes to the existing structure specified by the user (file
        CHANGES.USER).
    To be precise: (1) is merged with the result of [(2) + (3)], but only if
    (1) is valid. In case of errors in the release structure parsed from the
    local files, the remote content and its updates are ignored by the
    validation process until the local errors are fixed.
    On the other hand, if the files CHANGES.USER or CONTENT.ESO are not in
    the correct format, the overall parsed release structure will be
    incomplete or otherwise invalid, and therefore the validator might
    report errors on the local files even if they would be considered valid
    when validated as part of a newly created release.
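
    As an illustration only, the column specific keyword rules listed under
    catalogue validation (step 4) could be sketched in Python as below. This
    is not the validator's own code: the function name check_columns, the
    plain dictionary standing in for a fits header, and the four-word
    RESERVED set (the real list is in sql_reserved_words.txt) are
    assumptions made for the example, and the UCD atom and TXLNKi checks are
    left out.

```python
import re

# Patterns taken from the rules above.
TTYPE_RE = re.compile(r"[A-Z][A-Z_0-9]*")
TFORM_RE = re.compile(r"\d*[LXABIJKED].*")  # rTa: optional repeat, type, rest

# Hypothetical stand-in for the contents of sql_reserved_words.txt.
RESERVED = {"SELECT", "FROM", "WHERE", "TABLE"}


def check_columns(header, ncols):
    """Return a list of error strings for the column keywords of one table.

    header is assumed to be a dict mapping keyword names to values.
    """
    errors = []
    names = set()
    id_columns = 0
    for i in range(1, ncols + 1):
        # The source says TCOMMi "should" not be empty; treated as an error here.
        if not str(header.get(f"TCOMM{i}", "")).strip():
            errors.append(f"TCOMM{i} is missing or empty")

        tform = header.get(f"TFORM{i}")
        if tform is None or not TFORM_RE.fullmatch(tform):
            errors.append(f"TFORM{i} is missing or not in rTa format")

        ttype = header.get(f"TTYPE{i}")
        if ttype is None:
            errors.append(f"TTYPE{i} is missing")
        else:
            if not TTYPE_RE.fullmatch(ttype):
                errors.append(f"TTYPE{i} does not match [A-Z][A-Z_0-9]*")
            if ttype.upper() in RESERVED:  # case insensitive match
                errors.append(f"TTYPE{i} is a reserved SQL keyword")
            if ttype in names:
                errors.append(f"TTYPE{i} is not unique")
            names.add(ttype)

        for forbidden in (f"TSCAL{i}", f"TZERO{i}"):
            if forbidden in header:
                errors.append(f"{forbidden} cannot be defined")

        if header.get(f"TUCD{i}") == "meta.id;meta.main":
            id_columns += 1

    if id_columns != 1:
        errors.append(f"expected exactly one identifier column, found {id_columns}")
    return errors
```

    Calling check_columns(header, 2) on a two-column header returns an empty
    list when all the rules sketched here are satisfied, and one message per
    violated rule otherwise.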

    At startup the validator checks that the input directory exists and is
    readable, that the fitsverify utility (or the equivalent specified on
    the command line) exists and is executable, and that an HTTP connection
    to ESO can be opened to download the metadata validation rules.
    - If the checks on the directory fail, the validator exits immediately
      with an error.
    - If fitsverify is not available, the validation goes on but will end
      with an error.
    - If a rule file cannot be downloaded, the validation of the files with
      the corresponding category will be considered in error.

OUTPUT
    The output of the validator (on standard output) is a summary of the
    validation, which is also written to file, and a status message (release
    valid / release in error), which is one of the following (quotes for
    clarity):
        "OK - RELEASE CAN BE UPLOADED TO ESO."
        "ERROR - PLEASE FIX THE ERRORS BEFORE UPLOADING THE RELEASE."

    Summary example:
        FILES
            Science
            Calib
            Ancillary
            Other
        ASSOCIATIONS
            Dataset
            Provenance
        ERRORS
            Fits validation
            Catalog validation
            MD5 check
            Missing CHECKSUM/DATASUM
            Invalid category
            Missing category
            Meta-data
            Outlier
            Missing from disk
            Invalid provenance
            Duplication
            Inconsistency
            Other

    Summary explanation:
        FILES:
            Existing files belonging to the release (i.e. declared in the
            fits headers and at the same time available, either on disk or
            remotely). Note: this means that the number of FILES might be
            different from the number of files present locally in the
            release's directory.
        Science, Calib, Ancillary, Other
            A breakdown of FILES per category.
        ASSOCIATIONS:
            Associations declared in the headers of the fits files.
        Dataset
            Datasets declared in the headers.
        Provenance
            Provenance information declared in the headers.
        ERRORS:
            Total number of errors that arose while validating the release.
        Fits validation
            Fits files for which the fitsverify utility in use reported an
            error.
        Catalog validation
            Catalogue files for which the catalogue specific validation
            reported an error.
        MD5 check
            Files for which an md5sum is declared in a fits header but
            differs from the one computed on the file itself.
        Missing CHECKSUM/DATASUM
            Number of missing CHECKSUM and DATASUM keywords in the headers
            (each fits file might have multiple headers).
        Invalid category
            Files with a category not declared in the ESO list of valid
            categories.
        Missing category
            Files without a defined category.
        Meta-data
            Fits files which did not pass the metadata validation.
        Outlier
            Files found in the directory structure of the release but not
            declared in the fits headers.
        Missing from disk
            Files declared in the fits headers but not found on disk in the
            release's directory tree.
        Invalid provenance
            Provenance information with missing elements or where a
            circular definition was detected.
        Duplication
            Duplicated definitions in the release structure.
        Inconsistency
            Inconsistencies in creating the release structure. It signals:
            - That remote and local contents in an update release do not
              fit together, even if the contents themselves are valid
              stand-alone.
            - An empty new release or an update release with no updates.
            - A fits file that cannot be parsed by the release parser
              (NOTE: the inconsistency error is reported in addition to any
              errors given by metadata rules and fitsverify).
        Other
            Errors which do not fit in any of the above entries. Example:
            if the fitsverify utility cannot be run, the error is added to
            this entry.

EXAMPLES
    $> java -jar validator.jar
        Prints a short usage message.

    $> java -jar validator.jar -r /data/HARPS -m create
        Validate the release's directory /data/HARPS using as fitsverify
        either the one in the current directory or any found on the
        executable path. The files validator.error, validator.log and
        validator.toc are written in the current working directory.

    $> java -jar validator.jar -r /data/HARPS -m create -t 2
        Validate the release's directory /data/HARPS as above, but all the
        validation steps on a given file are executed in a separate thread
        of execution. With the input value (-t 2), two threads run in
        parallel and therefore two files are validated in parallel. The
        gain in speed depends on the hardware used, on the specified number
        of threads and on the validated files themselves.

    $> cd /tmp; java -jar /data/utilities/validator.jar -r /data/HARPS -m create
        Validate the release's directory /data/HARPS using as fitsverify
        the executable /data/utilities/fitsverify if it exists, otherwise
        the first one called fitsverify found on the executable path. The
        files validator.error, validator.log and validator.toc are written
        in the current working directory (/tmp).

    $> java -jar validator.jar -r /data/HARPS -m create -f /data/utilities/alternate_util
        Validate the release's directory /data/HARPS using as fitsverify
        the executable /data/utilities/alternate_util.

    $> java -jar validator.jar -r /data/HARPS-TAKE2 -m update
        Validate the release's directory /data/HARPS-TAKE2 and merge its
        content with the content declared in /data/HARPS-TAKE2/CONTENT.ESO,
        modified according to the instructions in
        /data/HARPS-TAKE2/CHANGES.USER. The resulting release is described
        in the file validator.toc.

FILES
    Files generated in output.
    The validator produces 3 files in the current working directory:
    1. validator.error: list of short error messages describing the errors
       found in validating the release. This file is empty if no errors are
       found.
    2. validator.toc: contains the status (release ok/in error), a summary
       of the release (number and type of validated files, number and type
       of errors found, number of parsed datasets and provenance
       information) and a dump of the parsed metadata for datasets,
       provenance, and categories.
    3. validator.log: a more detailed list of messages generated during the
       validation.
    If the validator cannot create the validator.error and validator.toc
    files, their content is printed on standard output instead.

    External fitsverify utility:
    fitsverify is a command line utility which verifies the compliance of a
    fits file with the fits standards. It is produced and distributed by
    NASA HEASARC and can be downloaded from:
        http://heasarc.gsfc.nasa.gov/docs/software/ftools/fitsverify/
    The binary version for Linux is provided together with the
    validator.jar file.
    Any alternate utility can be used as long as:
    - The alternate utility can run from the command line.
    - The alternate utility takes as its single input parameter the name of
      the fits file to check.
    - The alternate utility returns exit code 0 if the input is a valid
      fits file, and a code != 0 otherwise.
    - If the exit code is not 0, details on the error(s) are sent to the
      standard output (so that they can be reported in the validator.log
      file).

LOCATION
    This file (named validator.README), the latest validator version and
    the fitsverify executable compiled for Linux can be downloaded from:
        http://www.eso.org/sci/observing/phase3
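
    The contract for an alternate utility (see "External fitsverify utility"
    above) can be illustrated with a minimal Python sketch. It is a toy, not
    a replacement for fitsverify: it only checks that the file size is a
    multiple of the 2880-byte fits block and that the first header card is
    SIMPLE = T. The names check_fits and main are hypothetical; what matters
    is the interface: one file name as input, exit code 0 for a valid file,
    non-zero otherwise, and error details on standard output.

```python
import os
import sys

BLOCK = 2880  # fits files are organised in 2880-byte blocks
CARD = 80     # each header card is 80 bytes long


def check_fits(path):
    """Return a list of problems; an empty list means the toy checks passed."""
    try:
        size = os.path.getsize(path)
        with open(path, "rb") as fh:
            first_card = fh.read(CARD)
    except OSError as exc:
        return [f"cannot read {path}: {exc}"]

    problems = []
    if size == 0 or size % BLOCK != 0:
        problems.append(f"size {size} is not a positive multiple of {BLOCK}")
    if not first_card.startswith(b"SIMPLE  ="):
        problems.append("first header card is not SIMPLE")
    elif first_card[29:30] != b"T":  # logical value sits in column 30
        problems.append("SIMPLE is not T")
    return problems


def main(argv):
    """Follow the contract: one argument, details on stdout, exit code."""
    if len(argv) != 2:
        print("usage: alternate_util fits_file")
        return 2
    problems = check_fits(argv[1])
    for problem in problems:
        print(problem)  # standard output, so it can end up in validator.log
    return 1 if problems else 0

# To use this as a script, end the file with:
#     raise SystemExit(main(sys.argv))
```

    Saved as an executable script with that last line enabled, such a
    utility could be passed to the validator with the -f option, in the same
    way as /data/utilities/alternate_util in the EXAMPLES section.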