Alerts

Alerts communicate notable or anomalous conditions during the Data Acquisition life-cycle. They help identify sources of errors and, in some cases, help prevent problems. The conditions are typically such that user intervention is necessary.

If it can be determined that the condition that raised an alert has resolved itself, the alert is automatically cleared and removed from the list of active alerts.

If a Data Acquisition has an active alert with severity Error, the Data Acquisition error flag is set.

Note

Some alert conditions can be inhibited in order to prevent false positives. See Alerted Conditions for details.

Data Model

The status of a Data Acquisition contains a list of the active alerts associated with it, where each alert contains:

  • An identifier that is unique within the Data Acquisition. Each condition has its own identifier, which may be used for correlation over time. To determine whether a specific alert was cleared, compare its identifier against all the identifiers in the alert list. If no match is found, the alert was cleared.

  • A severity level indicating how severe the identified condition is. See below for more details.

  • A description with information about why the alert was set.

  • A timestamp indicating when the alert was set or last updated.
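
The identifier comparison described above can be sketched as follows. This is only an illustration: the `Alert` class and the `cleared_ids` function are hypothetical stand-ins and do not reproduce the actual daqif.DaqAlert type or its field names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Hypothetical stand-in for an alert entry (not the daqif.DaqAlert type)."""
    id: str            # unique within one Data Acquisition
    severity: str      # "Error", "Warning" or "Info"
    description: str   # why the alert was set
    timestamp: float   # when the alert was set or last updated


def cleared_ids(previous: list[Alert], current: list[Alert]) -> set[str]:
    """Return identifiers that were active before but are absent from the current list."""
    active = {alert.id for alert in current}
    return {alert.id for alert in previous if alert.id not in active}


# Two status snapshots of the same Data Acquisition (made-up values).
before = [
    Alert("a1", "Error", "request to data source failed", 100.0),
    Alert("a2", "Warning", "unmerged data", 101.0),
]
after = [Alert("a2", "Warning", "unmerged data", 101.0)]

print(cleared_ids(before, after))  # "a1" is no longer listed, so it was cleared
```

Comparing sets of identifiers rather than whole alert entries avoids false results when only the description or timestamp of an active alert was updated.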

Three severity levels are used, from most to least severe:

Error

An error has occurred that is expected to require intervention to continue the Data Acquisition.

Warning

Indicates a condition that is suspicious but is not known to be an error at the time of the alert. It could cause an error later.

Info

Indicates a condition or event that has no unexpected functional impact but is nevertheless relevant information that can be useful, e.g. for troubleshooting or tuning.
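
As a minimal sketch of how a client might work with these levels, the example below orders the severities and derives an error flag from the active alerts, following the rule that an active alert with severity Error sets the flag. The `Severity` enum, its numeric values, and both helper functions are assumptions made for illustration; they are not the daqif.AlertSeverity definition.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical ordering: a higher value means a more severe alert."""
    INFO = 0
    WARNING = 1
    ERROR = 2


def error_flag(severities: list[Severity]) -> bool:
    """True if at least one active alert has severity Error."""
    return any(s is Severity.ERROR for s in severities)


def most_severe(severities: list[Severity]):
    """Return the highest severity among active alerts, or None if there are none."""
    return max(severities, default=None)


active = [Severity.INFO, Severity.WARNING]
print(error_flag(active))         # no Error alert is active
print(most_severe(active).name)   # name of the highest active severity
```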

See also

Alerts are accessed via the MAL interface:

daqif.DaqStatus.alerts

Contains the list of active alerts for a Data Acquisition.

daqif.DaqAlert

Describes an alert.

daqif.AlertSeverity

Describes severity levels.

Alerted Conditions

Table 5 summarizes the identified conditions that cause an alert to be set. Each condition is described in full in the sections below.

Table 5 Alert Condition Summary

Condition                         Severity
Failed Operation Alert            Error
Data Source Request Error         Error
daqDpmServer Request Error        Error
Source File Collection Error      Error
Source File Releasing Error       Error
Suspicious Source File Timestamp  Warning
Unmerged Data                     Warning
Primary HDU Resized               Info

Failed Operation Alert

Alert set if an asynchronous operation fails. This alert is typically paired with a more specific alert that identifies the cause of the failure, such as Data Source Request Error.

Property  Description
Severity  Error
Cleared   Alert is cleared if a subsequent attempt succeeds. A new attempt can typically be made by issuing the corresponding request to daqOcmServer, which will then try again.
Inhibit   Cannot be inhibited

Data Source Request Error

Alert set when a request to a data source fails. The description identifies the data source and the request that failed.

Property  Description
Severity  Error
Cleared   Alert is cleared if a subsequent attempt succeeds. A new attempt can typically be made by issuing the corresponding request to daqOcmServer, which will then try again.
Inhibit   Cannot be inhibited

daqDpmServer Request Error

Alert set when a request to daqDpmServer fails, unless the failure is a timeout. A timeout is not considered an error because daqDpmServer may have a legitimate reason for being offline.

Property  Description
Severity  Error
Cleared   Alert is cleared if a subsequent attempt succeeds.
Inhibit   Cannot be inhibited

Source File Collection Error

Alert set if daqDpmServer fails to transfer a source file to the local workspace. A possible cause is misconfigured SSH. See daqDpmServer deployment.

Property  Description
Severity  Error
Cleared   Alert is cleared if a subsequent attempt succeeds.
Inhibit   Cannot be inhibited

Source File Releasing Error

Alert set if daqDpmServer fails to transfer the final Data Product to a receiver. A possible cause is misconfigured SSH. See daqDpmServer deployment.

Property  Description
Severity  Error
Cleared   Alert is cleared if a subsequent attempt succeeds.
Inhibit   Cannot be inhibited

Suspicious Source File Timestamp

Alert set if daqDpmServer detects that a source file has been modified after Data Acquisition was stopped. This is an indication that

  • a data source did not create the source file before indicating that it stopped or

  • something modified the source file afterwards.

In either case this can lead to data corruption as daqDpmServer may start collecting the file while it is being modified.
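
The kind of check involved can be illustrated by comparing a file's modification time with the time the Data Acquisition was stopped. This is a hedged sketch only, not the daqDpmServer implementation: the function `modified_after_stop` and the `stop_time` parameter are hypothetical names, and the temporary file merely stands in for a source file.

```python
import os
import tempfile


def modified_after_stop(path: str, stop_time: float) -> bool:
    """True if the file's modification time is later than stop_time (seconds since epoch)."""
    return os.stat(path).st_mtime > stop_time


# Create a throwaway file that stands in for a source file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"data")
    path = f.name

mtime = os.stat(path).st_mtime

# Acquisition stopped 10 s before the file was last written: suspicious.
late = modified_after_stop(path, mtime - 10.0)
# Acquisition stopped 10 s after the file was last written: as expected.
early = modified_after_stop(path, mtime + 10.0)

os.remove(path)
print(late, early)
```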

Property  Description
Severity  Warning
Cleared   Never
Inhibit   Cannot be inhibited

Unmerged Data

Alert set if daqDpmServer detects during the merge process that some data is not merged. This happens if a file has data in its FITS Primary HDU but is not the merge target. See Data Product Creation for details.

Property  Description
Severity  Warning
Cleared   Never
Inhibit   If the data is expected to be present and not merged, prevent the alert from being set by setting the attribute alertUnmergable to false for the relevant data source.

See also

See documentation on alertUnmergable for data sources:

Primary HDU Resized

This is a performance degradation alert that is set if the FITS Primary HDU had to be resized in order to make space for all FITS keywords being merged into it.

The data source creating the FITS file should be configured with additional space for keywords. The alert description provides information on how much more space is needed.

Property  Description
Severity  Info
Cleared   Never
Inhibit   Cannot be inhibited (prevent the condition by configuring the data source properly)