Alerts
Alerts communicate notable or anomalous conditions during the Data Acquisition life cycle. They help identify the source of errors and, in some cases, help prevent problems. The conditions are typically ones that require user intervention.
If it can be determined that the condition that raised an alert has resolved itself, the alert is automatically cleared and removed from the list of active alerts.
If a Data Acquisition has an active alert with severity Error, the Data Acquisition error flag will be set.
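The error flag rule can be sketched as follows. This is a minimal illustration only: the `Alert` record, its string severities and the `error_flag` helper are hypothetical and are not the daqif types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Hypothetical stand-in for an active alert; not the daqif type."""
    alert_id: str
    severity: str  # "error", "warning" or "info" (illustrative values)


def error_flag(active_alerts):
    """Return True if any active alert has severity Error."""
    return any(a.severity == "error" for a in active_alerts)


alerts = [Alert("a", "warning"), Alert("b", "error")]
print(error_flag(alerts))      # True
print(error_flag(alerts[:1]))  # False
```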
Note
Some alert conditions can be inhibited in order to prevent false positives. See Alerted Conditions for details.
Data Model
The status of a Data Acquisition contains a list of the active alerts associated with it, where each alert contains:
- An identifier unique within the Data Acquisition. The identifier is unique for each condition and may be used for correlation over time. To recognize that a specific alert was cleared, compare its identifier against all the alert identifiers in the alert list. If no match is found, the alert was cleared.
- A severity level indicating how severe the identified condition is. See below for more details.
- A description with information about why the alert was set.
- A timestamp indicating when the alert was set or updated.
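The identifier comparison described above amounts to a set difference between two successive alert lists. A minimal sketch, assuming identifiers are plain strings; the helper name and the identifier values are invented for illustration:

```python
def cleared_alert_ids(previous_ids, current_ids):
    """Identifiers present in the earlier alert list but absent from the current one."""
    return set(previous_ids) - set(current_ids)


# Identifiers taken from two successive status snapshots (made-up values).
before = ["alert-1", "alert-2"]
after = ["alert-1"]
print(cleared_alert_ids(before, after))  # {'alert-2'}
```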
Three severity levels are used, from most to least severe:
- Error
An error has occurred that is expected to require intervention to continue the Data Acquisition.
- Warning
Signals a condition that is suspicious but not known to be an error at the time of the alert. It could cause an error in due time.
- Info
Signals a condition or event that has no unexpected functional impact but is nevertheless relevant information that can be useful for e.g. troubleshooting or tuning.
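A client that wants to rank alerts can model the three levels as an ordered enumeration. The sketch below is an assumption for illustration: the numeric values and the `most_severe` helper are hypothetical, and the actual values of daqif.AlertSeverity may differ.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical ordering; a higher value means more severe."""
    INFO = 0
    WARNING = 1
    ERROR = 2


def most_severe(severities):
    """Return the most severe level in the iterable, or None if it is empty."""
    return max(severities, default=None)


print(most_severe([Severity.INFO, Severity.WARNING]).name)  # WARNING
```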
See also
Alerts are accessed via the MAL interface:
daqif.DaqStatus.alerts
Contains the list of active alerts for a Data Acquisition.
daqif.DaqAlert
Describes an alert.
daqif.AlertSeverity
Describes severity levels.
Alerted Conditions
Table 5 summarizes the identified conditions that cause an alert to be set. Full descriptions follow in the sections below.
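Condition | Severity |
---|---|
Failed Operation Alert | Error |
Data Source Request Error | Error |
daqDpmServer Request Error | Error |
Source File Collection Error | Error |
Source File Releasing Error | Error |
Suspicious Source File Timestamp | Warning |
Unmerged Data | Warning |
Primary HDU Resized | Info |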
Failed Operation Alert
Alert is set if an asynchronous operation fails. This alert is typically paired with a more specific alert that describes the cause of the failure, such as Data Source Request Error.
Property | Description |
---|---|
Severity | Error |
Cleared | Alert is cleared if a subsequent attempt succeeds. A new attempt can typically be made by issuing the corresponding request to daqOcmServer, which will then try again. |
Inhibit | Cannot be inhibited |
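The set-on-failure, clear-on-success behaviour can be sketched as below. The `AlertBook` class and the identifier are hypothetical and only illustrate the bookkeeping; they do not reflect how daqOcmServer is implemented.

```python
class AlertBook:
    """Hypothetical in-memory set of active alert identifiers."""

    def __init__(self):
        self.active = set()

    def record_attempt(self, alert_id, succeeded):
        """Set the alert on failure; clear it when a later attempt succeeds."""
        if succeeded:
            self.active.discard(alert_id)
        else:
            self.active.add(alert_id)


book = AlertBook()
book.record_attempt("stop-request", succeeded=False)  # first attempt fails
print(sorted(book.active))  # ['stop-request']
book.record_attempt("stop-request", succeeded=True)   # retry succeeds
print(sorted(book.active))  # []
```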
Data Source Request Error
Alert is set when a request to a data source fails. The description identifies the data source and the request that failed.
Property | Description |
---|---|
Severity | Error |
Cleared | Alert is cleared if a subsequent attempt succeeds. A new attempt can typically be made by issuing the corresponding request to daqOcmServer, which will then try again. |
Inhibit | Cannot be inhibited |
daqDpmServer Request Error
Alert is set when a request to daqDpmServer fails, unless the failure is a timeout. A timeout is not considered an error because daqDpmServer may have a legitimate reason for being offline.
Property | Description |
---|---|
Severity | Error |
Cleared | Alert is cleared if a subsequent attempt succeeds. |
Inhibit | Cannot be inhibited |
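The timeout exception to this rule is a simple classification of the failure. A minimal sketch, assuming the client surfaces timeouts as Python's built-in `TimeoutError`; the helper name is hypothetical:

```python
def should_set_request_alert(exc):
    """Return True if a failed request should set the alert.

    Assumption: timeouts surface as Python's built-in TimeoutError.
    """
    return not isinstance(exc, TimeoutError)


print(should_set_request_alert(TimeoutError("no reply")))  # False
print(should_set_request_alert(RuntimeError("rejected")))  # True
```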
Source File Collection Error
Alert is set if daqDpmServer fails to transfer a source file to the local workspace. One possible cause is misconfigured SSH. See daqDpmServer deployment.
Property | Description |
---|---|
Severity | Error |
Cleared | Alert is cleared if a subsequent attempt succeeds. |
Inhibit | Cannot be inhibited |
Source File Releasing Error
Alert is set if daqDpmServer fails to transfer the final Data Product to a receiver. One possible cause is misconfigured SSH. See daqDpmServer deployment.
Property | Description |
---|---|
Severity | Error |
Cleared | Alert is cleared if a subsequent attempt succeeds. |
Inhibit | Cannot be inhibited |
Suspicious Source File Timestamp
Alert is set if daqDpmServer detects that a source file has been modified after the Data Acquisition was stopped. This is an indication that either
- a data source did not create the source file before indicating that it stopped, or
- something modified the source file afterwards.
In either case this can lead to data corruption, as daqDpmServer may start collecting the file while it is being modified.
Property | Description |
---|---|
Severity | Warning |
Cleared | Never |
Inhibit | Cannot be inhibited |
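The kind of check behind this alert can be illustrated by comparing a file's modification time with the time the Data Acquisition was stopped. This is a sketch under assumptions: the helper name is hypothetical and the way daqDpmServer actually obtains and compares timestamps is not described here.

```python
import os
import tempfile


def modified_after_stop(path, stop_time):
    """Return True if the file's modification time is later than stop_time.

    stop_time is in seconds since the epoch, the same unit as st_mtime.
    """
    return os.stat(path).st_mtime > stop_time


with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"data")
    path = f.name

try:
    # Force a known modification time, then test stop times on either side of it.
    os.utime(path, (1_000_000_000, 1_000_000_000))
    print(modified_after_stop(path, 999_999_990))    # True: modified after stop
    print(modified_after_stop(path, 1_000_000_010))  # False
finally:
    os.remove(path)
```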
Unmerged Data
Alert is set if daqDpmServer detects during the merge process that some data is not merged. This happens if there is data in the FITS Primary HDU but it is not the merge target. See Data Product Creation for details.
Property | Description |
---|---|
Severity | Warning |
Cleared | Never |
Inhibit | To prevent the alert from being set because the data is expected to be there and not be merged, set the attribute |
See also
See documentation on alertUnmergable for data sources:
Primary HDU Resized
This is a performance degradation alert that is set if the FITS Primary HDU had to be resized to make space for all FITS keywords being merged into it.
The data source creating the FITS file should be configured with additional space for keywords. The alert description provides information on how much more space is needed.
Property | Description |
---|---|
Severity | Info |
Cleared | Never |
Inhibit | Cannot be inhibited (prevent the condition by configuring the data source properly) |
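To estimate how much header space a data source should reserve, recall that a FITS header is stored in 2880-byte blocks of 36 keyword records (cards) of 80 bytes each, with the END card occupying one record. The sketch below computes how many extra blocks a merge would require. The function name and its inputs are hypothetical, and it is not a description of how daqDpmServer performs the resize.

```python
CARD_BYTES = 80      # one FITS header keyword record
BLOCK_BYTES = 2880   # FITS header block size
CARDS_PER_BLOCK = BLOCK_BYTES // CARD_BYTES  # 36


def extra_blocks_needed(allocated_blocks, used_cards, added_cards):
    """Number of additional 2880-byte header blocks required.

    used_cards counts every card already in the header, including END.
    """
    required_cards = used_cards + added_cards
    required_blocks = -(-required_cards // CARDS_PER_BLOCK)  # ceiling division
    return max(0, required_blocks - allocated_blocks)


print(extra_blocks_needed(1, 30, 5))   # 0: 35 cards fit in one block
print(extra_blocks_needed(1, 30, 10))  # 1: 40 cards need two blocks
print(extra_blocks_needed(4, 30, 10))  # 0: header was created with spare space
```

A result of 0 means the merged keywords fit in the space already allocated, so no resize would be needed.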