RTC Supervisor
The RTC Supervisor is NOT a deployment tool. Its role is to provide a single entry point to the RTC for state guiding, monitoring, error recovery and population of the runtime repo.
It responds to a number of the standard commands defined by Stdif and exports global state in the OLDB.
Introduction
The current release of the RTC Supervisor performs the following:
Guiding the state of all SRTC components by forwarding state change requests to them
Evaluating the overall state of the RTC by monitoring the state of all supervised RTC components
Monitoring the liveliness of all RTC components and detecting if any have crashed
Providing a simple interface for recovery when one or more components are in error
Subsequent releases will add the following:
Monitoring any error events generated by RTC components and generating an overall error indication
Loading the contents of the runtime repository from the persistent repository
Implementing a mode switching interface by reloading parts of the runtime repository from the persistent repository
Providing a means of updating the persistent repository with items which have been changed in the runtime repository
Acting as a base class to which instrument specific functionality and interfaces can be added
The RTC Supervisor implementation is divided into a library and a server, with the server implementing the usable component. In addition there are a number of test programs used as integration tests, and an example implementation of some deployment code which allows a set of components defined in a YAML file to be launched.
The library implements the functionality required for communicating with
a set of supervised rtcObjects and sending commands to them as a list,
via rtcCommandRequest and rtcCommandRequestSeries. The complete
configuration of rtcObjects is managed by the rtcObjectConfig class,
which reads its config from the runtime repo.
The RTC Supervisor component is currently based on the rtctkExampleComponent structure, i.e. it provides a business logic which implements the various activities. THIS MAY WELL CHANGE and become a simple server implementing the Stdif commands directly.
Launching the Server
Being based on the rtctkExampleComponent (at least for the moment), the server has the following command line:
rtctkRtcSupervisor -h
Options:
-h [ --help ] print help messages
-i [ --cid ] arg component identity
-s [ --sde ] arg service discovery endpoint
e.g.:
rtctkRtcSupervisor -i rtc_sup -s file:///$PREFIX/run/exampleEndToEnd/service_disc.yaml
State Guiding
Currently state guiding is performed from the activities defined by the
ExampleComponent. In each activity, AllObjectRequestList() is used to
send commands, either in series or in parallel, to the list of
supervised components.
The RTC Supervisor implements activities for and understands the following commands:
Init
Reset
Recover (currently empty)
Enable
Disable
NOTE: Run and Idle are not in the above list. The baseline currently is that the sequencer code will be responsible for sending the Run commands to the RTC components in the correct order. This may be re-evaluated.
There are configuration flags available for each of these activities indicating if they should be performed in parallel or series on the list of supervised components.
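The choice between series and parallel execution can be sketched as follows. This is a minimal illustration only, not the RTC Supervisor's code: SendToAll, SendFn and the in_series parameter are hypothetical names, and the real AllObjectRequestList() interface may look quite different.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for sending one command to one supervised
// component. Returns true on success.
using SendFn = std::function<bool(const std::string& component,
                                  const std::string& command)>;

// Send `command` to every component, either one after another (series)
// or concurrently (parallel). Returns the names of components that failed.
std::vector<std::string> SendToAll(const std::vector<std::string>& components,
                                   const std::string& command,
                                   bool in_series,
                                   const SendFn& send) {
    std::vector<std::string> failed;
    if (in_series) {
        for (const auto& name : components) {
            if (!send(name, command)) {
                failed.push_back(name);
            }
        }
        return failed;
    }
    // Parallel: launch all requests first, then collect results in order.
    std::vector<std::future<bool>> results;
    results.reserve(components.size());
    for (const auto& name : components) {
        results.push_back(std::async(std::launch::async, send, name, command));
    }
    for (std::size_t i = 0; i < components.size(); ++i) {
        if (!results[i].get()) {
            failed.push_back(components[i]);
        }
    }
    return failed;
}
```

In series mode a slow component delays all those after it, while in parallel mode the total time is roughly that of the slowest component. Which mode suits a given activity depends on whether the components have ordering dependencies.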
Commanding
To send one of the supported commands you can use the rtctkSendCommand script which makes use of the rtctkClient program implementing the Stdif client interface as follows:
rtctkSendCommand rtc_sup Init
rtctkSendCommand rtc_sup Reset
rtctkSendCommand rtc_sup Recover
rtctkSendCommand rtc_sup Enable
rtctkSendCommand rtc_sup Disable
Where rtc_sup is the name which the rtcSupervisor has been passed
with the -i flag. The rtctkSendCommand script will look in an environment
variable $REPO_DIR for the service_disc.yaml with which it will look
up the URIs required.
State Evaluation
When the object configuration is built from the runtime repo, a list of
publish/subscribe URIs is created, one per supervised component. The business
logic creates a StateSubscriber with the list of URIs. The
StateSubscriber's callback is called whenever an event is received and
invokes the rtcObjectConfig::OnStateEventReceived() method, which sets
the state attribute of the identified rtcObject in the object list,
then evaluates the believed system state/substate and publishes it in
the OLDB.
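The update-then-evaluate pattern described above might look roughly like the sketch below. The StateTable class and its method names are hypothetical, and the evaluation rule (all components agree, otherwise "Mixed") is an assumption made for illustration; it is not taken from the RTC Supervisor source.

```cpp
#include <map>
#include <string>

// Hypothetical sketch: tracks the last reported state of each supervised
// component and derives one overall state from them.
class StateTable {
public:
    // Called from the subscriber callback when a state event arrives.
    void OnStateEvent(const std::string& component, const std::string& state) {
        states_[component] = state;
    }

    // Assumed rule (illustration only): if every component reports the
    // same state, that state is the global state; otherwise the global
    // state is reported as "Mixed". With no components it is "Unknown".
    std::string GlobalState() const {
        if (states_.empty()) {
            return "Unknown";
        }
        const std::string& first = states_.begin()->second;
        for (const auto& entry : states_) {
            if (entry.second != first) {
                return "Mixed";
            }
        }
        return first;
    }

private:
    std::map<std::string, std::string> states_;
};
```

In a real supervisor the result of the evaluation would then be written to the OLDB; that step is omitted here.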
A typical content of the OLDB when the system is operational and
supervising two components, object1 and object2 would be:
rtc_sup:
    global_display_state:
        type: RtcString
        value: On.Operational.Idle
    global_state:
        type: RtcString
        value: operational
    global_substate:
        type: RtcString
        value: idle
    global_error:
        type: RtcBool
        value: false
    global_error_who:
        type: RtcString
        value: ""
    state:
        type: RtcString
        value: "On.Operational.Idle On.Operational.Update.Idle "
object1:
    state:
        type: RtcString
        value: "On.Operational.Idle On.Operational.Update.Idle "
object2:
    state:
        type: RtcString
        value: "On.Operational.Idle On.Operational.Update.Idle "
Asynchronous Detection of Component Failure
Asynchronous monitoring is performed by the rtcMonitor class. The rtcServer
has a member which is an rtcMonitor. The rtcMonitor's constructor creates
a thread which, when active, periodically calls the rtcServer's
MonitorCycle() method.
The rtcServer marks the monitor as being active whenever the state is at least NotOperational/Ready.
The rtcServer's MonitorCycle() method uses AllObjectRequestList() to
send a GetVersion command with a short timeout to each component. If the
command fails, the rtcObject sending the command marks the component
as having generated an exception, and no further commands are sent to
it.
If a component does fail then the InError method is called to set the
error flag and record the name of the component in error.
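A periodic monitor thread of this kind can be sketched as below. The Monitor class, SetActive() and the cycle callback are hypothetical names chosen for this illustration; the actual rtcMonitor interface is not shown in this document and may differ.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical monitor: calls `cycle` once per `period` while active.
class Monitor {
public:
    Monitor(std::function<void()> cycle, std::chrono::milliseconds period)
        : cycle_(std::move(cycle)), period_(period),
          thread_([this] { Run(); }) {}

    ~Monitor() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        wake_.notify_all();
        thread_.join();
    }

    // The owner marks the monitor active once the state allows pinging.
    void SetActive(bool active) { active_.store(active); }

private:
    void Run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!stop_) {
            // Sleep for one period, waking early only when asked to stop.
            if (wake_.wait_for(lock, period_, [this] { return stop_; })) {
                break;
            }
            if (active_.load()) {
                lock.unlock();
                cycle_();  // e.g. send a short-timeout ping to each component
                lock.lock();
            }
        }
    }

    std::function<void()> cycle_;
    std::chrono::milliseconds period_;
    std::atomic<bool> active_{false};
    std::mutex mutex_;
    std::condition_variable wake_;
    bool stop_ = false;
    std::thread thread_;  // declared last so it starts after the other members
};
```

The condition variable lets the destructor stop the thread promptly instead of waiting out a full period, and the cycle runs outside the internal lock so a slow ping cannot block shutdown requests from being recorded.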
Error Notification
In general when the rtcSupervisor notices something has gone wrong it
calls the rtcSupervisor::InError method which updates the OLDB with the
error and an indication of the cause.
Mutex Usage
A std::mutex is available in the RtcSupervisor class which can be used
to globally lock the component.
This is used to avoid e.g. the monitor thread trying to “ping” the components when an activity is active.
As new extension points and the ability to add interfaces to the RtcSupervisor are added it will be necessary for programmers to make use of this facility.
Configuration
The supervisor has the following static parameters which define whether the associated activities are started in the supervised components in parallel or series.
cfg_static:
    init_alone:
        type: RtcBool
        value: true
    enable_alone:
        type: RtcBool
        value: true
    disable_alone:
        type: RtcBool
        value: false
    update_alone:
        type: RtcBool
        value: false
The supervisor needs to get a list of components which are supervised. As an INTERIM MEASURE, these are read from a DEPL table. It is likely that whatever component implements RTC deployment set starting/stopping will populate something similar. Any functionality regarding the usage of this DEPL table will be revisited.
The RTC Supervisor only reads the object_list attribute; the others
are used by the deployment component. The presence of rtc_sup in the
object list is optional. It is used by the DEPL component for deployment,
and the supervisor skips it if found. If you call your rtcSupervisor
something else, you will need to modify the rtcSupervisor to skip this
new name. Look at the code in rtcObjectConfig.cpp with the comment
object_list:
    type: RtcString
    value: "rtc_sup object1 object2"
rtcSupervisor:
    host:
        type: RtcString
        value: "localhost"
    exe:
        type: RtcString
        value: "rtctkRtcSupervisor"
object1:
    host:
        type: RtcString
        value: "localhost"
    exe:
        type: RtcString
        value: "rtctkExampleComponent"
object2:
    host:
        type: RtcString
        value: "localhost"
    exe:
        type: RtcString
        value: "rtctkExampleComponent"
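Reading the space-separated object_list value while skipping the supervisor's own entry could be done along these lines. SupervisedObjects is a hypothetical helper written for this illustration; it is not the code in rtcObjectConfig.cpp. Passing the supervisor name as a parameter, rather than hard-coding it, is one way to avoid the source modification mentioned above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a whitespace-separated object list and drop the supervisor's
// own name, so that the supervisor does not try to supervise itself.
std::vector<std::string> SupervisedObjects(const std::string& object_list,
                                           const std::string& supervisor_name) {
    std::vector<std::string> names;
    std::istringstream in(object_list);
    std::string name;
    while (in >> name) {
        if (name != supervisor_name) {
            names.push_back(name);
        }
    }
    return names;
}
```

For the object_list shown above and a supervisor name of rtc_sup, the helper yields object1 and object2.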
Deployment and Other Scripts
To aid testing the rtcSupervisor, a simple Python deployment script is
provided which accepts a file like DEPL.yaml and will deploy each of the
components identified in the object_list using the rtctkStartObject.sh
wrapper, passing the executable name and the component name.
The rtctkStartObject.sh script acts as a simple wrapper allowing all
RTC components to be identified by looking for the command line
rtctkStartObject.sh. The script launches the Object and waits for its
completion.
The rtctkRtcSuper_start_components.sh script copies some of the resources
into a “run” directory, does some cleanup and uses the deploy mechanism described
above.
The rtctkRtcSuper_stop_components.sh script just kills all the rtctkStartComponent
instances using killall.
The rtctkRtcSuper_show_oldb.sh script provides a simple way of keeping an eye on the
fake OLDB contents.
The rtctkSendCommand.sh script is a simple wrapper for the rtctkClient,
passing the SDE file argument. The script assumes it can find this
file in a directory identified by $REPO_DIR.
Todo
User extensions. Provide a mechanism for users to add their own functionality, e.g. for “InError”, “SetMode”, “Run(thing)”.
RTR population. The Runtime Repo is not currently populated at init time.
SetMode. Mode setting by populating parts of the Runtime Repo is currently not supported.