Bulk Data

ACS provides support for the transport of bulk data [RD01 - 7.1. Bulk data transfer], to be used for science data.

The following use cases are supported:

Push streaming. Connection initiated by the supplier.

A data supplier produces a continuous stream of data for a specific bulk data consumer Component. A typical example is the Correlator sending continuous streams of data to the Archive.

The basic operations are as follows:

Push discrete bulk data. Connection initiated by the supplier.

A data supplier produces bulks of data as separate entities rather than as a continuous stream. A typical example is a data processing Component sending images to another Component for further processing. Everything works as in the previous case, except that the communication protocol must be able to identify separate bulks of data, such as a single image.

Pull data from stream. Connection initiated by consumer.

An application needs streaming data published by a Stream Provider Component. There can be multiple clients. A typical example is a GUI connecting to a CCD camera: multiple GUIs can display the image from the same camera.

The basic operations are as follows:

Pull discrete bulk data. Connection initiated by the consumer.

An application needs to retrieve discrete bulks of data as separate entities rather than as a continuous stream. A typical example is the retrieval of images from the Archive. Everything works as in the previous case, except that the communication protocol must be able to identify separate bulks of data, such as a single image.

Performance considerations

It shall be possible to handle the actual transfer of data with communication protocols more efficient than CORBA IIOP, in particular for high volume streams.

For data published to multiple clients, we have implemented a service architecture in which the data supplier sends the data to just one Distributor, which in turn forwards it to the connected clients. This decouples the supplier from the load caused by an increasing number of clients.
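The fan-out idea can be illustrated with a minimal in-process sketch. The class and method names (Distributor, Supplier, subscribe, publish, send) are hypothetical and chosen for illustration only; they are not the ACS Bulk Data interfaces, and no CORBA or network transport is involved.

```python
from typing import Callable, List

# Hypothetical type: a client is any callable that accepts one block of bytes.
Consumer = Callable[[bytes], None]


class Distributor:
    """Receives each data block once and forwards it to every registered client."""

    def __init__(self) -> None:
        self._clients: List[Consumer] = []

    def subscribe(self, client: Consumer) -> None:
        """Register one more client for all future blocks."""
        self._clients.append(client)

    def publish(self, block: bytes) -> int:
        """Forward one block to all clients; return the number of deliveries."""
        for client in self._clients:
            client(block)
        return len(self._clients)


class Supplier:
    """Sends every block exactly once, whatever the number of clients."""

    def __init__(self, distributor: Distributor) -> None:
        self._distributor = distributor
        self.sends = 0  # counts the supplier's own send operations

    def send(self, block: bytes) -> None:
        self.sends += 1
        self._distributor.publish(block)
```

In this sketch the supplier's send count depends only on the number of blocks produced, while the per-client work is carried by the Distributor.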

Precise performance requirements have been collected and verified with the ACS Bulk Data system implementation based on the CORBA Audio Video streaming service. Tests have demonstrated that the ACS implementation can deliver 700 – 800 Mbit/s to three consumers simultaneously.
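To put the measured rates in more familiar units, the following sketch converts Mbit/s into MB/s and estimates a transfer time. It assumes decimal units (1 Mbit = 10**6 bit, 1 MB = 10**6 byte) and a sustained rate; the 1 GB data set in the usage note is a hypothetical example, and the text above does not state whether the quoted rate is per consumer or aggregate.

```python
def mbyte_per_s(rate_mbit_per_s: float) -> float:
    """Convert a rate in Mbit/s to MB/s (decimal units, 8 bits per byte)."""
    return rate_mbit_per_s / 8


def transfer_seconds(size_bytes: int, rate_mbit_per_s: float) -> float:
    """Time in seconds to move size_bytes at a sustained rate given in Mbit/s."""
    return size_bytes * 8 / (rate_mbit_per_s * 1e6)
```

Under these assumptions 700 Mbit/s corresponds to 87.5 MB/s and 800 Mbit/s to 100 MB/s, so a hypothetical 1 GB (10**9 byte) data set would take 10 s at 800 Mbit/s.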

Architecture and design

Bulk data transfer can be implemented in CORBA using three techniques:

  1. Iterators on normal IDL methods

  2. Notification Channel

  3. Audio Video Streaming Service

The first two options are based on the IIOP transport protocol and therefore suffer from performance limitations, although tests available in the literature show that properly designed buffering limits these problems. Option 1 is better suited for discrete bulk data while option 2 is better suited for streaming.
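Option 1 can be pictured as a client repeatedly calling an ordinary method that returns the next bounded chunk. The sketch below is an in-process illustration only: BulkIterator, next_chunk and retrieve are hypothetical names, not IDL definitions from ACS, and an empty chunk is assumed as the end-of-data marker.

```python
class BulkIterator:
    """Hypothetical server-side iterator handing out one bounded chunk per call."""

    def __init__(self, data: bytes, chunk_size: int) -> None:
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self._data = data
        self._chunk_size = chunk_size
        self._offset = 0

    def next_chunk(self) -> bytes:
        """Return the next chunk, or b"" once the data is exhausted."""
        chunk = self._data[self._offset:self._offset + self._chunk_size]
        self._offset += len(chunk)
        return chunk


def retrieve(iterator: BulkIterator) -> bytes:
    """Client side: call next_chunk() until an empty chunk marks the end."""
    parts = []
    while True:
        chunk = iterator.next_chunk()
        if not chunk:
            break
        parts.append(chunk)
    return b"".join(parts)
```

Each call would be a separate remote invocation in a real system, which is why the chunk size (the buffering) has a strong influence on the achievable throughput.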

Option 3 is based on the Audio Video Streaming Service [RD42] defined by CORBA. This specification aims at the streaming and transfer of large amounts of data and satisfies the requirements expressed in [RD01 - 7.1.1 Image pipeline]. The handshaking protocol is defined using CORBA IDL Media Control interfaces, but the actual data transfer goes out of band and does not need to use CORBA to transport data (although it could). TAO provides an implementation with transport over TCP and UDP and excellent performance [RD43].

The CORBA Audio Video Streaming Service supports all the use cases described above.

There are the following basic concepts:
Data published on a Flow is received via a callback implemented by the receiving Component.
The push use cases are implemented using an upload flow. The pull use cases are implemented using a download flow.
The Bulk Data Components implement the Media Control interfaces (and extend them as needed).
Streaming use cases are implemented by sending the start control command to the Component to notify it that the stream has started, and by not sending the stop command until the stream needs to be closed. Either the structure of the data sent in the stream or the SFP protocol is used for synchronization and framing of the messages.
Discrete use cases are implemented using the start and stop stream commands to identify each discrete piece of bulk data. These will be wrapped in convenience interfaces. In this way it is not necessary to inspect the incoming data to identify the frame boundary and the end of data.
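The discrete case can be sketched as a receiver whose start and stop callbacks delimit each bulk of data, so that the payload itself never needs to be inspected for boundaries. The callback names on_start, on_data and on_stop are hypothetical and stand in for the Media Control commands and the data callback; they are not the actual interface names.

```python
from typing import List, Optional


class DiscreteReceiver:
    """Hypothetical receiver: start/stop calls delimit each bulk of data."""

    def __init__(self) -> None:
        self.completed: List[bytes] = []             # one entry per finished bulk
        self._buffer: Optional[bytearray] = None     # None while no bulk is open

    def on_start(self) -> None:
        """Open a new bulk; a second start without a stop is an error."""
        if self._buffer is not None:
            raise RuntimeError("start received while a bulk is still open")
        self._buffer = bytearray()

    def on_data(self, frame: bytes) -> None:
        """Append one received frame to the bulk that is currently open."""
        if self._buffer is None:
            raise RuntimeError("data received before start")
        self._buffer.extend(frame)

    def on_stop(self) -> None:
        """Close the current bulk and store it as one complete entity."""
        if self._buffer is None:
            raise RuntimeError("stop received before start")
        self.completed.append(bytes(self._buffer))
        self._buffer = None
```

A streaming receiver would follow the same pattern, except that stop arrives only when the stream is closed, so framing within the stream has to come from the data structure or from SFP.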

Limitations of the CORBA Audio Video Streaming Service

There is no implementation of the CORBA A/V service for Java or Python. The producers and consumers of the bulk data stream (Correlator, Control, TelCal, Pipeline and Archive) all use C++ implementations of the bulk data senders and receivers as ACS components. Java and Python applications can access these components via the normal ACS/CORBA interfaces. There is therefore no requirement for an implementation of CORBA A/V in languages other than C++. For example, image data processing typically takes place between C++ Components without requiring high performance image transfer to Java or Python Components.