Release R1.1 Report

 

 

·        Integration Cycle

·        Integration Problems

·        Subsystem test in isolation

·        ARCHIVE

·         EXEC

·        OBSPREP

·        TELCAL

·        PIPELINE

·        CORRELATOR

·        Code Review

·        Logs

 

 

 

Integration Cycle

The integration cycle among subsystems tested by ITS can be summarized by this picture:

 

A project made up of four SBs was created with the ALMA-OT batch program and stored in the ARCHIVE. The execmaster GUI is then used to start the entire cycle and to trigger the start-up of the Master Scheduler. The Scheduler reads the ARCHIVE to collect information about the SBs. Each SB is sent to the Controller, and CONTROL sends SB start/end events back to SCHEDULING. CONTROL also starts the telescope calibration by sending Scan/Obs events to TELCAL. Data reduction is then performed either with CORR in simulation (in which case TELCAL uses test FITS files that it prepared itself) or with CORR not in simulation (in which case CORR sends test data, still simulated because the hardware is not available). At the end of the data reduction, TELCAL sends a PointingReduce event to SCHEDULING and the results are written to the ARCHIVE. SCHEDULING stores the ppr and passes it to PIPELINE, which updates it and stores it in the ARCHIVE. Finally, when the entire project has been processed, PIPELINE sends an end event to SCHEDULING.

 

 

Integration Problems

The integration test described above revealed several problems, most of which have already been solved.

The issues are summarized below.

 

1) DelayServer problem 

The activation of the DelayServer component in CONTROL triggered CORR, which started writing 300 data records to the bulkstore, as expected.

Unfortunately, the data were not physically stored in the bulkstore; ARCHIVE and CORRELATOR have to investigate this issue together.

See the CPP Container Log and the Java Container Log

After that it was impossible to close the loop. Nothing further happened, as if CONTROL were waiting for a reply. This problem was not investigated because of the time constraints on the R1.1 integration test delivery. To complete the cycle, the references to the DelayServer in the CONTROL software (in the file ArrayControllerImpl.java) were commented out. This is an open issue that should be solved as soon as possible for the next release. We will submit a proper SPR.

 

2) prefix/namespaces duplication 

This issue occurred during the attempt to read SBs, stored in the ARCHIVE, from SCHEDULING. Even if SBs were present, it was impossible to retrieve them. In the schema files, which were imported into the Archive with the archiveLoadSchema command, there were "conflicting" namespace definitions. Some of the schemas defined the namespace prefix "sbl" for the namespace "Alma/TestSchedBlock", others for "Alma/ObsPrep/SchedBlock". When archiveLoadSchema is called a table is established. It associates namespace prefixes (e.g. "sbl") with namespaces (e.g. "Alma/ObsPrep/SchedBlock"). This table is used for queries. In a query "/sbl:*" the prefix "sbl" is replaced by the corresponding namespace stored in the table. If there are two or more different namespaces for the *same* namespace prefix, the query might behave differently than expected.
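
The clash can be made concrete with a minimal sketch. It does not reproduce the Archive code: the class PrefixTable, its methods, and the "{namespace}name" form of the expanded query are hypothetical and serve only to show why a prefix bound to two namespaces is ambiguous.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical prefix table: binds each namespace prefix to one namespace. */
final class PrefixTable {
    private final Map<String, String> table = new LinkedHashMap<>();

    /**
     * Registers a prefix. Returns false, and keeps the first binding,
     * when the prefix is already bound to a different namespace.
     */
    boolean register(String prefix, String namespace) {
        String existing = table.get(prefix);
        if (existing != null && !existing.equals(namespace)) {
            return false; // conflicting definition, e.g. "sbl" declared twice
        }
        table.put(prefix, namespace);
        return true;
    }

    /** Expands a query such as "/sbl:*" by replacing the prefix with its namespace. */
    String expand(String query) {
        int colon = query.indexOf(':');
        if (!query.startsWith("/") || colon < 0) {
            return query; // nothing to expand
        }
        String prefix = query.substring(1, colon);
        String namespace = table.get(prefix);
        if (namespace == null) {
            throw new IllegalArgumentException("Unknown prefix: " + prefix);
        }
        // Assumed output notation; the real Archive query format may differ.
        return "/{" + namespace + "}" + query.substring(colon + 1);
    }
}
```

With "sbl" registered first for "Alma/ObsPrep/SchedBlock", a second registration for "Alma/TestSchedBlock" is reported as a conflict, and "/sbl:*" can only ever address one of the two namespaces.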

 

The problem was solved by changing the namespace prefix in the xsd files that duplicated it. No special mechanism is needed to enforce the uniqueness of namespace prefixes, since few or no new schemas are expected in the future.

 

3) Wrong programming style

The subsystem SCHEDULING couldn’t get the array controller.

The error in getting the ArrayController component was due to the hard-coded container name "hugoActivator" in the CONTROL getDynamicComponent call.
The ITS CDB does not define a container named hugoActivator.
Hard-coding container names is not recommended: it hinders future development and makes the code harder to integrate in other environments, such as the test environment.

The problem was solved by leaving the Manager free to assign dynamic components to the available containers.
The Manager applies the best match against the definitions in the Component.xml file, in which the ArrayController component has been redefined as follows:


           <_ Name="*"
                              Code="alma.Control.arrayInterfaces.ArrayControllerHelper"
                             Type="IDL:alma/Control/ArrayController:1.0"
                             Container="frodoContainer"/>



For this reason the container name *should not be hard coded* in the function.

See the following examples for C++/Java code

Java
===

      Bad style

 

       Integer count = new Integer(m_activeControllers.size());
       String name = new String("ArrayController" + count);
       String idl = new String("IDL:alma/Control/ArrayController:1.0");
       String impl = new 

       String cont = new String("hugoActivator");
      
       ComponentSpec cs = new ComponentSpec(name, idl, impl, cont);
      
       try {
              arrayCont = (ArrayController) (ACSComponentHelper.narrow
              (m_containerServices.getDynamicComponent(cs, false)));
       }
       catch (Exception e) {
               m_logger.severe("Failed to obtain ArrayController COB.");
       }
    

      Good Style

      

       Integer count = new Integer(m_activeControllers.size());
       String name = new String("ArrayController" + count);
       String idl = new String("IDL:alma/Control/ArrayController:1.0");
       ComponentQueryDescriptor cs = new ComponentQueryDescriptor(name, idl);
      
       try {
              arrayCont = alma.Control.ArrayControllerHelper.narrow
              (m_containerServices.getDynamicComponent(cs, false));
       }
       catch (Exception e) {
               m_logger.severe("Failed to obtain ArrayController COB.");
       }


C++
===

      Good Style


       ComponentSpec_var cSpec = new ComponentSpec();
       cSpec->component_name = CORBA::string_dup("ArrayController");    //name of the component
       cSpec->component_type = CORBA::string_dup("IDL:alma/Control/ArrayController:1.0");    //IDL interface implemented by the component
       cSpec->component_code = CORBA::string_dup(COMPONENT_SPEC_ANY);     //executable code for the component (e.g. DLL)
       cSpec->container_name = CORBA::string_dup(COMPONENT_SPEC_ANY);     //container where the component is deployed

        //The IDL ComponentInfo structure returned by the get_dynamic_component method
        //contains tons of information about the newly created component and the most important
        //field is "reference" (i.e., the unnarrowed dynamic component).
        ComponentInfo_var cInfo  = client.manager()->get_dynamic_component(client.handle(),  //Must pass the client's handle
                                                                           cSpec.in(),    //Pass the component specifications
                                                                           false);    //Inform manager this component is NOT the default for its type!
 

4) Wrong way to access system variables

During the CORR test it became apparent that the program reads and uses the INTROOT system variable.

 

May 25, 2004 9:53:57 AM alma.acs.container.ContainerSealant invoke
INFO: intercepted a call to 'DelayServer#calcInit'...
Error unable to find EOP File:
     Most likely you have failed to redefine
     the Introot variable in DelayServerImpl
     this is a temporary fix until the DB is
     implemented. (JSK 09/18/04)
Exception
java.io.FileNotFoundException: null/lib/iers_bulletina.xvii_003 (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:106)
        at java.io.FileInputStream.<init>(FileInputStream.java:66)
        at java.io.FileReader.<init>(FileReader.java:41)
        at alma.Control.DelayServerImpl.EOPParser.<init>(EOPParser.java:19)
        at alma.Control.DelayServerImpl.DelayServerImpl.calcInit(DelayServerImpl.java:132)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

 
 
 
 
 The error is caused by this line of code:
 
        EOPParser parser = new EOPParser(getIntroot()+
                         "/lib/iers_bulletina.xvii_003");
 
 

In our ITS environment *$INTROOT is not set*, which caused the error. Note that INTROOT is intended for development only.

 

There are two problems here. The first is the access to system variables; the second is the file location. Since system variables cannot be read directly from Java, the workaround used here was to call a function defined in a cpp file.

However, this kind of mixed-language code makes the software less self-contained and should be avoided whenever possible.

 

1) INTROOT/ACSROOT versus ACSDATA

INTROOT is in itself a legitimate place to install things, provided you keep in mind that they must be locatable transparently in INTROOT, [INTLIST...], ACSROOT.

 

But there is a conceptual difference:

 

- INTROOT... is for static things that do not change and are part of the installation. INTROOT... can be shared between different machines (NFS mounted).

- ACSDATA is for host-dependent or time-dependent files and CANNOT be shared between different hosts. Files that change with time (as that specific file seems to) or that can be edited (like the CDB) should go here.

 

***IMPORTANT***:

Whenever you have to install a file, think about its purpose and decide, from a logical point of view, in which of the two places it belongs. Looking at that specific file with Moreno, we thought the best place was ACSDATA, but we might be wrong.

The important thing is that we reasoned in those logical terms.

 

2) INTROOT in Java

 

Java does not support environment variables BY DESIGN. I think it is therefore a strong violation to circumvent the problem using JNI to write a C routine that reads environment variables.

 

The "Java way" is to use properties and resources.

A property should probably have been used in that case (as done by the Archive). Java also provides native mechanisms to search for resources in the class path. This is the preferred mechanism, better than defining a property, but I understood from the code that it was not easily doable in this specific case.

 

***IMPORTANT***:

- Use the features of Java to locate resources in the class path whenever possible.

- Add properties for specific cases.

- Never use environment variables in Java.

 

3) Search INTROOT.....

 

ACS provides standard functions in C++ and Python to find a file in INTROOT...ACSROOT. Because of the reasons just described, there is no equivalent in Java.

 

***IMPORTANT***:

These functions should be used instead of "hard-coding" opening INTROOT. This is important because if we change the search algorithm (adding INTLIST) your code will continue working.

 

In Java, a file can be located through the class path with the following method:

/ACS/LGPL/CommonSoftware/acsjlog/src/alma/acs/logging/ClientLogManager.java   line 442
            is = getClass().getResourceAsStream("/"+DEFAULT_LOG_PROPERTY_FILE);

where DEFAULT_LOG_PROPERTY_FILE is the name of the file being searched for.

*The file is searched for in all the CLASSPATH dirs*. The method returns an InputStream opened on the file, or null if the file is not found; it does not return the directory path.
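
A minimal sketch of the two recommended mechanisms combined is given here: the class path is tried first, then a property passed with -D. The class DataFileLocator and the property key used in the usage note are hypothetical; only getResourceAsStream and System.getProperty are standard Java calls.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: looks in the class path first, then at a -D property. */
final class DataFileLocator {
    /**
     * Opens resourceName from the class path. If it is not there, opens the
     * file named by the system property propertyKey. Returns null when the
     * resource is absent and the property is not set; throws
     * FileNotFoundException when the property names a missing file.
     */
    static InputStream open(String resourceName, String propertyKey) throws IOException {
        InputStream in = DataFileLocator.class.getResourceAsStream("/" + resourceName);
        if (in != null) {
            return in; // found in one of the class path entries
        }
        String path = System.getProperty(propertyKey);
        return (path == null) ? null : new FileInputStream(path);
    }
}
```

For the EOP file, a call could look like DataFileLocator.open("iers_bulletina.xvii_003", "alma.control.eopFile"), where the property name is an invented example and would be set on the command line with -D.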

Similar facilities exist for the other languages:

acsutilFindfile   for C++
findfile.py       for Python

 
 
Note that whenever INTROOT has to be used, its definition and existence must always be checked first. Something like:
If INTROOT is defined -> take INTROOT
If not, if ACSROOT is defined -> take ACSROOT
If not -> error
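
The check can be sketched as follows. The values are taken from a map so that the sketch does not depend on how they are obtained (for example from properties passed with -D); InstallRoot and resolve are hypothetical names, not ACS functions. As an additional assumption, a root that is defined but does not exist on disk is skipped in favour of the next one.

```java
import java.io.File;
import java.util.Map;

/** Hypothetical helper implementing the INTROOT -> ACSROOT fallback. */
final class InstallRoot {
    /**
     * Returns the first of INTROOT, ACSROOT that is defined and points to an
     * existing directory. Throws IllegalStateException if neither qualifies.
     */
    static File resolve(Map<String, String> settings) {
        for (String key : new String[] {"INTROOT", "ACSROOT"}) {
            String value = settings.get(key);
            if (value != null && !value.isEmpty()) {
                File dir = new File(value);
                if (dir.isDirectory()) {
                    return dir; // defined and present on disk
                }
            }
        }
        throw new IllegalStateException(
                "Neither INTROOT nor ACSROOT points to an existing directory");
    }
}
```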
 
 
5) Channel problems 

This kind of problem appeared during the exchange of events among CONTROL, SCHEDULING and TELCAL. CONTROL was sending three different types of events on three different channels, while TELCAL was listening for all three events on a single channel. It seems that the channel and event specifications were not properly understood, or at least not well shared among subsystems.

The information contained in the CONTROL/ws/idl/ControlInterfaces.idl 
 
//! Name of the event notification channel used by Control 
  const string CHANNELNAME = "CONTROLSYSTEM"; 
//! Event type list. 
/*! 
    This is a list of the basic event types which Control will send 
    on its notification channel at different times. 
*/ 
  const string OBSEVENTS   = "OBSERVATIONEVENTS"; 
  const string SCANEVENTS  = "SCANEVENTS"; 
  const string EXECEVENTS  = "EXECEVENTS"; 
  const string SYSEVENTS   = "SYSEVENTS"; 
  const string DELAYEVENTS = "DELAYEVENTS"; 

 
 
 
 


 
 
 
 
 
 

contradicts the implementation. The idl file defines a single channel and three different events, while in the CONTROL code the Supplier was using three channels. Furthermore, the last part of the idl file, which describes the event types, is not necessary, and in this case it referred to obsolete event names.

 

The entire integration test was performed after:

 

1.      removing the 5 obsolete idl “const string” event definitions

2.      replacing, in all ALMA code, the uppercase name strings with the event names as defined in the idl file (ExecBlockEvent, ScanEvent, SendEvent, …)

3.      sending all events from CONTROL on a single channel (CONTROLSYSTEM)
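
The resulting arrangement, one channel with listeners selected by event type, can be modelled with a toy in-memory sketch. ToyChannel is hypothetical and is NOT the ACS notification channel API; only the channel name CONTROLSYSTEM and the event names come from the idl file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory channel for illustration; NOT the ACS notification channel API. */
final class ToyChannel {
    /** Callback invoked for every event of a subscribed type. */
    interface Listener {
        void receive(String eventType, String payload);
    }

    private final String name;
    private final Map<String, List<Listener>> listenersByType = new HashMap<>();

    ToyChannel(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }

    /** Subscribes a listener to one event type on this channel. */
    void subscribe(String eventType, Listener listener) {
        List<Listener> listeners = listenersByType.get(eventType);
        if (listeners == null) {
            listeners = new ArrayList<>();
            listenersByType.put(eventType, listeners);
        }
        listeners.add(listener);
    }

    /** Publishes an event and returns how many listeners received it. */
    int publish(String eventType, String payload) {
        List<Listener> listeners = listenersByType.get(eventType);
        if (listeners == null) {
            return 0; // nobody listens for this type, e.g. an obsolete name
        }
        for (Listener listener : listeners) {
            listener.receive(eventType, payload);
        }
        return listeners.size();
    }
}
```

A consumer that subscribes to "ScanEvent" on the single CONTROLSYSTEM channel receives those events, while an event published under an obsolete uppercase name such as "SCANEVENTS" reaches nobody. This is the kind of mismatch the test ran into.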

 

This issue shows that:

-         the idl files are the key place where developers get information. They have to be defined with the utmost care!

-         again, communication among subsystems should be improved; ITS is willing to help with this. To this end, at the beginning of March ITS prepared the page http://almasw.hq.eso.org/almasw/bin/view/ITS/ROneDotOneIntegrationTest and invited everybody to contribute. That page at least states clearly that CONTROL was using only one channel name. Unfortunately, it seems that the page was basically ignored (or everybody read only the part related to his/her own subsystem). Any other suggestions for improving communication are welcome.

 

 

6) Events not read from execmaster GUI

 This problem still has to be clarified, but it seems that the execmaster GUI is not able to listen to events, possibly because of a code problem. The EXEC team should send a mail with an explanation.

 

Subsystem test in isolation

The following tests were performed in isolation from the rest of the environment, excluding interaction with the other subsystems as far as possible.

 

For the complete list of tests and the procedure followed, see:

http://almasw.hq.eso.org/almasw/bin/view/ITS/ROneDotOneIntegrationTest

 

ARCHIVE

Most of the problems found in March have been fixed. See the list reported by Simon.

 

Two scripts still have problems and need to be corrected (one of them has already been fixed)

 

  • ArchiveStartManager (fixed)
  • ArchiveStopManager

 

One fairly important problem is still pending.

The set of UIDs shown in the ArchiveManager GUI does not seem to match the data archived in the Microarchive. It is as if the ArchiveManager GUI were pointing somewhere else.

Even after cleaning the archive (archiveCleanXindice) and stopping and restarting the archive manager, the GUI still shows the following information:

 
> Refresh      0      1      2      3      4      5
> uid://X0000000000000138/X00000002    U    R  
> 2004-03-25T07:53:49.047    OT-User    R    W    D 
> 
> uid://X0000000000000138/X00000003    U    R  
> 2004-03-25T07:53:49.073    OT-User    R    W    D
> 
> uid://X0000000000000138/X00000004    U    R
> 2004-03-25T07:53:49.094    OT-User    R    W    D 
> 
> uid://X0000000000000138/X00000005    U    R
> 2004-03-25T07:53:49.112    OT-User    R    W    D
> 
 

This point has to be clarified and fixed.

 

Fixes available for ACS 3.1

 

Browse Mode

1Q) How to look for a specific UID? Is it possible?

 

1A)The error shown has been fixed.

 

2Q) The use of blank schemas does not produce error messages

2A) If you put in a blank schema now there will be an error shown by the number tab turning red.

 

3Q) All the frames are not cleaned from the previous query as it should or/and the query does not work properly.

 

3A) The list of Entities in the top left hand screen is not supposed to be cleared. This is deliberate and it states in the documentation that results are added to the list. If you want a clean list either use another list or use clear first.

 

I accept that keeping the areas that display document information consistent is not always ideal, implementing anything else is non-trivial in the context of a web application.

 

This has been fixed.

 

 

Schema mode

0Q) Selection of a schema produces an error in the bottom right frame.

0A) This has been fixed

 

1Q) Adding a namespace produces an error

HTTP Status 503 - Servlet getnamespace is currently unavailable

1A) This has been fixed

 

2Q) Removal of a namespace from a specified Schema produces the error described above.

2A) See above

 

3Q)The insertion of a new namespace is not immediately mapped to the above schema list and a reload/refresh of the schema page is required

3A) Automatic refresh of the page is non-trivial

 

4Q) The insertion/deletion of namespaces works fine with Explorer 6.0 but not with Mozilla 1.4. With Mozilla, even after a refresh, I cannot see the added/deleted namespaces.

4A) I haven't seen this problem

 

5Q) It is not clear to me what the “withdraw” option should do.

5A) The new documentation is more complete

 

6Q) Even if documented it is misleading to see that there is the possibility to select “remove” for a specific namespace and Schema and later realize that all the namespaces with that name have been deleted from all the schemas. It would be better to “disable” the possibility to choose a Schema with the option “remove” namespace.

6A) This is related to the architecture of the archive and not the ArchiveManager

 

7Q) The Store option produces HTTP status 404

7A) This has been fixed

 

EXEC

UserAdmin GUI: SPR #ALMASW20040039

 

ExecMaster GUI: only one strange behavior, concerning its first instance. That instance never works properly, and all component collections are displayed as not OK (red icon).

 

 

Stress Test

--------------

 

Component distribution:

            - Manager started on te78

            - 34 instances of the execmaster GUI started on te49

 

For te78 everything was OK.

For te49 the only problems were related to system performance and were not due to the execmaster application.
At the 15th execmaster instance the GUI became unusable and completely gray. At the 34th instance there was an Out of Memory problem (te49 has 1GB RAM).

 
 

OBSPREP

Alma OT GUI.

 

*** Test using different users

1 intusr and 1 testusr.

No problems in updating and loading the same project. Of course the lock mechanism is still missing and this can create problems.

 

 

*** Only 1 minor thing.

The search option does not work with a single word or part of a word; the user has to provide the entire PI name in the panel. This is not very useful.

 

 

TELCAL

Nothing to be mentioned.

 

 

PIPELINE 

Note: PIPELINE is a subsystem that does not work independently of other subsystems; it is event driven and therefore needs input from other subsystems to work.

SciencePipeline was driven by SCHEDULING processing an SB and generating a ppr. The logs indicated that the SciencePipeline component went from initializing to initialized to operational. The SciencePipeline notification channel was created as well.

In Object Explorer the SciencePipeline component was available, and under it were helloPipeline, processRequest and getStatus, which could be invoked from Object Explorer.

QuickLook was exercised with test scripts 4, 5 and 6 from the QuickLook/test directory. Scripts 4 and 5 passed, but 6 produced the following error:
1 -     QuickLookResult stored
1 - QLR_CLIENT: Exception occured
1 - org.omg.CORBA.UNKNOWN: This exception was reported by the server, it is only re-thrown here.  vmcid: 0x0  minor code: 0  completed: No
1 -     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
1 -     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
1 -     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
1 -     at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
1 -     at org.jacorb.orb.SystemExceptionHelper.read(Unknown Source)
1 -     at org.jacorb.orb.connection.ReplyInputStream.checkExceptions(Unknown Source)
1 -     at org.jacorb.orb.Delegate.invoke(Unknown Source)
1 -     at org.omg.CORBA.portable.ObjectImpl._invoke(ObjectImpl.java:457)
1 -     at alma.xmlstore._OperationalStub.query(_OperationalStub.java:483)
1 -     at alma.pipeline.test.QuickLookResultClient.findQuickLookResult(QuickLookResultClient.java:265)
1 -     at alma.pipeline.test.QuickLookResultClient.main(QuickLookResultClient.java:439).

 

CORRELATOR

Preconditions:

  1. What components must be loaded before CORR can be started?
    • BULKSTORE
    • ControlSystem
  2. What notification channels must exist before CORR can be started?
    • CONTROLSYSTEM

Loading:

  1. How do we load the CORR components?
    • List the following components in the CDB as auto-load components:
      • CONFIGURATION_VALIDATOR
      • CDP_CONTROL
      • CCC_MONITOR
      • OBSERVATION_CONTROL_SIM
    • Start ACS and the C++ and Java containers.
  2. How do we verify CORR is loaded?
    • CORR will create CORRIntegrationEventChannel.
    • For each CORR component, check log for "Component XXX activated" messages.
      • Question: What log file?
      • The logging channel -- just use jlog to view the logs -- JimPisano - 19 May 2004
    • As CORR receives DELAYEVENTS, it logs the start time, stop time, and antenna IDs of the events.
      • Question: Is this true even when CORR is not responding to a configureObservation() message?
      • Yes -- JimPisano - 19 May 2004

Testing:

  1. How do we start a CORR test?
    • Send configureObservation() to OBSERVATION_CONTROL_SIM.
    • Note: this is normally done by ControlSystem.
  2. How do we verify the CORR has started?
    • Once per second, CORR will publish a Correlator::integrationEvent_t (see IntegrationEvent.idl) on CORRIntegrationEventChannel.
    • Once per second, CORR will publish channel average data to BULKSTORE in VOTable format. For this release, this is simulated data.
    • Note: An observation lasts 5 minutes. At the end of this time there should have been 300 integration events on CORRIntegrationEventChannel and 300 channel average data records added to BULKSTORE.
  3. How do we verify the CORR is sending data to its clients?
    • Question: What subsystems subscribe to CORRIntegrationEventChannel?
    • According to ROneDotOneIntegrationTest, no subsystems subscribe to this notification channel.
    • In R1, Executive subscribed to this channel and showed in its GUI that it received the events -- JimPisano - 19 May 2004

Cleanup:

  1. How do we stop CORR?
    • Shut down the C++ container containing CORR components.
  2. How do we verify the CORR is stopped?
    • For each CORR component, check log for "Component XXX etherialized" messages.
      • Question: What log file?
      • The logging channel -- just use jlog to view the logs -- JimPisano - 19 May 2004

Verification:

  1. How do we verify the logged DELAYEVENTS are correct?
    • Compare the events logged on the CONTROLSYSTEM notification channel with the events logged in the CORR log.
  2. How do we verify the VOTables stored in BULKSTORE are correct?
    • Question: What is the form and value of the simulated data?
    • You will need to browse the BulkStore. I was told that "The actual contents didn't matter", consequently there may be a lot of zeros and non-changing values between integrations. I can give you some information and there is a Python script in the Archive wiki (http://almasw.hq.eso.org/almasw/bin/view/Archive/VOTableDataTransfer) which can parse the VOTable output for inspection. -- JimPisano - 19 May 2004
    • Thanks for the link.
    • Question: For now, should we assume this is correct if the number of records in BULKSTORE is 300 times the number of calls to configureObservation()? -- ScottRankin - 19 May 2004
  3. How do we verify the events on CORRIntegrationEventChannel are correct?
    • Question: How do we link these events to VOTable records?
    • There is no way to verify this link as far as I know. This is again due to not caring about the actual contents. Perhaps I can find out more information about the actual content. -- JimPisano - 19 May 2004
    • Question: For now, should we assume this is correct if the number of logged integration events is 300 times the number of calls to configureObservation()? -- ScottRankin - 19 May 2004
    • Yes, this is a good assumption. -- JimPisano - 19 May 2004
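
The count-based check agreed above amounts to simple arithmetic: one record per second over a 5-minute observation gives 300 per call to configureObservation(). A minimal sketch follows, in which CorrCountCheck and its methods are hypothetical names and the observed counts are assumed to be obtained separately (for example by counting log entries or BULKSTORE records).

```java
/** Hypothetical check for the record counts described above. */
final class CorrCountCheck {
    static final int OBSERVATION_SECONDS = 5 * 60; // one observation lasts 5 minutes
    static final int RECORDS_PER_SECOND = 1;       // one event / data record per second

    /** Expected number of integration events (or BULKSTORE records). */
    static int expected(int configureObservationCalls) {
        return configureObservationCalls * OBSERVATION_SECONDS * RECORDS_PER_SECOND;
    }

    /** True when the observed count matches the expectation exactly. */
    static boolean matches(int configureObservationCalls, int observedCount) {
        return observedCount == expected(configureObservationCalls);
    }
}
```

The same function covers both the integration events on CORRIntegrationEventChannel and the channel average records in BULKSTORE, since both are produced once per second.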

 

 

Code Review

The integration test has shown that different programming styles coexist in the ALMA code, to the detriment of easy maintenance.

 

The system should be made more uniform by adopting a similar programming style throughout. The number of ALMA coding standards violations listed in the SE pages is enough to show that this effort is not being well pursued.

 

Better communication and cooperation among subsystems will help everybody reach the target (see the channel problem).

 

 

 

Logs

The java and cpp container logs are here linked:

 

 

The problem in the cpp log

 

TelCal trace : AlmaTiFitsReader::AlmaTiFitsReader(filename = ../../TelCalResults/Engines/test/testTelCal002.fits).

Could not open ../../TelCalResults/Engines/test/testTelCal002.fits

 

is similar to the one discussed above concerning the right directory for files and how to retrieve their path. In this case the file is searched for starting from the /TELCAL/TelCalResults/Engines/test directory. See the discussion in the "Wrong way to access system variables" section.