Integration Release R2.1

 

A gzipped tar file containing the integrated ALMA SW for R2.1 has been prepared and can be found here.

The software has been tagged with ALMA-R2_1.

These are the patches we received and integrated, subsystem by subsystem:

 

SUBSYSTEM NAME    TAG (corresponding to the last bug fixing)

ACS               MONTHLY-2005-03-2
ARCHIVE           MONTHLY-2005-03-2
CONTROL           MONTHLY-2005-03-12
CORR              MONTHLY-2005-03-6
PIPELINE          MONTHLY-2005-03-5
SCHEDULING        MONTHLY-2005-03-4
TELCAL            MONTHLY-2005-03-8
ICD/HLA           MONTHLY-2005-03-1

a) General comments

b) Problems still to be investigated

c) How to improve the integration

d) Static and dynamic analysis

 

a) General comments

- R2.0 results have been confirmed and reproduced
- bulk data have been written to the archive
- multiple dynamic scheduling runs (with the same SB) are possible
- IDs are correctly passed around
- calibration values as well
- connections among Data Capture, TelCal and QuickLook have been verified in the integrated environment and work
- only Phase Calibration results can be processed
- interactive scheduling works from objexp; it is not implemented from the EXEC OMC GUI

b) Problems still to be investigated

CORR
1. corrSimulator: kill -9 is sometimes not enough
2. some containers do not manage to shut down properly (in particular CDP)
3. sometimes, a correct shutdown of the container does not manage to unload all kernel modules
4. in CCC.log we recently found the following on the STE (not systematically reproducible):
2005-05-19T11:54:46.776 CorrCanManager::buildLTANodesVec() -> No LTAs found
2005-05-19T11:54:46.779 CorrCanManager::buildSCCNodesVec() -> No SCCs found
2005-05-19T11:54:46.781 CorrCanManager::buildQCCNodesVec() -> No QCCs found
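
Messages of this kind could be pulled out of a log automatically. The following is only a minimal sketch in Python: the regular expression is inferred from the three lines quoted above, and the names LINE_RE, SAMPLE_LINES and find_missing_nodes are hypothetical, not part of the CORR software.

```python
import re

# Layout inferred from the three CCC.log lines quoted in this report:
#   <timestamp> <Class::method()> -> No <kind> found
# Lines in any other layout are simply ignored by this sketch.
LINE_RE = re.compile(r"^(\S+)\s+(\S+)\s+->\s+No\s+(\w+)\s+found\s*$")

# The three lines quoted in the report, used here as sample input.
SAMPLE_LINES = [
    "2005-05-19T11:54:46.776 CorrCanManager::buildLTANodesVec() -> No LTAs found",
    "2005-05-19T11:54:46.779 CorrCanManager::buildSCCNodesVec() -> No SCCs found",
    "2005-05-19T11:54:46.781 CorrCanManager::buildQCCNodesVec() -> No QCCs found",
]


def find_missing_nodes(lines):
    """Return (timestamp, source, kind) for every 'No <kind> found' line."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            hits.append((match.group(1), match.group(2), match.group(3)))
    return hits


if __name__ == "__main__":
    for timestamp, source, kind in find_missing_nodes(SAMPLE_LINES):
        print(timestamp, source, "missing:", kind)
```

To scan a real log, the lines of the file would be passed in place of SAMPLE_LINES.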

CONTROL
1. often, a correct shutdown of the container does not unload all kernel modules; normally the following ones are still loaded:
rtai_fifos 21388 0
rtai_ksched 48729 0 [rtai_fifos]
rtai_hal 22220 1 [rtai_fifos rtai_ksched]

2. controlContainer crashes on the second (or a later) dynamic scheduling run.
Commenting out the fts lines

#fts = Devices.getDeviceByNode('CONTROL_50');
#del(fts)

in $ACSROOT/bin/R2Test solved the problem.

This still needs to be looked into; Ralph suggested that it may be related to the way ACS loads and unloads libraries.

3. DelayServerError:
DelayServer:: Error no specified source
is a known problem but does not impact anything else and is scheduled to be fixed in R3.
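
The check for kernel modules left behind after a container shutdown (item 1 above, and item 3 under CORR) could be scripted. What follows is a sketch under stated assumptions, not an ITS tool: it assumes the leftover modules can be recognised by the rtai_ prefix seen in the listing above, and the function names leftover_modules and current_lsmod are hypothetical.

```python
import subprocess

# The module listing quoted in item 1, used here as sample input.
SAMPLE_LSMOD = """\
rtai_fifos 21388 0
rtai_ksched 48729 0 [rtai_fifos]
rtai_hal 22220 1 [rtai_fifos rtai_ksched]
"""


def leftover_modules(lsmod_text, prefix="rtai_"):
    """Return the names of loaded modules whose name starts with prefix.

    The first whitespace-separated field of each lsmod line is the module
    name; the header line and blank lines do not match the prefix.
    """
    names = []
    for line in lsmod_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith(prefix):
            names.append(fields[0])
    return names


def current_lsmod():
    """Return the output of lsmod on the local (Linux) machine."""
    result = subprocess.run(["lsmod"], capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Uses the sample text; call current_lsmod() instead on a real host.
    print(leftover_modules(SAMPLE_LSMOD))
```

Run after a container shutdown, an empty list would indicate that all modules with that prefix were unloaded.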


ARCHIVE
1. We had several instances of the cppContainer crashing; the crash was due to the BulkReceiver component. Holger is investigating.
2. Oracle tests still to be performed to check the latest ARCHIVE patch.

ACS
1. jlog: size of file to be loaded (see SPR ALMASW2005040)
2. manager stability
3. SPR ALMASW2005033

EXEC
1. OMC GUI: is too much memory used?
2. After the dynamic button is pressed the first time and the first SB has been processed, the user receives feedback (green OK). From that point on, the green OK stays there and nothing changes when the dynamic button is pressed again and the SB is processed several more times. I think some feedback to the user is missing: the user is obliged to check the events received to see whether the cycle has finished or not.
3. The ability to start remote containers from the OMC GUI still has to be checked.

PIPELINE-SCHEDULING-ACS:
GUIs (like the QLDisplay GUI and the Interactive Scheduling GUI) do not show up when methods are activated from an objexp started from the command line rather than from the acscommandcenter used to start up the system. This will be investigated in the coming days.

OBSPREP
SPR ALMASW2005034

 

c) How to improve the integration


- For R3 we should plan to have a group of developers (one per subsystem) working together with ITS from the first days of the integration. My suggestion is to have a first group, for instance, in Garching for one week and a second group (with different people) a few days later in Socorro.
- reproducibility: we lost a lot of time verifying results when these did not agree between Socorro and Garching. We hope to remove this kind of problem from the list as soon as both sites begin to use the STE.
- stability: when working with Java GUIs, opening the GUIs through an X server emulation or using the GUI remotely is not the most suitable approach. We will try to test two different ways: opening the GUIs locally and attaching them to a remote manager, and using webstart.
- We are trying to introduce more automation into the test procedures.

d) Static and dynamic analysis

The following static and dynamic analyses have been performed on the code tagged with MONTHLY-2005-03-ITER-17.

The unit tests have been repeated many times, in particular also on the last tag, ALMA-R2_1, and the results are consistent with those reported here.

1. Subsystems SLOC:

The data reported here are compared with the previous release (September 2004 – RELEASE R2.0).

 

- on all the directories

- on src only

- on test only

- percentage test/src
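
As an illustration of the last figure, the sketch below assumes that "percentage test/src" means the test SLOC divided by the src SLOC, times 100. That reading is an assumption, the function name is hypothetical, and the counts are invented for the example; they are not R2.1 figures.

```python
def percentage_of_src(src_sloc, test_sloc):
    """Return test SLOC as a percentage of src SLOC (assumed definition)."""
    if src_sloc <= 0:
        raise ValueError("src_sloc must be positive")
    return 100.0 * test_sloc / src_sloc


if __name__ == "__main__":
    # Invented counts, for illustration only.
    print(percentage_of_src(src_sloc=20000, test_sloc=5000))  # 25.0
```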

 

SLOC detailed figures:

 

ACS

ARCHIVE

CONTROL

CORR

EXEC

OBSPREP

OFFLINE

PIPELINE

SCHEDULING

TELCAL

 

 

Note: Starting from Release R1.1, ITS and SE have used a common approach to calculating SLOC.

 

 

2. In-line Documentation

 

Assessment on the sufficiency of Doxygen-like in-line documentation: → Graph

 

 

3. Unit Tests

 

See the log at this link.

 

 

4. Test Coverage 

 

There is a page with the results for each subsystem (see list below). Each results page contains a table with 3 columns:

- the first lists the modules belonging to the subsystem,

- the second gives the results of the tests (PASSED, FAILED, or UNDETERMINED when it is neither PASSED nor FAILED); sometimes the test directory is missing, in which case you will just see the message: "No test directory, nothing to do here".

- the third gives a summary produced by Purify/Coverage reporting the analysis results on:

 

Functions

Functions and exits

Statement blocks

Implicit blocks

Decisions

Loops

 

The values reported for every item in the list above give the number of hits for that item.

 

In the same cell as the summary there is a link to the "Complete Report" produced by Purify. The Complete Report gives the lines where each hit happened. For a loop, it also gives the values: 0 (the loop is executed without entering into it), 1 (the loop is entered once), 2+ (the loop is entered and repeated once or more times).
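
The three loop values can be illustrated with a small sketch. It only mirrors the description above (body executed zero times, once, or at least twice); it does not reproduce how Purify itself collects the data, and the names loop_bucket and count_body_runs are hypothetical.

```python
def loop_bucket(body_executions):
    """Map the number of times a loop body ran to the value 0, 1 or 2+."""
    if body_executions < 0:
        raise ValueError("body_executions cannot be negative")
    if body_executions == 0:
        return "0"   # loop reached, body never entered
    if body_executions == 1:
        return "1"   # body entered exactly once
    return "2+"      # body entered and repeated at least once


def count_body_runs(items):
    """A loop under test: count how many times its body executes."""
    runs = 0
    for _ in items:
        runs += 1
    return runs


if __name__ == "__main__":
    # Three test inputs that together exercise all three loop values.
    cases = ([], ["a"], ["a", "b", "c"])
    print([loop_bucket(count_body_runs(case)) for case in cases])
```

A test suite that only ever calls the loop with non-empty input of length two or more would show hits for 2+ alone, which is why the three values are reported separately.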

 

Sometimes, instead of the summary, you will see a message like:

ERROR: No coverage info available
No atlout.spt

 

This happens when the modular test of that module is a dummy one (for example the test is actually just an "echo something"), so there is no code (Java, Python or C++) that can be instrumented.

 

ACS

ARCHIVE

CONTROL

CORR

EXEC

OBSPREP

OFFLINE

PIPELINE

SCHEDULING

TELCAL

 

 

Note: the Test Coverage Analysis is not yet stable. Work is in progress to improve the results. You will see errors like:

 

This means that the test exists and is executed, but Purify is not able to dump the information collected into the result file. This error is under investigation. We had 5 cases, in ACS only.

Still under investigation; we do not yet know the reasons.

We hope to clarify the problems under investigation as soon as possible!

 

TestCoverage Information

 

 

 

5. Java Duplicated Classes

 

See the list at this link.

 

 

6. Channels and Events

 

See details at this link.