The upgrade of the Quality Control [QC]
operational computing environment [dfo21 -- dfo33 and qc01 -- qc20]
from Scientific Linux
4.3, VLTSW-2007, DFS-5 [SL-43+VLTSW-2007+DFS-5] based systems to Scientific Linux
5.3, VLTSW-2010, DFS-6 [SL-53+VLTSW-2010+DFS-6] based systems, using
the same hardware, namely so-called Blade Center nodes, has been both a
long and long-awaited process. It was finally begun (from the QC+SOS
view) in Oct 2009 and represents a collaboration between:
- the VLTSW group -- who provide and test the Operating System [OS]
and Very Large Telescope Software [VLTSW] components
- the DFS group -- who provide and test the Data Flow Systems [DFS] components
- the SOS group -- who provide the hardware and make the OS and
software installations
- the QC group -- who test the integrated environment for QC
Operations and ultimately use the systems for QC operations
Operational machines at Paranal were upgraded to
SL-53+VLTSW-2010+DFS-6 at the start of P85, April 2010. QC dfoXX operational
machines are now set to begin upgrade at the beginning of August 2010.
The gap between the two reflects delays on the part of all four groups.
With such a significant upgrade in OS and software, a number of
things have changed, so some adjustments need to
be made to QC files. DFOS software changes have already been
implemented transparently (i.e. the modified DFOS software works
under both SL-43+VLTSW-2007+DFS-5 and SL-53+VLTSW-2010+DFS-6 based
systems).
The following changes to configuration files are intended to be
transparent, i.e. they can be made before the upgrade is applied
and should work on both pre- and post-upgrade systems.
- crontab: save a copy of your current crontab:
crontab -l > ${HOME}/crontab.SL-43+VLTSW-2007+DFS-5
- .qcrc:
- DFS_RELEASE: If you use:
export DFS_RELEASE=dfs
then there is nothing to change.
If you set DFS_RELEASE to a specific version, e.g.
export DFS_RELEASE=dfs-5_6_7
then add the following code immediately below this:
[ -d ~flowmgr/dfs-6_0_3 ] && export DFS_RELEASE=dfs-6_0_3
This will be ok until we have to upgrade the DFS version
on the new systems, at which point more definitive editing should be
made, e.g. comment out the DFS-5 line and simply set the DFS-6 line.
- Condor: You should have something like:
source /opsw/packages/vultur/config/bashrc.vultur.private
export PATH=$PATH:/opsw/condor/bin:/opsw/condor/sbin
Replace with:
if [ -r /opsw/packages/vultur/config/bashrc.vultur.private ]; then
## This must be a pre DFS-6 system...
source /opsw/packages/vultur/config/bashrc.vultur.private
export PATH=$PATH:/opsw/condor/bin:/opsw/condor/sbin
elif [ -r ~flowmgr/${DFS_RELEASE}/config/bashrc.vultur.private ]; then
## This must be a DFS-6 or later system...
. ~flowmgr/${DFS_RELEASE}/config/bashrc.vultur.private
export PATH=$PATH:/opt/condor/bin:/opt/condor/sbin
fi
- XFCE
- .vnc/xstartup: You should have something like:
startxfce &
Replace with:
if which startxfce4 >/dev/null 2>&1 ; then
startxfce4 &
elif which startxfce >/dev/null 2>&1 ; then
startxfce &
fi
- Migrate old menu to new one:
export DFOS_FUNCTIONS=/home/uves/DFOS/bin/dfos.functions
/home/uves/DFOS/bin/dfos.migrate.xfceUserMenu2xfce4
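The DFS_RELEASE and XFCE edits above share one pattern: test for what the upgraded system provides and fall back to the old value otherwise. A minimal sketch of that pattern as a function follows. The name pick_dfs_release and its arguments are hypothetical and not part of DFOS; the flowmgr home directory is passed in explicitly so the logic can be tried against a scratch directory first.

```shell
# Hypothetical helper (not part of DFOS): print the DFS release to use.
#   $1 = flowmgr home directory
#   $2 = release pinned for the pre-upgrade system
#   $3 = release expected on the upgraded system
pick_dfs_release() {
    if [ -d "$1/$3" ]; then
        # The new release is installed, so this is an upgraded system.
        echo "$3"
    else
        # Otherwise keep the release pinned for the old system.
        echo "$2"
    fi
}
```

In .qcrc this could be used as: export DFS_RELEASE=$(pick_dfs_release ~flowmgr dfs-5_6_7 dfs-6_0_3)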
If you have not already done step 1 above, it is too late. If you have
not yet done steps 2 and 3 above, you must do them now. Once that is
done you can proceed with:
- ssh to your upgraded dfoNN
- TMP_DIR:
For many of us the DFOS environment variable TMP_DIR is set to
something like /tmp/<inst>, and since /tmp was
not migrated across, that directory will in general not exist.
So do:
[ ! -d "${TMP_DIR}" ] && mkdir -pv "${TMP_DIR}"
- /hsrmnt:
Replace any and all references to /hsrmnt/home with simply
/home in .qcrc, .dfosrc, .pecs/*
and anywhere else you may have made a reference to it... If you are
feeling brave and trust JP implicitly you could do:
sed -i 's|/hsrmnt/home|/home|g' <file>
for each <file> you know or think it might be in, or for the
bold:
sed -i 's|/hsrmnt/home|/home|g' .qcrc .dfosrc .pecs/*
- LD_LIBRARY_PATH
Check if LD_LIBRARY_PATH includes
/vlt/FEB2007/NOCCS/lib. If so find in .qcrc,
.dfosrc, .pecs/* where this is set and remove it.
grep LD_LIBRARY_PATH.\*/vlt/FEB2007/NOCCS/lib .qcrc .dfosrc .pecs/*
- logout and then ssh to your upgraded dfoNN again
- Firefox: Start Firefox, do NOT click any of the DFOS
action buttons. Do the following to obtain the "expected" behaviours...
- To prevent Firefox coming to the front when remote loading a file:
- In the URL bar type "about:config"
- Click Ok
- type "Diverted" in the filter field
- Double click on the "browser.tabs.loadDivertedInBackground" config key to set it to "true"
- To prevent Firefox opening remote loaded pages in a new tab
(or window):
- In the URL bar type "about:config"
- Click Ok
- type "open_external" in the filter field
- Double click on the "browser.link.open_external" config key and set it to 1
- To prevent Firefox opening clicked links in a new tab
(or window):
- In the URL bar type "about:config"
- Click Ok
- type "open_newwindow" in the filter field
- Double click on the "browser.link.open_newwindow" config key and set it to 1
- DFOS action buttons:
Before clicking on any DFOS action button, first recreate the web page
it is on with the appropriate tool.
- dfoMonitor:
As a first check of basic health, run dfoMonitor once.
Now clicking on the DFOS action buttons should be OK.
- SciSoft: conflicts with the VLTSW-provided MIDAS and Java.
The best advice, I think, is not to use SciSoft, i.e. comment out (or
remove completely) from .qcrc, .dfosrc and
.pecs/* any line that includes:
. /scisoft/bin/Setup.bash
or
source /scisoft/bin/Setup.bash
If you really do need SciSoft, then use (something like) the
following. In .qcrc:
saveMIDASHOME="${MIDASHOME}"
saveMIDVERS="${MIDVERS}"
## SciSoft...
if [ -f /scisoft/bin/Setup.bash ]; then
. /scisoft/bin/Setup.bash > /dev/null 2>&1
fi
export MIDASHOME="${saveMIDASHOME}"
export MIDVERS="${saveMIDVERS}"
And then at the very END of .qcrc:
## Re-append each SciSoft directory at the end of PATH so that it is searched last...
for P in $(echo $PATH | tr ":" "\n" | grep scisoft) ; do
export PATH=${PATH/:${P}}:${P}
done
- Python: pyqc was originally written to work with the Python
provided by SciSoft. Since version 1.2.1 (or maybe earlier) it works
with the QC customised Python version in /qcdp. If you have already
migrated to use /qcdp there is nothing to do. If not, now is the time
to do so.
- crontab: reinstall the crontab saved before the upgrade:
crontab ${HOME}/crontab.SL-43+VLTSW-2007+DFS-5
- Resume normal operations
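Before resuming, it may be worth confirming that no stale references remain in the configuration files edited in the steps above. The following is a minimal sketch, not a DFOS tool: the function name find_stale_refs is hypothetical, and the patterns are simply the three strings named above (/hsrmnt/home, /vlt/FEB2007/NOCCS/lib and the SciSoft Setup.bash line). If you deliberately kept SciSoft, its line will be reported and can be ignored.

```shell
# Hypothetical helper (not part of DFOS): print, as file:line:text, every
# line in the given files that still mentions a path or setup script that
# the steps above say to remove. The exit status is that of grep.
find_stale_refs() {
    # /dev/null is added as an extra file so that grep always prefixes
    # matches with the file name, even when only one file is given.
    grep -n -E '/hsrmnt/home|/vlt/FEB2007/NOCCS/lib|/scisoft/bin/Setup\.bash' "$@" /dev/null
}
```

Typical use, from the home directory of the operational account: find_stale_refs .qcrc .dfosrc .pecs/*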
While practically every piece of software we use has been upgraded,
most changes are transparent to us. Some, however, are not; here is
what I have found so far:
- Pipelines: Initially the same versions of the Pipelines
delivered to Paranal for April 2010 have been installed for the
SL-53+VLTSW-2010+DFS-6 based systems.
- The window manager sawfish is NOT available.
The upgrade procedure was first developed on Xen Virtual Machines.
Once the beta procedure was established it was applied to dfo21, where
it was fine-tuned in a collaboration between Alexis Huxley of SOS and
John Pritchard of QC. Now that the final procedure is established, it is
to be applied to the remaining DFO nodes, namely dfo22 -- dfo33.
The nodes will be upgraded between Aug 3rd and Aug 12th at a rate of
about 2 per day. SL-53+VLTSW-2010+DFS-6 based versions of each dfoNN
will be prepared offline (i.e. on available, identical hardware). Once
these are ready, and in coordination with the QC scientist concerned, the
SL-43+VLTSW-2007+DFS-5 based version will be shut down, the 1TB /diska disk
removed from the SL-43+VLTSW-2007+DFS-5 based version and installed
into the SL-53+VLTSW-2010+DFS-6 based version, and the
SL-53+VLTSW-2010+DFS-6 based version will be started up. Unfortunately
the downtime is liable to be of the order of 2-3hrs, as the 1TB /diska
disks are almost certainly all overdue for file system checks, and
depending on the disk usage, a check will take anywhere from a few mins
to perhaps as much as 3hrs.
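Whether a given /diska is likely to trigger a check at boot can be estimated beforehand. The sketch below assumes an ext2/ext3 file system (not stated above) and considers only the mount-count criterion, not the time-based check interval. It parses the "Mount count" and "Maximum mount count" lines printed by tune2fs -l. The function name fsck_due_by_mount_count is hypothetical, as is the device name in the usage line.

```shell
# Hypothetical helper: read `tune2fs -l <device>` output on stdin and print
# "due" if the mount count has reached the maximum mount count, otherwise
# "not due". A maximum of 0 or -1 means mount-count checking is disabled.
fsck_due_by_mount_count() {
    awk -F: '
        /^Mount count:/         { count = $2 + 0 }
        /^Maximum mount count:/ { max = $2 + 0 }
        END {
            if (max > 0 && count >= max) print "due"
            else print "not due"
        }'
}
```

Typical use (as root, with the real device in place of the placeholder): tune2fs -l /dev/sdX1 | fsck_due_by_mount_count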
In more detail, the procedure is:
- The day before the scheduled upgrade of dfoNN (to be announced
by SOS by email)
No impact on operations
- Prepare a new machine offline
- rsync /diskb onto /diska (the 1TB disk)
- The Day
No operations
- 2mins: Shutdown dfoNN
- 30mins: Physically remove /diska from dfoNN and install it into the new dfoNN
- 5mins-3hrs: Power on the new dfoNN
- Probably /diska will require a file system check; this can take
1-3hrs, depending on the quantity of data on that disk.
- Release to QC scientist
dfoNN now ready for operations
- QC scientist, having already followed the procedures given in
the Before Upgrade section above, reactivates cronjobs (see the
After Upgrade section above), and
operations resume.