The EU Processing Cluster

Policy

By using the EU Processing Cluster you agree to follow the ALMA data access policy. In a nutshell:

  • ARC staff, including Contact Scientists and data reducers, are not allowed to download, reduce or disclose any data, including those of the project that they support, without explicit permission from the PI (registered on a Helpdesk ticket). The only exception is data download and reduction for the purpose of QA2 and then only for those projects that the member of the ARC staff has been formally assigned to.

  • The European Data Reduction Manager (DRM) or their deputy is responsible for assigning QA2 projects to data reducers. QA2 can only start after the name of the assigned data reducer is entered on the corresponding data reduction JIRA ticket.

  • No ARC staff performing QA2 at ESO or at one of the ARC nodes is allowed to disclose any intermediate or final data reduction products to anyone outside the European ARC network, including the PI and CoIs of the project whose data is being processed.

  • ARC staff shall not use any ALMA data for scientific purposes from projects for which they are not PI or CoI until the proprietary period has expired.

Status

From within ESO, the status of the cluster can be seen at this link.

Access

The cluster gateway server is

   arcp1.hq.eso.org
It is only reachable from within ESO's network. SSH into the gateway using your individual/visitor account.
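
For example, from a machine inside the ESO network (where <username> is a placeholder for your individual or visitor account, and -X optionally enables X forwarding):

   ssh -X <username>@arcp1.hq.eso.org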

Usage

The gateway server is only meant to handle logins and to allow copying files in and out of the cluster. For any other interactive work, please request an interactive shell from the job scheduler:

   qsub -I -X -l mem=3g
This command is also available as the alias
   qi

This shell will be placed on the least loaded available node, ensuring a well-balanced use of the whole compute cluster. You can name your jobs by adding -N <jobname> to the qsub or qi commands, as shown in the sketch below.
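
As a minimal sketch, assuming qi is a plain shell alias for the qsub command above (so that additional options can simply be appended), a named interactive session could be requested like this; the job name my_interactive_session is purely illustrative:

   qi -N my_interactive_session
   # equivalent to
   qsub -I -X -l mem=3g -N my_interactive_session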

In order to see which jobs are running, use the

   qstat -r
command. To see which jobs are queued, use
   qstat
and to see the full scheduling context of a given job, run e.g.
   qstat -f 1000.arcp1
where 1000.arcp1 is the id of the job. This id is returned upon job submission and is also shown by qstat.

Directories

In addition to the home directory, each user has access to the workspace at
   /opsw/work/<username>

Please only use this directory for data. The home directory is intended to hold your personal configuration (e.g. .bashrc) and possibly python scripts, but no data.

The "work" alias command takes you to your workspace. The disk space on the lustre filesystem holding also the workspspace is shared amongst all users. Data should be removed therefore as soon as it is not needed any more. There is no backup available on that space, please consider it as being scratch space.

Environment

The environment for the use of CASA, the additional analysis software, the pipeline etc. is sourced in your .bashrc automatically. This allows us to adapt the settings to a changing environment without the users having to change their .bashrc.

VNC access

If you access the cluster via a high-latency connection, you might want to start a virtual (VNC) server on the cluster and connect to that server over SSH. This also has the advantage that jobs you have started can continue to run even if you disconnect (e.g. with your laptop while travelling). In order to start the vncserver (if none is running for you yet) and to get the commands to be used for the SSH tunnelling, run

   getvnc.sh [-geometry WIDTHxHEIGHT]

Note that this command does not start a second server if one is already running, so it can be run at any time to retrieve the tunnel and connection commands. Optionally, a geometry can be given, e.g. to obtain a larger window if the vncserver is accessed from a larger screen.

Once that command has been run on the cluster gateway arcp1, run the two commands it prints (one to start an SSH tunnel and one to connect to the vncserver) on your external machine (e.g. your laptop). Mac users can, for example, use "Chicken of the VNC" instead of vncviewer.
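
As an illustration only, the printed commands are typically of the following form; the display number, the port 5901 and the tunnel endpoint are assumptions here, so always use the exact commands reported by getvnc.sh:

   # on your external machine: tunnel a local port to the VNC display
   ssh -N -L 5901:localhost:5901 <username>@arcp1.hq.eso.org
   # in a second terminal: connect the viewer through the tunnel
   vncviewer localhost:5901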

If you do not need your vncserver any more, please run

 
   getvnc.sh -k  

CASA

Once you are logged in, you can start CASA:

  • casapy will start the QA2 default (presently 3.4)
  • casapy-4.0 will start CASA 4.0
  • casapy-x.y will generally start CASA x.y if available
  • casapy-test will start the latest test distribution
  • casapy-stable will start the latest stable distribution

If you run more than one CASA session on the cluster, it still makes sense to separate the CASA resource directories (NOTE: all nodes access the same Lustre filesystem!). Therefore, the CASA wrapper script stays in place; it will prompt you for a session ID.
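
A short usage sketch, starting CASA from an interactive shell on a compute node (the session ID entered at the wrapper prompt is your own choice, e.g. 1):

   qi                 # request an interactive shell on a compute node
   casapy-stable      # start the latest stable CASA and enter a session ID when prompted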

Job submission

In addition to using interactive shells on the cluster, jobs, including those that require X windows, can be submitted to the batch queue from the gateway server arcp1:
   qsub -q batch -X -l mem=3g torquejobscript.sh
This command is also available through the alias
   qb torquejobscript.sh
Here torquejobscript.sh is a shell script to be executed. Run
   man qsub
to get a full description of what these job scripts can look like. A very complete description of qsub can be found here. Batch jobs that use graphics need to connect to a virtual X window frame buffer. Such a frame buffer is running on each host at "localhost:999" on screen 0.

Here is an example job script that executes "myscript.py" in CASA:

   #! /bin/bash

   # The name of the job as it will show up when running qstat and in the subjects of the emails sent to you
   #PBS -N my_casa_job

   # Email address to which emails shall be sent
   #PBS -M <YOUR_USER_NAME>@eso.org

   # Get email notifications at "abort", "beginning" and "end"
   #PBS -m abe

   # Write the logfiles of the job to
   #PBS -o /opsw/work/<YOUR_USER_NAME>/torque
   
   # Merge the stdout and the stderr into one file
   #PBS -j oe

   echo "Starting job at `date +'%Y-%m-%d %H:%M:%S'`"

   
   # Source the environment
   source /home/`whoami`/.bash_profile
   source /home/`whoami`/.bashrc

   # Set the display to the virtual frame buffer
   export DISPLAY=localhost:999 

   # Change to the directory in which the job was submitted
   cd $PBS_O_WORKDIR
   
   echo "Running on host `uname -n`"
   echo "Using workdir $PBS_O_WORKDIR"
   
   # Run myscript.py in CASA
   # Note that the output of the actual job is redirected to the logfile directory: the logfiles written by Torque
   # only appear at the very end of the job, so redirecting the output allows you to follow the job during batch execution.
   casapy-stable --nologger --nogui -c "myscript.py" > /opsw/work/`whoami`/${PBS_JOBNAME}_${PBS_JOBID}.log 2>&1

   echo "Done at `date +'%Y-%m-%d %H:%M:%S'`"

-- FelixStoehr - 26 Nov 2012
