Cluster

The sun HPC cluster consists of 24 computing nodes. Two different types of nodes are used:

This makes a total of 640 processors and 1280 GB of RAM.

The nodes are connected using InfiniBand:

FIXME

The operating system used is the Ubuntu-based Qlustar. Qlustar provides a very useful usage manual here.

Both the cluster and the workstations boot OS images over the network. The base for both images is the same; for the workstations, additional packages (e.g. for the desktop environment) are added to the OS image. See Workstations for details.

Both cluster and workstation OSes mount central file systems over the network, making them available in exactly the same way on every workstation as well as on the computing nodes.

Login Nodes

sun - Spectacular Userlogin Node

This is the main access point to the cluster: log in using ssh at

username@sun.iek.fz-juelich.de

Use it to manage your jobs in the queue and for similar management tasks.

NOTE: from sun you can log in to the 24 compute nodes sun-01 to sun-24 using ssh. This can be useful for debugging errors directly on the nodes.
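
A typical session might look like this (a sketch; the user name jdoe and the node sun-07 are placeholders):

# log in to the login node
ssh jdoe@sun.iek.fz-juelich.de

# from sun, hop onto a compute node to inspect a running job
ssh sun-07
top      # watch your processes on that node
exit     # leave the compute node again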

To transfer data to /data/ please use dam! To transfer data to /home/ please use fire!

dam - Data Access and analysis Machine

This gives you direct access to the machine which holds the /data folder. Log in via

username@dam.iek.fz-juelich.de

Use this node if you

  • want to run analysis directly on the data in /data/ or
  • want to upload/download a lot of data to/from /data/ or generally for data transfer
  • want to use SFTP to access your data!

Don't use this for building or running simulation software. Development libraries are not available.
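
For example, transfers to and from /data/ via dam could look like this (the user name jdoe and all paths are placeholders; the same pattern works with fire for transfers to /home/):

# copy a local results directory into your /data folder
rsync -av ./results/ jdoe@dam.iek.fz-juelich.de:/data/jdoe/results/

# fetch a single file back
scp jdoe@dam.iek.fz-juelich.de:/data/jdoe/run01/output.h5 .

# or start an interactive SFTP session
sftp jdoe@dam.iek.fz-juelich.de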

fire - Also a machine

This gives you direct access to the machine which holds the /home and /apps folders. Log in via

username@fire.iek.fz-juelich.de

Don't use this for building or running simulation software. Development libraries are not available.

Folders

There are three special folders on all the login nodes and workstations in Nürnberg:

folder               | served by | use for                                                | user quota
$WORK or /data/$USER | dam       | simulation/analysis data                               | 200G
$HOME or /home/$USER | fire      | your regular home folder: documents, day-to-day, etc.  | 20G
/apps                | fire      | applications, binaries, compilers not in the OS repos 1) | -

Be aware that, by default, all other users can read and list the contents of your directories. If you don't want this, you have to change the permissions of your files yourself using chmod.
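
A minimal sketch of how to restrict access with chmod (the directory names are placeholders):

# remove read/list/enter permissions for group and others on one directory
chmod go-rwx /data/$USER/private-project

# do the same recursively for everything below a directory
chmod -R go-rwx /home/$USER/private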

Backup snapshots

Under /homesnaps/ you find 8 subfolders containing snapshots of the /home folder:

  • 4 from the last 24 hours (hourly-1 to hourly-4, taken at 8:00, 12:00, 16:00 and 20:00) and
  • 4 from the last 4 days (daily-1 to daily-4, taken at 22:00).

The higher the number, the older the snapshot.
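
To restore an older version of a file, copy it back from the appropriate snapshot. A sketch, assuming the snapshots mirror the layout of /home (the path and file names are placeholders):

# list the available snapshots
ls /homesnaps/

# restore yesterday's version of a file from the first daily snapshot
cp /homesnaps/daily-1/$USER/thesis/chapter1.tex ~/thesis/chapter1.tex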

Job Queue

Slurm is used as the job queue manager; on our cluster, version 15.08.12 is running. For users, the central Slurm commands are the following (a few usage examples are shown after the list):

  • srun – run a parallel job directly (i.e. replacement for mpirun)
  • sbatch – submit a jobscript to the queue
  • squeue – get information about the queue
  • sinfo – get information about Slurm nodes and partitions
  • smap – get a graphical representation of queue and cluster
  • scontrol – control utility (especially useful are `scontrol update`, `scontrol show` or `scontrol help`)
  • sprio – show the priority of pending jobs. The larger the number, the higher the job is positioned in the queue and the sooner it will be scheduled.
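
A few typical invocations (the job ID 12345 is a placeholder):

# show only your own jobs in the queue
squeue -u $USER

# overview of nodes and partitions
sinfo

# detailed information about one specific job
scontrol show job 12345

# priorities of all pending jobs
sprio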

Example Jobscript

This is a nonsensical example jobscript you would submit with sbatch. The lines starting with #SBATCH specify parameters for sbatch that can be overridden by command-line parameters. The most important ones are

  • -o – output file (%j,%N are replaced with jobid, masternode respectively)
  • -J – jobname
  • -n – number of processors (equivalent of -np for mpirun)

It is advisable to use srun inside the jobscript instead of mpirun.

jobscript.sh
#!/bin/bash
#SBATCH -o job.%j.%N.out
#SBATCH -J YourJobName
#SBATCH --get-user-env
#SBATCH -n 64
#SBATCH --time=08:00:00
 
# export the variable so it is visible to the processes started by srun
export YOUR_ENVIRONMENT_VARIABLE="Value of your environment variable"
 
srun your-parallel-executable --arguments to --your parallel --executable
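
Submitting and monitoring the jobscript then looks like this (the job ID 12345 is a placeholder; scancel is the standard Slurm command for cancelling a job, not listed above):

# submit the jobscript to the queue
sbatch jobscript.sh

# check its status
squeue -u $USER

# cancel it if necessary
scancel 12345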

Partitions

The cluster has 3 available computing partitions, which differ in the maximum number of computing nodes and the wall time. Select the partition you want with --partition=<name> or -p <name>. The default is long.

Partition | Max Nodes | Wall Time
long      | 12        | 24h
big       | 24        | 6h
small     | 3         | none
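
For example, to run the jobscript from above on the big partition (a sketch using the script name from the previous section):

# select the partition when submitting ...
sbatch --partition=big jobscript.sh

# ... or add the corresponding line to the jobscript itself:
#SBATCH -p big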

Accounting

Slurm keeps track of the used processing time for your jobs. Related commands are

  • sacct – displays accounting data for all jobs and job steps in the Slurm job accounting database
  • sreport – generates reports from slurm accounting data

For example

sreport user top start=1/1/16 end=12/31/16

gives you a nice ranking of the usage of all cluster users for the year 2016.
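
Similarly, sacct can show the accounting data of your own jobs, for example (the date and the format fields are just an illustration):

sacct -u $USER --starttime=2016-01-01 --format=JobID,JobName,Partition,Elapsed,State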

Compilers

The following compilers are installed:

  • Intel compilers
  • GCC
  • PGI compilers

They are provided via Qlustar packages in the default path, so you can use them from the command line without further ado.
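
So a plain serial build works directly, for example (the source file name is a placeholder; icc and pgcc as the executable names of the Intel and PGI C compilers is an assumption about this installation):

gcc -O2 -o hello hello.c
icc -O2 -o hello hello.c
pgcc -O2 -o hello hello.c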

MPI

The MPI wrappers are installed via Qlustar packages as well. We use the Open MPI implementation. The compilers are invoked by appending .openmpi-$CC 2) to the desired mpi* compiler executable.

$CC \ language | C                 | FORTRAN                                | C++ 3)
intel          | mpicc.openmpi-icc | mpif77.openmpi-icc, mpif90.openmpi-icc | mpiCC.openmpi-icc, mpicxx.openmpi-icc
gcc            | mpicc.openmpi-gcc | mpif77.openmpi-gcc, mpif90.openmpi-gcc | mpiCC.openmpi-gcc, mpicxx.openmpi-gcc
pgi            | mpicc.openmpi-pgi | mpif77.openmpi-pgi, mpif90.openmpi-pgi | mpiCC.openmpi-pgi, mpicxx.openmpi-pgi

You will most certainly need to adjust your Makefile(s) or ./configure your software properly in order for these compilers to be found.
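
For autotools-based software this typically means passing the wrappers explicitly, for example with the GCC variants (whether your build system uses these exact variable names is an assumption):

./configure CC=mpicc.openmpi-gcc CXX=mpicxx.openmpi-gcc FC=mpif90.openmpi-gcc

# or override the compiler variables of a Makefile
make CC=mpicc.openmpi-gcc FC=mpif90.openmpi-gcc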

If needed, you can find the Open MPI root directory for your compiler collection $CC and version $VER (currently 1.8.8 or 2.0.2) at /usr/lib/openmpi/$VER/$CC. See also the -show switch of the MPI compiler wrappers.

To run your parallel executable locally, you need to use mpirun.openmpi. In your Slurm batch scripts, always use srun instead of mpirun!

Note: due to problems in Qlustar's wrapper scripts for the mpirun executable, if you want to run MPI programs locally, you need to use:

mpirun.openmpi[-$VER] --prefix /usr/lib/openmpi/$VER/$CC ...regular mpi arguments...
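
Putting it together, a compile-and-run cycle with the GCC wrappers might look like this (the program name, the process count and the version 2.0.2 are placeholders; adjust $VER and $CC to your setup):

# compile
mpicc.openmpi-gcc -O2 -o my-mpi-program my-mpi-program.c

# run locally
mpirun.openmpi --prefix /usr/lib/openmpi/2.0.2/gcc -np 4 ./my-mpi-program

# inside a Slurm jobscript, use srun instead
srun ./my-mpi-program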

The /apps folder


1)
you must be a member of the group softadm to change things here
2)
$CC compiler collection: gcc, icc or pgi
3)
mpiCC and mpicxx both compile C++