Cluster: Slurm

Basic usage

It is forbidden to run computations directly on the login node (genossh.genouest.org). You MUST first connect to a compute node (using srun) or submit a job to a compute node (using sbatch).

Listing available nodes

When you submit a job, it is dispatched to one of the computing nodes of the cluster.

These nodes have different characteristics (CPU, RAM): they have from 128 GB up to 755 GB of RAM, and 8 to 40 cores each. Run the following command to display the list of available nodes with their characteristics and current load (memory in MB):

sinfo -N -O nodelist,partition,cpusstate,memory,allocmem,freemem

Column 1: the node name
Column 2: the partition the node belongs to
Column 3: number of CPUs of the node (allocated/idle/other/total)
Column 4: total amount of memory (in MB)
Column 5: total amount of allocated memory (in MB)
Column 6: amount of unused memory (in MB), some of which may nevertheless be allocated

("allocated" means "reserved by a job")
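To find nodes where a large job could fit, you can post-process the sinfo output. The following is a minimal sketch of a hypothetical helper (not installed on the cluster): it computes memory minus allocmem (both in MB) for each node and keeps the nodes with at least a given amount of unallocated memory. It only looks at memory; free CPUs (column 3 above) matter too.

```bash
#!/bin/bash
# Hypothetical helper (not installed on the cluster): list nodes that still have
# at least THRESHOLD megabytes of unallocated memory (memory - allocmem).
# Input: lines of "nodelist memory allocmem" (values in MB), e.g. from
#   sinfo -N -h -O nodelist,memory,allocmem
nodes_with_unallocated_mem() {
    local threshold_mb=$1
    # A node belonging to several partitions is listed several times by
    # sinfo -N: keep only its first occurrence. Prints "node unallocated_MB".
    awk -v threshold="$threshold_mb" \
        '!seen[$1]++ && ($2 - $3) >= threshold { print $1, $2 - $3 }'
}

# Example: nodes with at least 50 GB (51200 MB) of memory not yet reserved.
#   sinfo -N -h -O nodelist,memory,allocmem | nodes_with_unallocated_mem 51200
```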

Creating a job

You can launch a shell on a computing node using:

srun --pty bash

You can submit a job with the sbatch command:

sbatch my_script.sh

You can add submission options in the header of the script using SBATCH directives:

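#!/bin/bash
# Name displayed by squeue
#SBATCH --job-name=test
# Directory the job runs in (replace with your own path)
#SBATCH --chdir=workingdirectory
# File receiving the job's standard output and error
#SBATCH --output=res.txt
# Run a single task
#SBATCH --ntasks=1
# Time limit: 10 minutes (minutes:seconds format)
#SBATCH --time=10:00
# Memory per allocated CPU, in MB by default
#SBATCH --mem-per-cpu=100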

You can submit jobs to a specific partition:

sbatch -p genouest my_script.sh

By default, jobs are submitted to the main partition (genouest). You only need to use this option for very specific cases.

You can monitor your jobs with the squeue command. By default it lists all jobs; use the -u option to restrict the list to a specific user:

squeue
squeue -u username

Reserving CPU and memory

By default, each job is limited to 1 CPU and 6 GB of memory. If you need more (or fewer) resources, add the following options to the srun or sbatch command (or use the equivalent SBATCH directives):

sbatch --cpus-per-task=8 --mem=50G job_script.sh

In this example, we request 8 CPUs and 50 GB of memory on a node to execute the bash script job_script.sh. Many options are available to fine-tune the number of CPUs and the amount of memory reserved for your job; have a look at the srun manual. If the requested resources (at least 1 CPU and 6 GB by default) are not available on a single node, your job may have to wait in the queue before it is placed. The same options can be used with srun.

These limits are strict: your job will not be allowed to use more than it requested. If it uses more RAM than requested, it will be killed.
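The same request can be written as SBATCH directives inside the job script. The sketch below assumes a multithreaded tool that takes its thread count as a parameter (the tool is hypothetical and replaced here by an echo). Reading SLURM_CPUS_PER_TASK, which Slurm sets in the job when --cpus-per-task is given, keeps the number of threads in sync with the number of reserved CPUs.

```bash
#!/bin/bash
# Sketch of a job script requesting 8 CPUs and 50 GB of memory,
# equivalent to: sbatch --cpus-per-task=8 --mem=50G job_script.sh
#SBATCH --job-name=multithread
#SBATCH --cpus-per-task=8
#SBATCH --mem=50G

# Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is requested.
# Fall back to 1 so the script can also be tried outside Slurm.
THREADS=${SLURM_CPUS_PER_TASK:-1}

# Replace this line with your own tool, e.g. a hypothetical
# "my_tool --threads $THREADS input.fastq".
echo "Running with ${THREADS} threads"
```

Using more threads than reserved CPUs does not give you more CPU time; it only makes your threads compete for the same reserved cores.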

If your job is stuck with the message “srun: job xxxx queued and waiting for resources” and nothing happens, it means that no resources are currently available on the cluster. In this case, you can try the “tiny” partition, which is meant for very short jobs with limited resources:

srun -p tiny --pty bash

Jobs in this partition get limited resources: at most 2 GB of memory, 2 CPUs, and a time limit of 2 hours. In exchange, these tiny jobs have a higher priority. Each user can run at most 2 simultaneous jobs on this partition. These limits ensure that anyone can get a slot on a node for very short tasks at any time. Please don’t abuse it.

Execution time limit

All the jobs have a default maximum runtime of 15 days. If one of your jobs is still running after 15 days, it will automatically be stopped by the system.

If you know your job will take longer than 15 days, you can ask for more (e.g. 25 days) when launching your job:

sbatch --time 25-00:00:00 my_script.sh

You can also modify the time limit while a submitted job is still pending (PD state):

scontrol update JobId=<job-id> TimeLimit=25-00:00:00

You will get a "permission denied" error if you try to run this command on a running job. In this case, contact us and we will do it for you (keep in mind that we might not be available immediately, don't ask us 2 minutes before the job reaches its time limit).

There is a hard limit of 30 days for all jobs. Acceptable time formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".

If you know one of your jobs will finish well before 15 days, you can set a shorter limit with the --time option of sbatch or srun:

sbatch --time 60 my_short_script.sh

This command means that the job will have a maximum lifetime of 60 minutes. If it lasts more than that, it will be killed.

Using --time has one big advantage: if the cluster is under heavy usage, your job will have a higher chance to be executed quickly (the job scheduler takes into account the expected maximum run time).
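The time formats can be surprising: --time=10:00 in the SBATCH header example means 10 minutes (not 10 hours), and --time 60 means 60 minutes. To make the interpretation of each listed format explicit, here is a small bash function that converts a time limit to seconds. It is a hypothetical illustration written from the format list above, not part of Slurm.

```bash
#!/bin/bash
# Hypothetical helper: convert a Slurm time limit to seconds.
# Formats: M, M:S, H:M:S, D-H, D-H:M, D-H:M:S
slurm_time_to_seconds() {
    local t=$1 days=0 rest h=0 m=0 s=0
    local -a parts
    if [[ $t == *-* ]]; then
        # "days-" is followed by hours[:minutes[:seconds]]
        days=${t%%-*}
        rest=${t#*-}
        IFS=: read -r -a parts <<< "$rest"
        h=${parts[0]}; m=${parts[1]:-0}; s=${parts[2]:-0}
    else
        IFS=: read -r -a parts <<< "$t"
        case ${#parts[@]} in
            1) m=${parts[0]} ;;                                # minutes
            2) m=${parts[0]}; s=${parts[1]} ;;                 # minutes:seconds
            3) h=${parts[0]}; m=${parts[1]}; s=${parts[2]} ;;  # hours:minutes:seconds
            *) echo "invalid time: $t" >&2; return 1 ;;
        esac
    fi
    # 10# forces base 10 so that values such as "08" are not read as octal.
    echo $(( ((10#$days * 24 + 10#$h) * 60 + 10#$m) * 60 + 10#$s ))
}

# Example: check a requested limit against the 30-day hard limit.
#   (( $(slurm_time_to_seconds 25-00:00:00) <= 30 * 24 * 3600 )) && echo "OK"
```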

Monitoring resource usage

To get more information on resource usage, you can:

display more information for running jobs using squeue:

squeue -o "%.18i %.9P %.70j %.8u %.2t %.10M %.6D %4C %10m %15R %20p %7q %Z"

Note that if you used the --mem-per-cpu option, the MIN_MEMORY column does not take it into account: multiply the displayed value by the number of reserved CPUs to get the memory actually reserved. Alternatively, use this command and look at the TRES column to see what is really reserved:

squeue -O "JobID:18,partition:10,name:40,username:9,state:10,timeused:11,tres:45,NodeList:11,reason:20,priority,qos:10,workdir:"

get information on a specific job:

scontrol show job <job_id>

check the maximum memory used by a running job:

sstat -j <job_id>.batch --format="JobID,MaxRSS"

check the cpu time and maximum memory used by a finished job:

sacct -j <job_id> --format="JobID,CPUTime,MaxRSS,ReqMem"
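Comparing MaxRSS with ReqMem helps you choose a more accurate --mem value for your next run. sacct usually prints MaxRSS with a unit suffix (for example 524288K). Below is a hedged sketch of a hypothetical helper that converts such values to megabytes, assuming Slurm's binary units (1M = 1024K) and the K/M/G/T suffixes.

```bash
#!/bin/bash
# Hypothetical helper: convert a memory value such as "524288K", "1.5G" or
# "4096M" (as shown in the MaxRSS column) to whole megabytes.
# Assumption: binary units (1K = 1/1024 M, 1G = 1024 M, 1T = 1024 G).
mem_to_mb() {
    awk -v v="$1" 'BEGIN {
        unit = substr(v, length(v), 1)
        # awk keeps the leading number and ignores the unit suffix
        num = v + 0
        if (unit == "K") f = 1 / 1024
        else if (unit == "M") f = 1
        else if (unit == "G") f = 1024
        else if (unit == "T") f = 1024 * 1024
        else if (num == 0) f = 0
        else { print "mem_to_mb: no unit suffix in " v > "/dev/stderr"; exit 1 }
        printf "%.0f\n", num * f
    }'
}

# Example: mem_to_mb 524288K prints 512
```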

Many other options are available for squeue, scontrol, sacct and sstat; run any of them with the --help option to list them.

You can find a quick tutorial on Slurm on this web site.

Killing a job

To kill a job, simply execute:

scancel <job_id>

Job arrays

It is possible to submit many similar jobs at once using job arrays. See the job arrays documentation for more details. Briefly, if you launch this command:

sbatch --array=1-50%5 my_script.sh

an array of 50 jobs will be created for the script my_script.sh, with at most 5 of them running simultaneously. In my_script.sh, the SLURM_ARRAY_TASK_ID environment variable holds the index of the current task (between 1 and 50).
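A common pattern is to let each array task pick its own input from a list. Here is a minimal sketch, assuming a hypothetical file inputs.txt containing one input path per line: task N processes line N. The %A and %a patterns in --output are replaced by the array job ID and the task index, so each task writes its own log file.

```bash
#!/bin/bash
#SBATCH --job-name=array-example
#SBATCH --array=1-50%5
#SBATCH --output=array-%A_%a.out

# Print line number $2 of file $1 (sed -n 'Np' prints only line N).
input_for_task() {
    sed -n "${2}p" "$1"
}

# Slurm sets SLURM_ARRAY_TASK_ID (1..50 here) in each task; default to 1
# so the script can be tried outside Slurm.
TASK_ID=${SLURM_ARRAY_TASK_ID:-1}

# "inputs.txt" is a hypothetical file with one input path per line.
INPUT=$(input_for_task inputs.txt "$TASK_ID")
echo "Task ${TASK_ID} processing ${INPUT}"
```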

Fair use

Our cluster is a shared and limited resource. Some limits are enforced, to avoid all resources being monopolized by a single user:

Jobs are automatically killed after 15 days of computing
No more than 40 jobs per user can run simultaneously
Jobs in the pending queue are prioritized automatically by Slurm, depending on the requested resources and on the resources used in the past by each user

These limits can be modified without warning depending on the load of the cluster, and the available physical resources.

Please refrain from launching excessive numbers of jobs that would block other users for too long.

Please be patient if your job is in the queue: it will be executed sooner or later.

Long-running interactive jobs (srun)

If you start an interactive job (srun --pty bash) and run a long-running command in it, disconnecting before the command finishes will kill the job and stop your command.

There is a solution to avoid that: use tmux, a terminal multiplexer (similar to screen).

First connect to genossh.genouest.org as usual, then start a tmux session:

tmux new

Then you can connect to a node using srun and launch your commands. When you need to disconnect, detach from your tmux session by typing Ctrl+B, then the letter d. You can then safely disconnect from genossh (and from the internet).

Later, when you want to get back to your interactive job, connect to genossh and reattach to the tmux session you created earlier by running:

tmux a

You will be able to continue your work as if you had never disconnected from the cluster.

Tmux can manage multiple parallel sessions like this; see its documentation for more advanced usage.

Bigmem node

If you need to run jobs that use a lot of memory, we have a compute node (cl1n046) with 3 TB of RAM.

To use it, you will have to submit jobs to a specific partition:

srun -p bigmem --pty bash

Please only submit jobs there that really require more memory than the standard nodes provide.

Nvidia GPUs

Two compute nodes with Nvidia GPUs are available on the Slurm cluster. To use them, you will need to use sbatch or srun commands as for a normal Slurm job, but with 2 specific options:

srun --gpus 1 -p gpu --pty bash

The -p option selects the partition containing the GPU-equipped nodes. The --gpus option sets the number of GPUs that Slurm will reserve for you. Slurm automatically sets an environment variable (CUDA_VISIBLE_DEVICES) to the ID(s) of the GPU(s) you can use; CUDA applications read this variable to use the reserved GPU(s).

Note that access to GPU Performance Counters is not restricted to administrators, which means that when you process data on the GPUs, other users of the GPUs can potentially gain access to the data handled by your process. If this is a problem and you absolutely need data privacy, please contact us. For more background on this, have a look at the corresponding security advisory.

As an alternative, the following command line still works (it uses the syntax from older versions of Slurm):

srun --gres=gpu:1 -p gpu --pty bash
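The same reservation can be made from a batch script. Here is a hedged sketch of an sbatch version of the srun commands above (partition gpu, one GPU). The count_gpus helper is hypothetical: it simply counts the comma-separated IDs that Slurm put in CUDA_VISIBLE_DEVICES, a quick way to check that what your application sees matches what you requested.

```bash
#!/bin/bash
# Sketch of a GPU batch job, equivalent to: srun --gpus 1 -p gpu ...
#SBATCH --partition=gpu
#SBATCH --gpus=1

# Hypothetical helper: count the GPU IDs in a comma-separated list
# such as the value of CUDA_VISIBLE_DEVICES ("0" or "0,1").
count_gpus() {
    local ids=$1
    local -a arr
    if [ -z "$ids" ]; then
        echo 0
        return
    fi
    IFS=, read -r -a arr <<< "$ids"
    echo "${#arr[@]}"
}

echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-<unset>}"
echo "GPUs visible to this job: $(count_gpus "${CUDA_VISIBLE_DEVICES:-}")"
# Launch your CUDA application here; it will use the reserved GPU(s).
```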

CPU architecture

Some tools require recent processors supporting specific instruction sets such as AVX or AVX2, which a few old compute nodes don’t support. To make sure that your job runs on a recent node supporting these instructions, add the --constraint option to srun or sbatch:

sbatch --constraint avx2 my_script.sh
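If you want a job to fail fast when it somehow lands on a node without the required instructions, you can check the CPU flags that Linux reports in /proc/cpuinfo. A minimal sketch, with a hypothetical helper name, assuming x86 Linux nodes (other architectures name that line differently):

```bash
#!/bin/bash
# Hypothetical helper: succeed if the CPU advertises the given flag.
# $1 = flag name (e.g. avx2), $2 = cpuinfo file (defaults to /proc/cpuinfo).
has_cpu_flag() {
    local flag=$1 cpuinfo=${2:-/proc/cpuinfo}
    # Take the first "flags" line and look for the flag as a whole word,
    # so that "avx" does not match inside "avx2".
    grep -m1 '^flags' "$cpuinfo" | grep -qw -- "$flag"
}

if has_cpu_flag avx2; then
    echo "This node supports AVX2"
else
    echo "This node does NOT support AVX2" >&2
fi
```

In a job script, a line such as has_cpu_flag avx2 || exit 1 before the main command stops the job immediately instead of letting the tool crash later.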

X forwarding

If you want to run software that requires access to an X11 server, you can enable X forwarding by following these steps:

First, connect to the cluster with the -XC options (-X enables X forwarding, -C enables compression):

ssh -XC <your-login>@genossh.genouest.org

Then start an interactive job with X11 forwarding enabled, and run your X application from it:

srun --x11 --pty bash

With older versions of Slurm (before 2022/05/11), you needed a specific ssh key, stored in ~/.ssh/id_slurm and ~/.ssh/id_slurm.pub, to use X11 forwarding. This is no longer the case: you can safely delete these files.

DRMAA library

If you need to use the DRMAA library (to launch jobs from python code for example), you’ll need to define these environment variables:

export LD_LIBRARY_PATH=/data1/slurm/drmaa/lib/:$LD_LIBRARY_PATH
export DRMAA_LIBRARY_PATH=/data1/slurm/drmaa/lib/libdrmaa.so

Singularity

Singularity (recently renamed Apptainer) is a technology for running containers in a High-Performance Computing environment.

Like Docker, it allows you to launch applications inside containers, isolated from the rest of the system. However, unlike Docker, you don’t have access to the root account inside the container, which makes it possible to use it on a standard cluster like the GenOuest one.

Since 2021-12-01, Singularity has been renamed "Apptainer". The commands below still work.

Singularity is readily installed on all the computing nodes of the cluster. To use it, you no longer need to source any environment file.

Singularity is compatible with Docker images; you can run one like this:

singularity shell docker://quay.io/biocontainers/bowtie2:2.3.4.1--py35h2d50403_1

If you need access to specific directories of the cluster inside the container, use the -B option like this:

singularity shell -B /db:/db -B /scratch:/scratch docker://quay.io/biocontainers/bowtie2:2.3.4.1--py35h2d50403_1

See the official website for more information on how to use Singularity.

Singularity image catalog

A large (>12 TB) catalog of ready-to-use Singularity images is available on the cluster, in the /cvmfs/singularity.galaxyproject.org/ directory. Every existing Bioconda package has a corresponding image in this directory. Because there are so many images, they are organized in subdirectories named after the first and second letters of the image name. For example, to use a bowtie2 image, run:

singularity shell /cvmfs/singularity.galaxyproject.org/b/o/bowtie2\:2.4.1--py38he513fc3_0
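Building these paths by hand is easy to get wrong. Below is a minimal sketch of two hypothetical bash helpers (not provided on the cluster): one builds the catalog path from the first and second letters of an image name, the other lists the available versions of a tool, assuming image files are named "tool:version" as in the bowtie2 example.

```bash
#!/bin/bash
# Hypothetical helpers for the CVMFS image catalog, which is organized by
# the first and second letters of each image name.
CATALOG=/cvmfs/singularity.galaxyproject.org

# Print the full path of an image, e.g. "bowtie2:2.4.1--py38he513fc3_0".
catalog_path() {
    local image=$1
    echo "${CATALOG}/${image:0:1}/${image:1:1}/${image}"
}

# List the images available for a tool (files named "<tool>:<version>").
catalog_versions() {
    local tool=$1
    ls "${CATALOG}/${tool:0:1}/${tool:1:1}/" | grep "^${tool}:"
}

# Example:
#   catalog_versions bowtie2
#   singularity shell "$(catalog_path bowtie2:2.4.1--py38he513fc3_0)"
```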

Jupyter

You can use Jupyter in multiple ways using the GenOuest resources:

By launching a VM in the Genostack cloud
By running it on the Slurm cluster

Here are some instructions for running it on our cluster (inspired by https://alexanderlabwhoi.github.io/post/2019-03-08_jpn-slurm/):

First, connect to the cluster and open a shell on a compute node:

ssh <login>@genossh.genouest.org
srun --pty bash

Then load the preinstalled Jupyter environment:

. /local/env/envjupyter-6.5.4.sh

Then run a Jupyter notebook with the --no-browser option, as no web browser is installed on our cluster. The following commands use port 8888, but you should choose another port, for example between 10000 and 20000.

jupyter notebook --no-browser --port 8888

Then open another terminal on your local machine (laptop) and create an SSH tunnel like this:

ssh -A -t -t <login>@genossh.genouest.org -L 8888:localhost:8888 ssh cl1nXXX -L 8888:localhost:8888

Replace cl1nXXX by the name of the node where the Jupyter notebook is running.

Then you can use your favorite web browser and connect to http://localhost:8888/

The port you chose may already be used by someone else, in which case you’ll get an “Address already in use” error. Choose another port and rerun everything with the new port number.

If you want to use JupyterLab instead, follow the same steps, but load the JupyterLab environment (and run jupyter lab instead of jupyter notebook):

. /local/env/envjupyterlab-4.0.2.sh
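Choosing a port and typing the tunnel command by hand is error-prone. Here is a minimal sketch of two hypothetical bash helpers (not provided by GenOuest): one picks a pseudo-random port between 10000 and 19999 using bash's $RANDOM, the other prints the laptop-side tunnel command shown earlier, given the port, the node name and your login. A random port can still collide with someone else's; if you get “Address already in use”, simply pick again.

```bash
#!/bin/bash
# Hypothetical helpers for the Jupyter steps above.

# Print a pseudo-random port in [10000, 19999] ($RANDOM is 0..32767 in bash).
random_port() {
    echo $(( 10000 + RANDOM % 10000 ))
}

# Print the SSH tunnel command to run on your laptop.
# $1 = port, $2 = compute node name, $3 = your GenOuest login
tunnel_cmd() {
    local port=$1 node=$2 login=$3
    echo "ssh -A -t -t ${login}@genossh.genouest.org -L ${port}:localhost:${port} ssh ${node} -L ${port}:localhost:${port}"
}

# Example, on the compute node after loading the Jupyter environment:
#   PORT=$(random_port)
#   tunnel_cmd "$PORT" "$(hostname -s)" "$USER"
#   jupyter notebook --no-browser --port "$PORT"
```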

Workflows

Please refer to the Workflows page for more information on the workflow management tools available on GenOuest.
