🚨 New storage 🚨

On 2024-12-06, we are officially starting the long-awaited migration to our new storage system!

We have sent the main points in an email, but here is the full documentation. Note that you will probably need to perform some actions to transfer your data.

Impacted GenOuest services

The following services will be impacted by this migration:

The following services will not be impacted (apart from an interruption during the scheduled maintenance in early January):

Impact on your data storage

Three volumes accessible from our Slurm cluster will be impacted: /home, /groups and /scratch.

/home

GenOuest /home will be transferred to the new storage automatically; you will not need to do anything for this volume.

/groups

For /groups/, you will need to copy your data to the new storage yourself.

For each directory currently in /groups, we have prepared a matching directory on the new storage, in /projects, with a similar quota.

See the "Copying data" section below for example command lines to copy your data.

/scratch

For /scratch/, you will also need to copy your data to the new storage yourself.

For each directory currently in /scratch, we have a corresponding directory prepared on the new storage, in /scratch-new, with a similar quota.

See the "Copying data" section below for example command lines to copy your data.

NB: If you had one of the few GenOuest project spaces (as opposed to personal scratch spaces, i.e. folders named after your username) stored in /scratch, it has been created under /projects with the same name.

CeSGO / Data-access

If you use Data-access to access (and possibly share) files stored in /home, /groups or /scratch, there are two situations:

Timeline and next steps

The expected timeline of the migration is as follows:

We expect the whole data transfer period to last roughly three months. As the timing is fairly tight (due to the age of the old storage), please do not delay in starting your data transfers.

Copying data

While it is possible to use a simple cp command, we recommend using the rsync command: it makes a full copy from one directory (the "source") to another (the "destination"), preserving as many file attributes as possible (permissions, timestamps, etc.).

If the transfer is interrupted for any reason, rsync is also capable of resuming from where it stopped right before the interruption.

The general syntax is: rsync [some options] /source/ /destination

Rsync detects and synchronises any differences between the source and the destination directory. For example, if you have finished a first copy and then modified, added or deleted a file in /groups/your_project/, just rerun the same rsync command to update the copy in /projects/your_project/.

Here are a few examples of rsync commands (see comments below):

# Copy data from old (/groups) to new storage (/projects)
rsync -av --stats --progress /groups/your_project/ /projects/your_project/

# Copy data from old scratch (/scratch) to new storage (/scratch-new)
rsync -av --stats --progress /scratch/your_scratch_space/ /scratch-new/your_scratch_space/

# Copy to a different path
rsync -av --stats --progress /groups/your_project/some/subfolder/ /projects/your_project/another/sub/folder/

Do not, under any circumstances, launch these commands directly on the genossh node.

We ask you to run these commands on compute nodes, in Slurm jobs, using sbatch. See the Slurm section for details. You are advised to add the --constraint avx2 option to your sbatch command to speed up transfers.

Here is a minimal bash script (sync.sh) to use on the cluster:

#!/bin/bash
#SBATCH --constraint avx2

rsync -av --stats --progress /groups/your_project/ /projects/your_project/

You can then submit this job with the command sbatch sync.sh

You may see an error about a failure to set times on /projects/XXX. You can safely ignore this error.

The trailing slashes on each path are important when copying whole directories; don't omit them unless you know what you're doing (in particular if you run the command multiple times).

The --delete option should be used with care: if set, any file absent from the "source" will be deleted from the "destination".

See below for a note on file/directory ownership when using rsync (or even cp).

We expect to have many data transfers, from all users, running at the same time during the migration phase. Please avoid running many rsync commands at once to limit the risk of network congestion.

If you have big volumes of data (>15 TB) to transfer at once, feel free to contact us, we might be able to use faster methods under certain conditions.

Permissions

When transferring your own data from /groups or /scratch, you might encounter some UNIX permission problems, in particular when multiple users work on the same files/directories.

rsync/cp and file ownership

Please note that copying files yourself will not preserve their original OWNER: the copies will always be owned by the user who runs the rsync/cp command.

For example: if files owned by birgit are copied by another project member, the copies will be owned by that member, not by birgit.

This is a UNIX limitation: normal users are not allowed to change file ownership.

If it is a problem for your data, and the original owner (birgit) cannot make the transfer, please contact us, and we will adapt the owner to the right one.

Make sure to set the correct permissions on files (660) and folders (770, since directories need the execute bit to be entered) to avoid temporary access issues.
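One possible way to apply these modes recursively is find combined with chmod. The fix_perms helper below is a hypothetical sketch, not an official GenOuest tool:

```shell
# Recursively set 770 on directories and 660 on regular files.
fix_perms() {
    find "$1" -type d -exec chmod 770 {} +
    find "$1" -type f -exec chmod 660 {} +
}

# Usage (the path is a placeholder for your own project):
# fix_perms /projects/your_project
```

Running plain `chmod -R 770` instead would wrongly make every file executable, which is why directories and files are handled separately here.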

Expired/deleted users

If you need to change the owner of files/directories owned by someone who no longer has a GenOuest account, please contact us. Expired owners still have their login displayed in ls output, while deleted accounts are listed only by their numeric user ID (UID).

Project membership

This migration is a very good opportunity to review the list of users with access to shared projects. If you own a project, connect to My.genouest.org, where you can add/remove members to/from your projects.

Quota

To check your quota on the new storage, you can use the same command as on the old /groups:

df -h /projects/<your-project>
df -h /scratch/<your-scratch-space>
df -h /path/to/your/home

If you need your quota changed (increased or decreased), please contact us. Freeing up space you no longer use is always appreciated by other users :).
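To find out where the space goes before cleaning up, du can summarise usage per subdirectory. A sketch on a throwaway directory (on the cluster you would point it at /projects/your_project instead):

```shell
# Build a small demo tree with a 1 MiB dummy file.
proj=$(mktemp -d)
mkdir -p "$proj/raw_data" "$proj/results"
head -c 1048576 /dev/zero > "$proj/raw_data/big.bin"

# Per-subdirectory usage, largest first
du -h --max-depth=1 "$proj" | sort -hr
```

The largest subdirectories appear at the top of the output, which makes good candidates for cleanup easy to spot.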

FAQ

I am not sure if I need to transfer my Data-access data

Only the data that is "mounted" from the cluster needs to be transferred. Usually, this corresponds to the content of GenOuest group xxxx or GenOuest home directories listed on your front page in Data-access.

If you have this kind of "mounted" data, either you should have received a specific email from us, or you will probably have to do it yourself. In case of any doubt, please contact us to be sure.

Why is my directory not present in /projects or /scratch?

This can happen if your project is marked as "expired" on My.genouest.org. Please contact us (see below) and we will fix it for you.

Should I move my data to /projects or /scratch?

The good practices established on the old storage are still valid on the new one:

We remind you that we do not offer backup for any of our storage spaces.

Why do I need to move data on my own?

We think you know your data and how it is organised much better than we do. From our experience with previous migrations, this method is a very good opportunity for users to tidy up their storage space, check file ownership and reorganise their files.

It will take a bit more of your time, but the result will be a tidier resource and a lower total disk usage (and thus a positive ecological impact).

Help

If you have any questions or remarks about this procedure (quotas, permissions, etc.), please do not hesitate to contact us at support@genouest.org.