🚨 New storage 🚨
Starting 2024-12-06, we are officially starting the long-awaited migration to our new storage system!
We have sent the main points in an email, but here is the full documentation. Note that you will probably need to perform some actions to transfer your data.
Impacted GenOuest services
The following services will be impacted by this migration:
- Slurm cluster (accessed through genossh.genouest.org)
- CeSGO - Data-access
The following services will not be impacted (apart from an interruption during the scheduled maintenance in early January):
- Genocloud: virtual machines running on https://genostack.genouest.org/
- Websites: genouest.org, cesgo.org, and all the hosted websites (*.genouest.org)
- UseGalaxy.fr server
- CeSGO services: Collaboration, Research-sharing, Projects, Instant
Impact on your data storage
Three volumes accessible from our Slurm cluster will be impacted: /home, /groups and /scratch.
/home
GenOuest /home will be transferred to the new storage automatically; you do not need to do anything for this volume.
/groups
For /groups, you will need to copy your data to the new storage yourself.
For each directory currently in /groups, we have prepared a matching directory on the new storage, in /projects, with a similar quota.
See the Copying data section below for example command lines to copy your data.
/scratch
For /scratch, you will also need to copy your data to the new storage yourself.
For each directory currently in /scratch, we have prepared a corresponding directory on the new storage, in /scratch-new, with a similar quota.
See the Copying data section below for example command lines to copy your data.
NB: if you had one of the few GenOuest project spaces stored in /scratch (as opposed to personal scratch spaces, i.e. folders named after your username), it has been recreated under /projects with the same name.
CeSGO / Data-access
If you use Data-access to access (and maybe share) files stored in /home, /groups or /scratch, there are 2 situations:
- If you never use the command line to access our cluster: we will take care of migrating your data, no action is needed from you. We will send you a specific email to confirm this; contact us if you do not receive it or have any doubt.
- If you usually access your data through data-access and the command line: you are expected to copy your data to the new storage, and data-access will use it starting from the scheduled downtime (see timeline below).
Timeline and next steps
The expected timeline of the migration is as follows:
- 2024-12-06: Start of the migration [passed ✓]
- /projects (the new name for /groups) and /scratch-new are available on genossh and all compute nodes.
- You can start copying your data from that point on.
- 2025-01-06 to 2025-01-11: Scheduled downtime [passed ✓]
- The GenOuest platform will be unavailable, as we reboot and change the mount points on all of our infrastructure.
- All running jobs will be stopped during this period to avoid data corruption.
- The exact timing of this maintenance will be adjusted later (we expect it to be shorter).
- 2025-01-11: New storage in production [current situation]
- The old storage will still be available, but in read-only mode, in /groups and /scratch-old.
- You will still be able to copy your data to the new storage, but you will no longer be able to write to these old spaces.
- CeSGO / Data-access will only allow access to data moved to the new storage.
- 2025-01-31: End of maintenance of old storage
- The old storage will no longer be covered by a maintenance contract, so the risk of data loss will increase.
- Make sure the most important data is transferred before this date.
- ~ 2025-03-31: Shutdown of old storage
- The old storage will be unmounted and unavailable.
We expect the whole data transfer period to last roughly three months. As the timing is fairly tight (due to the age of the old storage), please do not delay in starting your data transfers.
Copying data
While it is possible to use a simple cp command, we recommend using the rsync command: it makes a full copy from one directory (the "source") to another (the "destination"), preserving as many file attributes as possible (permissions, creation/access times, etc.).
If it gets interrupted for any reason, rsync is also able to resume the transfer from where it stopped.
The general syntax is: rsync [some options] /source/ /destination
Rsync will detect and synchronise any difference between the source and the destination directory: for example, if you have finished a first copy and then modified/added/deleted a file in /groups/your_project/, just rerun the rsync command to update the copy in /projects/your_project/.
Here are a few examples of rsync commands (see comments below):
# Copy data from old (/groups) to new storage (/projects)
rsync -av --stats --progress /groups/your_project/ /projects/your_project/
# Copy data from old scratch (/scratch-old) to new storage (/scratch)
rsync -av --stats --progress /scratch-old/your_scratch_space/ /scratch/your_scratch_space/
# Copy to a different path
rsync -av --stats --progress /groups/your_project/some/subfolder/ /projects/your_project/another/sub/folder/
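To check that a copy is complete, you can run the same command as a dry run: with the -n (--dry-run) option, rsync only lists what would still be transferred, without copying anything. The project name below is the same placeholder as above:
# Dry run: list the remaining differences between old and new storage
rsync -avn --stats /groups/your_project/ /projects/your_project/
If the statistics report 0 regular files transferred, the two directories are in sync.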
Do not, under any circumstances, launch these commands directly on the genossh node.
We ask you to run these commands on compute nodes, in Slurm jobs, using sbatch. See the Slurm section for details. You are advised to add the --constraint avx2 option to your sbatch command to speed up transfers.
Here is a minimal bash script (sync.sh) to use on the cluster:
#!/bin/bash
# Request a node with AVX2 support to speed up the transfer
#SBATCH --constraint avx2
# Synchronise the project directory from the old storage to the new one
rsync -av --stats --progress /groups/your_project/ /projects/your_project/
You can then submit this job with the command sbatch sync.sh
You may see an error about a failure to set times on /projects/XXX. You can safely ignore this error.
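Once the job is submitted, you can check that it is running and follow the rsync output (assuming the default Slurm output file name, slurm-<jobid>.out, created in the directory where you ran sbatch):
# Check the state of your jobs
squeue -u $USER
# Follow the transfer log of a running job (replace <jobid> with the real job id)
tail -f slurm-<jobid>.out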
The trailing slashes on each path are important when copying whole directories; don't forget them unless you know what you're doing (in particular if you run your command multiple times).
The --delete option should be used wisely: if used, any file absent from the "source" will be deleted in the "destination".
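If you do want the destination to exactly mirror the source, a cautious approach is to preview the deletions first with the same dry-run trick as above, for example:
# Preview which files would be deleted in the destination; nothing is changed
rsync -avn --stats --delete /groups/your_project/ /projects/your_project/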
See below for a note on file/directory ownership when using rsync (or even cp).
We expect to have many data transfers, from all users, running at the same time during the migration phase. Please avoid running many rsync commands at once to limit the risk of network congestion.
If you have big volumes of data (>15 TB) to transfer at once, feel free to contact us, we might be able to use faster methods under certain conditions.
Permissions
When transferring your own data from /groups or /scratch, you might encounter some UNIX permission problems, in particular when multiple users are working on the same files/directories.
rsync/cp and file ownership
Please note that copying files on your own will not retain their original owner: ownership will always be set to the user who launches the rsync/cp command.
For example:
- you log in to the cluster as roberto, who belongs to the prj_awesome unix group
- you want to transfer /groups/awesome/test.tsv, which is owned by birgit:prj_awesome
- you run an rsync command: rsync -av --stats --progress --delete /groups/awesome/test.tsv /projects/awesome/test.tsv
- /projects/awesome/test.tsv will be owned by roberto:prj_awesome, instead of birgit:prj_awesome
This is a UNIX limitation: normal users are not allowed to change file ownership.
If this is a problem for your data and the original owner (birgit) cannot make the transfer, please contact us and we will set the owner back to the right one.
Make sure to set the correct permissions on folders (770) and files (660) to avoid temporary access issues.
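If you need to fix permissions recursively after a transfer, here is a minimal sketch (the project name is a placeholder; note that 660 removes the executable bit, so re-add it on scripts and binaries that need it):
# Group read/write/traverse access on directories
find /projects/your_project -type d -exec chmod 770 {} +
# Group read/write access on regular files
find /projects/your_project -type f -exec chmod 660 {} +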
Expired/deleted users
If you need to change the owner of files/directories owned by someone who no longer has a GenOuest account, please contact us. Expired owners will still have their login displayed in the ls command output, while deleted accounts will only be listed by their numeric user id (uid).
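To spot files whose owner account has been deleted (and is therefore only shown as a numeric uid), you can for example run:
# List files under your project whose owner no longer exists in the user database
find /groups/your_project -nouser -ls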
Project membership
This migration is a very good opportunity to review the list of users having access to shared projects. If you are the owner of a project, connect to My.genouest.org, where you will be able to add/remove any member to/from your projects.
Quota
To consult your quota on the new storage, you can use the same command as on the old /groups:
df -h /projects/<your-project>
df -h /scratch/<your-scratch-space>
df -h /path/to/your/home
If you need to change your quota (to increase or decrease it), please contact us. It is always appreciated to free some space for other users when you no longer need it :).
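To see which directories use the most space before transferring (or cleaning) them, a simple check based on du is, for example:
# Size of each top-level directory of your project, largest last
du -sh /groups/your_project/* | sort -h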
FAQ
I am not sure if I need to transfer my Data-access data
Only the data that is "mounted" from the cluster needs to be transferred. Usually, this corresponds to the content of the Genouest group xxxx or Genouest home directories listed on your front page in Data-access.
If you have this kind of "mounted" data, either you should have received a specific email from us, or you will probably have to do the transfer yourself. In case of any doubt, please contact us to be sure.
Why is my directory not present in /projects or /scratch?
This can happen if your project is marked as "expired" on My.genouest.org. Please contact us (see below) and we will fix it for you.
Should I move my data to /projects or /scratch?
The good practices established on the old storage are still valid on the new one:
- /scratch is meant to be a higher performance volume, but without snapshots.
- /projects is meant to contain "colder" data (raw or finalized data), and gives access to snapshots.
We remind you that we do not offer backup for any of our storage spaces.
Why do I need to move data on my own?
We think you know your data and how it is organised much better than we do. From our experience with previous migrations, this method is a very good opportunity for users to tidy up their storage space, check file ownership and reorganise their files.
It will take a bit more of your time, but the result will be a tidier resource and a lower total disk usage (and, as such, a positive ecological impact).
Help
If you have any questions or remarks regarding this procedure (quotas, permissions, etc.), please do not hesitate to contact us at support@genouest.org.