Modified

March 13, 2025

Data transfer

To ensure efficient file transfers on HPC systems, we will go through:

  • Overview rsync and scp
  • Basic commands

When transferring files between servers, it’s important to ensure that the files are consistent. Differences in file size can occur due to variations in filesystems, but large differences might indicate an issue. Manually checking file sizes in a terminal (ls -lh or du -sh) to determine if a transfer was successful might not be ideal, as it doesn’t guarantee file integrity. Using checksums provides a more reliable verification method.

You can use md5sum to verify that the file contents are identical on both servers. Run the following command on each server:

md5sum /path/to/file

Using rsync for efficient file transfers rsync is a powerful alternative to scp for transferring files. It only sends data if the file has changed, making it more efficient.

# Transferring files between local machine-server
rsync -avz local/path/to/file user@server:/remote/path/to/file
# Transferring files between servers
rsync -avz server1:/path/to/my_folder server2:/path/to/destination_folder

The -avz flags are commonly used together for efficient file synchronization:

  • -a (archive): preserves symbolic links, permissions, timestamps, groups, owners, and devices while transferring files
  • -v (verbose): display detailed information about the transfer (which files are being copied or updated)
  • -z (compress): enables compression during transfer

Other useful flags are:

  • --progress: display a progress bar (transfer speed, percentage completed, estimated time remaining, …)
  • --partial: ensures that partially transferred files are not discarded if the transfer is interrupted, allowing rsync to resume the transfer from where it left off the next time the command is run
Note

To transfer files directly between two servers from your local workstation, ensure your SSH setup (configuration, keys, etc.) allows access to both servers. Check this section if you need help setting up your keys (generating, configuring and managing).

Advantages of rsync over scp:

rsync is an efficient protocol to compare and copy files between directories or server. It can resume interrupted transfers and compress files on the fly.

  • Checksum Verification: rsync checks the hashsums of files and only transfers data if the hashes differ. This ensures that only the changed parts of the files are sent (so you can rsync a whole folder, and only the changes files will be send).
  • Timestamp Preservation: Using the -a flag with rsync preserves the modified timestamps of files, which is particularly useful for tools like Snakemake.

If you prefer using SCP (Secure Copy Protocol) for transferring files between a local and remote host, or between two remote hosts, here are some useful commands:

# copy from local to remote
scp /home/my_laptop/files.txt username@login.server.dk:/home/username/myproject/

# If you want to copy an entire folder, use the option -r (recursive copy)
scp -r /home/my_laptop/myfolder username@login.server.dk:/home/username/myproject/

# check other options available 
scp --help 

Important rsync options:

  • -a: Archive mode, preserves file attributes like timestamps and permissions (important if you are using snakemake).
  • -v: Verbose mode, provides detailed information during transfer.
  • -z: Compresses data during transfer, reducing the amount of data sent over the network.
  • -c: Enables checksum checking, ensuring that files are identical by comparing their contents rather than just their size and modification time.

Interactive transfer

For users who prefer a graphical interface, tools like Cyberduck and FileZilla can also be used for transferring files between servers.

  1. Download Filezilla here or Cyberduck here
  2. Open the app and configure the access information for your host machine (including password, username, ssh keys (if relevant) and port: 22).
    • Select SFTP (SSH File Transfer Protocol) option on Cyberduck
  3. Quick connect

This will establish a secure connection to your host. You can navigate through your folders and files. Right-click on any file you want to download or preview.

  • Filezilla: your local files will be display on the left-side of the dashboard. Right-click on them to upload or add it them to a transfer queue. If you have created a queue, this will be shown at the bottom of the window as a list. You can inspect destination folders from there and choose other options such as transfer priority. To start a queue, use CTRL + P, Transfer --> Process Queue or use the toolbar.
  • Cyberduck: you can drag files from your local to the host, choose the directory where you want them located.

UCloud Users

Data transfer

To transfer files between UCloud and your local machine, you must first configure your SSH keys. If you haven’t done so already, follow the instructions in the SSH keys section.

Next, open an application on UCloud (e.g., Terminal) which will display the login command and the SSH port for client connections in the job progress view. Note that the SSH port is randomly generated and will be different each time.

# Run these commands on a terminal in your local machine
# Files from UCloud to local
rsync -avP -e "ssh -i ~/.ssh/id_rsa -p <port>" ucloud@ssh.cloud.sdu.dk:/work/<path_to_data> ./<path_local_data>

# Files from local to UCloud
rsync -avP -e "ssh -i ~/.ssh/id_rsa -p <port>" ./<path_local_data> ucloud@ssh.cloud.sdu.dk:/work/<path_to_data>
  • ~/.ssh/id_rsa: path to your SSH private key on the remote host (locally)
  • <port>: SSH number, which you will find on UCloud
  • ucloud@ssh.cloud.sdu.dk: default user on the remote server must be ucloud.
  • /work/<path_to_data>/: data must be synchronized to a folder within the default working directory work

To transfer files between two servers, we recommend performing the transfer from a terminal on UCloud. Below is an example of the command for GenomeDK. Please note that this command will prompt you to enter your password, and you may also be asked to complete a two-factor authentication (2FA) process, if enabled.

# From genomeDK to Ucloud
rsync -avP <username>@login.genome.au.dk:/home/<username>/data /work/<path_to_data>
  • /work/<path_to_data>/: data must be synchronized to a folder within the default working directory work
  • Reverse the order of the command if you need to transfer from UCloud to the GenomeDK server.