SSH keys for Data Transfer

This page describes how to set up dedicated data transfer ssh-keys for use in batch jobs.

Overview

SSH keys allow password-less logins via ssh using public-key cryptography. Each ssh-key comes in two parts: the public-key and the private-key. The public-key is installed in the .ssh/authorized_keys file of the account that you want to access. This tells the sshd daemon to allow access to that account to anyone who possesses a copy of the matching private-key.

The private-key is usually kept securely on the user's personal machine (on Linux an RSA private-key is stored by default in the file .ssh/id_rsa). To protect this file, private keys are usually themselves encrypted with a passphrase, so the user has to "unlock" the key with the passphrase before use. You must take care to protect any ssh-key that you allow to access your ARCHER account; in particular, these MUST always be stored encrypted.
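One way to check whether an existing private-key is protected by a passphrase is to ask ssh-keygen to print its public-key while supplying an empty passphrase. This only succeeds for an unencrypted key. The sketch below wraps that check in a shell function. The name key_is_encrypted is a hypothetical helper, not part of any ARCHER tooling, and the sketch assumes OpenSSH's ssh-keygen is on your PATH.

```shell
# Sketch: report whether an OpenSSH private-key is passphrase-protected.
# key_is_encrypted is a hypothetical helper name used for illustration.
# Returns 0 if the key could NOT be loaded with an empty passphrase
# (normally meaning it is encrypted), 1 if it loads without one
# (it is unencrypted), and 2 if the file does not exist.
key_is_encrypted() {
    local key="$1"
    if [ ! -f "$key" ]; then
        echo "no such key file: $key" >&2
        return 2
    fi
    # ssh-keygen -y derives the public-key from the private-key.
    # With -P "" it tries only the empty passphrase and exits with an
    # error (instead of prompting) if the key needs a real passphrase.
    if ssh-keygen -y -P "" -f "$key" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}
```

For example, `key_is_encrypted ~/.ssh/id_rsa || echo "id_rsa has no passphrase"` warns about an unprotected key. A passphrase can then be added with `ssh-keygen -p -f ~/.ssh/id_rsa`.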

Transferring data in batch jobs

In some cases you may want to transfer data off ARCHER using ssh connections from within a batch job. In this case you will need to create a key without a passphrase, because you will not be present to type one in when the batch job runs. Because the key will be stored unencrypted, you must take extra care of it to ensure it does not become a security risk. Good practice includes:

  • Make a special "data-transfer" key for transferring data from the RDF. Do not use it for anything else. Only store the private-key on the RDF.
  • Make sure that you are the only user with permission to read the private-key file.
  • Only add the public-key to hosts where you need to transfer data. Remove the key when you are not planning on doing any data transfers.
  • Install the key so that it only works from the RDF and can only be used for scp to a specified directory. This means that even if the private-key is lost or stolen, it can't be used to log into the remote host or to access any location other than the directory you set up for data transfers.
  • Remove the authorized_keys file entry once the transfer has finished.
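The second point above can be checked from a script before a job relies on the key. The sketch below assumes GNU coreutils stat, as found on Linux systems. check_key_perms is a hypothetical helper name. The function accepts a private-key file only if no group or other permission bits are set.

```shell
# Sketch: verify that only the owner can access a private-key file.
# check_key_perms is a hypothetical helper; assumes GNU stat (Linux).
# Returns 0 if permissions are owner-only, 1 if too open, 2 on error.
check_key_perms() {
    local key="$1"
    local mode
    mode=$(stat -c '%a' "$key") || return 2
    # Interpret the mode as octal; any group/other bit (e.g. 644, 640)
    # means another user may be able to read the key.
    if (( 8#$mode & 8#077 )); then
        echo "insecure permissions $mode on $key; run: chmod 600 $key" >&2
        return 1
    fi
    return 0
}
```

Calling `check_key_perms ~/.ssh/data_key || exit 1` near the top of a job script stops the job before an exposed key is used.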

Making the key

Create the key using the ssh-keygen command. As you don't want a passphrase, just press Enter at both passphrase prompts to leave it empty.

-bash-4.1$ cd .ssh
-bash-4.1$ ssh-keygen -t rsa -C data-key -f data_key
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in data_key.
Your public key has been saved in data_key.pub.

Copy the contents of the data_key.pub file, as you will need them on the remote machine.
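If you prefer to script this step rather than answer the prompts, ssh-keygen's -N option sets the new passphrase directly, so -N "" creates an unencrypted key without prompting. The sketch below wraps the command from the session above in a function. make_data_key is a hypothetical helper name. The function refuses to overwrite an existing key, restricts the private-key file to its owner, and prints the public-key line.

```shell
# Sketch: non-interactive version of the ssh-keygen session above.
# make_data_key is a hypothetical helper name.
# Usage: make_data_key [key-file]   (default: ~/.ssh/data_key)
make_data_key() {
    local key="${1:-$HOME/.ssh/data_key}"
    if [ -e "$key" ]; then
        echo "$key already exists; not overwriting" >&2
        return 1
    fi
    # Create the key directory if needed, readable only by you.
    mkdir -p -m 700 "$(dirname "$key")"
    # -N "" sets an empty passphrase, so no prompt is shown.
    # -C data-key is the comment at the end of the public-key line.
    ssh-keygen -q -t rsa -N "" -C data-key -f "$key" || return 1
    chmod 600 "$key"
    # This line is what goes into authorized_keys on the remote machine.
    cat "$key.pub"
}
```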

Installing the public-key

On the remote machine where you want to send data, edit the .ssh/authorized_keys file and add the contents of data_key.pub as a single line. At the beginning of this line you can put additional options to restrict what this key can be used for. For example:

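  • from="*.rdf.ac.uk" means the key can only be used from the RDF systems. Host-name patterns like this only work if the remote sshd has UseDNS enabled; otherwise you must use IP address patterns instead.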
  • command="scp -r -t import" means the key can only be used for running scp to copy into the directory import in your home directory. The -r option is needed for the receiving scp to accept whole directories, as sent by the scp -r example below. Note that the destination you supply on the scp command line will be ignored if you do this.

The authorized_keys entry should then look something like this (all on one line):

from="*.rdf.ac.uk",command="scp -r -t import" ssh-rsa AAAAB3NzaC1yc2EAAAA ...deleted... 7PHM7U/Ir9 data-key

Make sure you pre-create the directory you plan to import data to (in this example, import in your home directory on the remote machine).
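The edit on the remote machine can also be scripted, together with the clean-up once transfers are finished. The sketch below is meant to be run on the remote host. install_data_key and remove_data_key are hypothetical helper names, and the host pattern, forced command and directory are parameters you should adapt. The forced command defaults to "scp -r -t import" because the receiving scp rejects directories unless it was also started with -r. The sketch creates the import directory relative to your home directory, which is where the forced command runs, and keeps the owner-only permissions sshd expects.

```shell
# Sketch, run on the REMOTE host. Helper names are hypothetical.
# install_data_key PUBFILE [FROM] [COMMAND] [DIR]
install_data_key() {
    local pub="$1"
    local from="${2:-*.rdf.ac.uk}"
    local cmd="${3:-scp -r -t import}"   # -r so directories are accepted
    local dir="${4:-import}"
    local auth="$HOME/.ssh/authorized_keys"

    mkdir -p -m 700 "$HOME/.ssh"
    mkdir -p "$HOME/$dir"                # pre-create the import directory
    # Make sure an existing file ends with a newline before appending.
    if [ -s "$auth" ] && [ -n "$(tail -c 1 "$auth")" ]; then
        echo >> "$auth"
    fi
    # One line: the restricting options, then the contents of the .pub file.
    printf 'from="%s",command="%s" %s\n' "$from" "$cmd" "$(head -n 1 "$pub")" >> "$auth"
    chmod 600 "$auth"
}

# remove_data_key [COMMENT]: delete entries whose key comment is COMMENT.
remove_data_key() {
    local comment="${1:-data-key}"
    local auth="$HOME/.ssh/authorized_keys"
    local tmp rc=0
    [ -f "$auth" ] || return 0
    tmp=$(mktemp "$auth.XXXXXX") || return 1
    # Keep every line that does not end in " COMMENT".
    # grep exits 1 when no lines remain; only >1 is a real error.
    grep -v -- " ${comment}\$" "$auth" > "$tmp" || rc=$?
    if [ "$rc" -gt 1 ]; then
        rm -f "$tmp"
        return 1
    fi
    chmod 600 "$tmp"
    mv "$tmp" "$auth"
}
```

For example, `install_data_key data_key.pub` adds the restricted entry before a run, and `remove_data_key` removes it again once the transfer has finished.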

Transferring the data

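Within the batch job you can then transfer data with scp without a password, provided you specify the data-key file with the -i flag. If the scp on the sending machine is OpenSSH 9.0 or later, it uses the SFTP protocol by default, which the forced scp command on the remote host cannot serve; add the -O flag to select the original SCP protocol. For example: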

-bash-4.1$ scp  -i ~/.ssh/data_key -r data-dir remote-host:import
data1.dat                                     100%  495     0.5KB/s   00:00    
data2.dat                                     100%  495     0.5KB/s   00:00    
data3.dat                                     100%  495     0.5KB/s   00:00
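In a batch script it is worth making scp fail immediately instead of waiting for input that will never arrive, and checking its exit status so that a failed transfer is noticed. The sketch below uses OpenSSH's BatchMode=yes option (passed through scp's -o flag), which disables passphrase and password prompts. push_results, the SCP override variable and the host name are hypothetical placeholders. The SCP variable exists only so that the function can be tested without a real remote host. BatchMode also means an unknown host key is rejected rather than confirmed, so make sure the remote host is already in ~/.ssh/known_hosts, for example by connecting to it once interactively.

```shell
# Sketch of a transfer step for a batch job script.
# push_results and SCP are hypothetical names; "remote-host" is a placeholder.
# Usage: push_results SRC_DIR [REMOTE_HOST] [KEY]
push_results() {
    local src="$1"
    local host="${2:-remote-host}"
    local key="${3:-$HOME/.ssh/data_key}"
    local scp_cmd="${SCP:-scp}"   # override only for testing

    # BatchMode=yes: never prompt; fail instead if the key is not accepted.
    # The ":import" destination is ignored when the remote entry has a
    # forced command, but scp still requires a destination argument.
    if ! "$scp_cmd" -o BatchMode=yes -i "$key" -r "$src" "$host:import"; then
        echo "data transfer of $src to $host failed" >&2
        return 1
    fi
}
```

In a job script this might be called as `push_results data-dir remote-host || exit 1` once the computation has finished.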