Encrypting Files

The Hopper cluster is generally intended for storing and computing on public and open research data that does not contain any sensitive, controlled or classified information. Therefore, it is important that researchers know the nature of their data's classification and storage requirements.

The Hopper cluster is authorized for storing some restricted data [1] as long as

  • it does not contain Personally Identifiable Information (PII) or
  • the data has been de-identified or
  • it is governed by FAR 52.204-21 policy for "Basic safeguarding of Federal Contract Information (FCI)"

If you are working on data with any sensitivity such as highly sensitive data (HSD) or restricted data, you should contact itrc@gmu.edu or set up a consultation with GMU's IT Security Office (ITSO) to assess risk and compliance issues associated with the data and its intended use. If the data is deemed appropriate for storage on ORC facilities, you can proceed with storing it, provided that controls such as file encryption and proper access control limits are enforced.

One way to protect sensitive data is to (1) encrypt it and (2) make sure only users authorized to work on it have access. Below, we outline one way to encrypt files using gocryptfs along with proper password management and file access control. Unlike the whole-disk encryption you may have used on your local desktop or laptop drive, gocryptfs encrypts individual files and operates in user space, so non-privileged users can work with it without needing special permissions or intervention from admins. In short, files are stored in encrypted format, but when mounted with the correct credentials, they are accessible in unencrypted form.

Warning
  • Since gocryptfs is a user-level encryption tool, users are responsible for
    • data loss due to missing/lost decryption passwords and master keys
    • data corruption due to user errors
    • the data and/or passwords falling into the hands of unauthorized users
  • Please remember to store your passwords and master key as it would be impossible to recover your data without the keys
  • Backups of the data are highly encouraged due to the inherent risk associated with encryption, corruption and loss of credentials.

General workflow

Suppose you have a few directories containing some files that you would like to store securely. The general workflow includes a set of steps for one-time initialization and another set of steps for all subsequent usage.

graph TD;
    A((File Encryption)) -.- B{Directory initialized for encryption?}; 
        B --> |No| C((Initialization))
    C -.-> D(Create, encrypt and save password)    
    D --> E(Create directory to encrypt)    
    E --> F[Initialize encrypted filesystem];
    F --> G[Save and encrypt master key];
    G --> L;
        B --> |Yes| K((Usage))
    K -.-> L(Create mount directory)
    L --> M(Mount encrypted filesystem)
    M --> N[Use encrypted filesystem];
    N --> O[Unmount encrypted filesystems when finished];

The following screencast demonstrates the whole process from initialization to usage and unmounting. asciicast

One-time initialization

To enable file encryption in a directory, it needs to be initialized first. During this initialization process, users will set an encryption password and receive a master key for use if the password is ever lost.

  • load Lmod module with gocryptfs and related utilities
    • module load gocryptfs
  • create and initialize an encrypted directory (say, private)
    • mkdir private
    • gocryptfs -init private
      • you will be asked to create an encryption password here. Please provide a sufficiently complex password and save it. See the password management section below.
      • after the password is accepted, you will be provided with a 'master-key'. Please save that master key along with the password securely. If you lose both, your data cannot be decrypted. See the password management section below.
      • you will see that files gocryptfs.conf and gocryptfs.diriv are created in the private directory. Please do not tamper with or remove these files since that could make decrypting your data impossible.
        • gocryptfs.conf is the global configuration for the whole encrypted directory
        • gocryptfs.diriv is created per-directory for encryption of file names
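The first decision in the workflow (has the directory already been initialized?) can be checked by looking for the gocryptfs.conf file that the steps above say is created during initialization. The following is a minimal sketch under that assumption; the function name is_initialized is hypothetical and not part of gocryptfs.

```shell
#!/bin/bash
# Hypothetical helper: report whether a directory has already been
# initialized with `gocryptfs -init`, based on the presence of gocryptfs.conf.
is_initialized() {
    local dir="$1"
    [ -f "${dir}/gocryptfs.conf" ]
}

# Example: decide which part of the workflow applies to ./private
if is_initialized private; then
    echo "private is already initialized; continue with mounting"
else
    echo "private is not initialized; run: gocryptfs -init private"
fi
```

The check only looks for the configuration file; it does not confirm that the file is intact.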

Subsequent usage

After the above initialization, all future usage involves mounting the decrypted directory, working on it, and unmounting it when finished.

  • create a mount point (say, public) and mount the encrypted directory (private) to the decrypted directory (public).
    • mkdir public
    • module load gocryptfs
    • gocryptfs private public
  • go into the decrypted directory (public) and do your work
    • cd public
    • ... do your work ...
      • as you add files to the decrypted directory, the encrypted analogs should appear on the encrypted directory with random looking file names.
  • when finished, exit and unmount the decrypted directory (public)
    • cd
    • fusermount -u public
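The mount, work, unmount cycle above can be wrapped so that the unmount step is not forgotten when the work step fails. This is only a sketch: the function name with_decrypted is hypothetical, and it uses nothing beyond the gocryptfs and fusermount invocations shown in the list above.

```shell
#!/bin/bash
# Hypothetical wrapper: mount ENCRYPTED on DECRYPTED, run a command inside
# the decrypted directory, then unmount even if the command fails.
with_decrypted() {
    local encrypted="$1" decrypted="$2"
    shift 2
    mkdir -p "$decrypted" || return 1
    # gocryptfs prompts for the password on STDIN by default.
    gocryptfs "$encrypted" "$decrypted" || return 1
    # Use a subshell so the caller's working directory is left unchanged.
    ( cd "$decrypted" && "$@" )
    local status=$?
    fusermount -u "$decrypted"
    # Report the status of the command that was run, not of the unmount.
    return "$status"
}

# Example (requires gocryptfs to be loaded and ./private to be initialized):
#   with_decrypted private public ls -l
```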

The animation below illustrates the workflow described above:

gocryptfs demo

Password management

Passwords/passphrases are needed to encrypt files, and we should have mechanisms to store and retrieve them securely. If they are ever lost or compromised, it would likely be impossible to recover your data. If they end up in the hands of unauthorized users, those users would have full access to the sensitive data. So, you should take password management very seriously.

gocryptfs can get passwords

  • from the user via command-line (STDIN) when prompted (default)
  • from a file
    • gocryptfs -passfile FILE ...
  • by executing an external command
    • gocryptfs -extpass COMMAND

Our current workflow recommends

  • writing the password/passphrase to a file and encrypting that file using OpenSSL at initialization time
  • decrypting the password file and feeding it to gocryptfs to mount the decrypted directory

That process is outlined in detail below.

Encrypting and using passwords with OpenSSL

Initialization

  • create a secure passphrase to encrypt your filesystem and write it to a file (pass.txt)
    • encrypt that file using openssl and save it as (pass.enc)
  • initialize the encrypted directory
    • when you are provided a master key, write the master key to a file (mk.txt)
    • encrypt the master key file using openssl and save it to a file (mk.enc)
  • delete the plain text files (pass.txt and mk.txt)
  • make sure the access to both encrypted credential files is limited to only those who are authorized to access the files

Here is an example given the following inputs:

| key | value |
|---|---|
| volume | /groups/${VOLUME} |
| text password file | /groups/${VOLUME}/.${VOLUME}-pass.txt |
| encrypted password file | /groups/${VOLUME}/.${VOLUME}-pass.enc |
| text master key file | /groups/${VOLUME}/.${VOLUME}-mk.txt |
| encrypted master key file | /groups/${VOLUME}/.${VOLUME}-mk.enc |
| encrypted directory | /groups/${VOLUME}/private |
| decrypted mount directory | ${HOME}/public |

Creating and encrypting the passwords

You want to create a complex password or passphrase such as the four word passphrase below. You can write the passphrase to a file and encrypt it with OpenSSL. [2]

module load gocryptfs

VOLUME=testvol

shuf -n4 /usr/share/dict/words | tr '\n' '-' > /groups/${VOLUME}/.${VOLUME}-pass.txt

openssl enc -aes-256-cbc -md sha512 -pbkdf2 -iter 1000000 -salt -in /groups/${VOLUME}/.${VOLUME}-pass.txt -out /groups/${VOLUME}/.${VOLUME}-pass.enc -k ''

Please store the passphrase securely.
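While the plain text passphrase file exists, its permissions depend on your umask. One possible precaution, sketched below, is to create the file so that only its owner can read it. The helper name write_private_file is hypothetical; it relies only on standard umask and chmod behaviour.

```shell
#!/bin/bash
# Hypothetical helper: read a secret from STDIN and write it to a file that
# only the owner can read or write (mode 600), regardless of the caller's umask.
write_private_file() {
    local path="$1"
    (
        umask 077    # applies only inside this subshell
        # Create or truncate the file, tighten its mode, then write the secret.
        : > "$path" && chmod 600 "$path" && cat > "$path"
    )
}

# Example, using the passphrase command from above:
#   shuf -n4 /usr/share/dict/words | tr '\n' '-' \
#       | write_private_file /groups/${VOLUME}/.${VOLUME}-pass.txt
```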

Initializing the encrypted directory

Use the encrypted passphrase you generated above (/groups/${VOLUME}/.${VOLUME}-pass.enc) to initialize the encrypted directory.

$ $GOCRYPTFS/decrypt.sh /groups/${VOLUME}/.${VOLUME}-pass.enc | gocryptfs -init private
Choose a password for protecting your files.
Reading Password from stdin

Your master key is:

    xxxxxxxxx-b666bfbe-920797bc-3ebfdbd7-
    0700e9a0-a4d26c3e-7d2e0b28-zzzzzzzzz

If the gocryptfs.conf file becomes corrupted or you ever forget your password,
there is only one hope for recovery: The master key. Print it to a piece of
paper and store it in a drawer. This message is only printed once.
The gocryptfs filesystem has been created successfully.
You can now mount it using: gocryptfs private MOUNTPOINT

You should now see these files in the encrypted directory.

$ ls private/
gocryptfs.conf  gocryptfs.diriv

Save the master key somewhere safe, and additionally keep a local copy in case it is needed for future recovery. Encrypt it with OpenSSL as follows.

$ vim /groups/${VOLUME}/.${VOLUME}-mk.txt
>> add master key

openssl enc -aes-256-cbc -md sha512 -pbkdf2 -iter 1000000 -salt -in /groups/${VOLUME}/.${VOLUME}-mk.txt -out /groups/${VOLUME}/.${VOLUME}-mk.enc -k ''

Deleting the plain text files

$ shred -uvz /groups/${VOLUME}/.${VOLUME}-pass.txt /groups/${VOLUME}/.${VOLUME}-mk.txt

Setting proper permissions on pass files

Change the file permissions to be readable by you only.

$ chmod 400 /groups/${VOLUME}/.${VOLUME}-pass.enc /groups/${VOLUME}/.${VOLUME}-mk.enc

If it needs to be readable by a group (say orc-testgroup), change the file's group ownership to that group and the permission to 440, or change the access control list (ACL) to grant read permission to group orc-testgroup.

$ setfacl -m group:orc-testgroup:r-- /groups/${VOLUME}/.${VOLUME}-pass.enc /groups/${VOLUME}/.${VOLUME}-mk.enc

Checking proper decryption

If you want to check that the file encryption/decryption works properly, you can run the following command for decryption:

$ openssl enc -d -aes-256-cbc -md sha512 -pbkdf2 -iter 1000000 -salt -k '' -in /groups/${VOLUME}/.${VOLUME}-pass.enc

This should yield your encryption password in plain text format. You can do the same to decrypt the encrypted master key file and confirm its fidelity.
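If the check is run before the plain text files are deleted, the comparison can be automated rather than done by eye. The sketch below assumes the same openssl options used in this section; the function name verify_roundtrip is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if decrypting ENC_FILE reproduces
# PLAIN_FILE byte for byte. Uses the same openssl options as the text above.
verify_roundtrip() {
    local plain_file="$1" enc_file="$2"
    openssl enc -d -aes-256-cbc -md sha512 -pbkdf2 -iter 1000000 -k '' \
        -in "$enc_file" | cmp -s - "$plain_file"
}

# Example (run before shredding the plain text file):
#   verify_roundtrip /groups/${VOLUME}/.${VOLUME}-pass.txt \
#                    /groups/${VOLUME}/.${VOLUME}-pass.enc && echo "OK"
```

The same call can be used for the master key pair (mk.txt and mk.enc).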

Subsequent usage

For subsequent uses, you can decrypt and feed the passphrase to gocryptfs directly as follows to mount the decrypted directory.

$ $GOCRYPTFS/decrypt.sh /groups/${VOLUME}/.${VOLUME}-pass.enc | gocryptfs private ${HOME}/public

Disaster recovery using master key

If you ever lose your passphrase, you can use the master key to decrypt the directory.

$ $GOCRYPTFS/decrypt.sh /groups/${VOLUME}/.${VOLUME}-mk.enc | gocryptfs -masterkey=stdin private ${HOME}/public

Usage

How you access encrypted data differs slightly based on the use case.

On head/login nodes

If users want to access the encrypted files/directories, they mount the encrypted filesystem to a mount point and go into that mount point to do their work.

The most common workflow is

  • user logs into a login/head node
    • ssh NetID@hopper.orc.gmu.edu
  • user mounts the encrypted data to a decrypted directory [3]
    • module load gocryptfs
    • $GOCRYPTFS/decrypt.sh /groups/${VOLUME}/.${VOLUME}-pass.enc | gocryptfs -idle 1h0s private ${HOME}/public
  • user goes into the decrypted directory
    • cd ${HOME}/public
  • user does their work
  • when finished, user leaves the mounted directory
    • cd
  • user unmounts the filesystem
    • fusermount -u ${HOME}/public

On Slurm compute nodes

If one wants to run Slurm jobs on data contained in encrypted files/directories, the same workflow as above applies, with certain necessary modifications.

The most common workflow is

  • user logs into a login/head node
    • ssh NetID@hopper.orc.gmu.edu
  • user mounts the encrypted data to a decrypted directory
    • module load gocryptfs
    • $GOCRYPTFS/decrypt.sh /groups/${VOLUME}/.${VOLUME}-pass.enc | gocryptfs -nosyslog private ${HOME}/public
  • user goes into the decrypted directory
    • cd public
  • user sets up their Slurm job (say, a file named run.slurm) and submits it to the queue manager
    • sbatch run.slurm

While the decrypted directory is mounted on the login/head node the user is on, it is not mounted on the compute node where the job is scheduled to run. Therefore, Slurm will fail if it tries to access files stored in the public directory before it is mounted. By default, Slurm considers the directory you are submitting the job from as your working directory ($SLURM_SUBMIT_DIR) and it expects that directory to exist on the compute node.

  • since the decrypted directory is not available until it is explicitly mounted on the compute node, you need to set your Slurm working directory to another location at first, such as /tmp, by adding #SBATCH --chdir=/tmp to your Slurm batch submission file or by passing --chdir=/tmp when requesting an interactive session using salloc.
    • Slurm batch submission file
      • #SBATCH --chdir=/tmp
    • Interactive session using
      • salloc --chdir=/tmp ...
    • You can change to the decrypted directory once it is mounted
  • the following options are generally recommended for gocryptfs in Slurm jobs
    • -noempty - Allow mounting over non-empty directories. FUSE by default disallows this to prevent accidental shadowing of files.
    • -nosyslog - Diagnostic messages are normally redirected to syslog once gocryptfs daemonizes. This option disables the redirection and messages will continue to be printed to STDOUT and STDERR.
    • -idle duration - Automatically unmount the filesystem if it has been idle for the specified duration. Durations can be specified like “500s” or “2h45m”. 0 (the default) means stay mounted indefinitely.
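One way to avoid submitting a job from inside the decrypted directory by accident is to check the current directory before calling sbatch. The sketch below is an illustration only: is_inside and safe_sbatch are hypothetical names, and the DECRYPTED variable defaulting to ${HOME}/public is an assumption taken from the examples in this document. It relies on sbatch command-line options taking precedence over #SBATCH lines in the script.

```shell
#!/bin/bash
# Hypothetical helper: succeed if PATH_TO_CHECK is DIR itself or lies below it.
is_inside() {
    local path_to_check="${1%/}" dir="${2%/}"
    case "${path_to_check}/" in
        "${dir}/"*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Hypothetical wrapper around sbatch: when called from inside the decrypted
# mount point, start the job in /tmp instead of the submission directory.
safe_sbatch() {
    local decrypted="${DECRYPTED:-${HOME}/public}"
    if is_inside "$PWD" "$decrypted"; then
        sbatch --chdir=/tmp "$@"
    else
        sbatch "$@"
    fi
}

# Example:
#   cd ${HOME}/public && safe_sbatch run.slurm
```

The comparison uses the logical path in $PWD, so a mount point reached through a symbolic link may not be detected.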

Interactive jobs

For interactive jobs, the only thing to be careful about is that your submission/working directory ($SLURM_SUBMIT_DIR) is not the decrypted directory.

salloc -p interactive -q interactive --ntasks-per-node=16 --chdir=/tmp  -t 0-01:00:00

Once you are logged into the compute node, you can proceed similarly to head/login nodes.

  • user mounts the encrypted data to a decrypted directory
    • module load gocryptfs
    • $GOCRYPTFS/decrypt.sh /groups/${VOLUME}/.${VOLUME}-pass.enc | gocryptfs private ${HOME}/public
  • user goes into the decrypted directory
    • cd ${HOME}/public
  • user does their work
  • when finished, user leaves the mounted directory
    • cd
  • user unmounts the filesystem
    • fusermount -u ${HOME}/public

Batch jobs

For batch jobs, users can adapt the following job templates

As you can see in the sample Slurm batch submission file below (standard-job.slurm), the main components of the workflow are

  1. Mount ENCRYPTED folder to DECRYPTED mount point
  2. Go into the DECRYPTED folder and run your calculations
  3. Leave DECRYPTED folder and unmount it

For array jobs that need to access encrypted files, one can either

  • mount multiple instances of the encrypted filesystem or
  • mount one filesystem with '-allow_other' option for

In the example below, each array task mounts the decrypted directory at a unique location (${HOME}/public/$SLURM_JOB_NAME-$SLURMD_NODENAME-$SLURM_JOBID-$SLURM_ARRAY_TASK_ID)

#!/bin/bash
#SBATCH --partition=normal                 
#SBATCH --qos=normal
#SBATCH --job-name=jGocryptfs-file-encryption
#SBATCH --output=/scratch/%u/%x-%N-%j.out  
#SBATCH --error=/scratch/%u/%x-%N-%j.err  
#SBATCH --chdir=/tmp
#SBATCH --nodes=1
#SBATCH --ntasks=32           
#SBATCH --cpus-per-task=1        
#SBATCH --mem-per-cpu=3500M          
#SBATCH --export=ALL
#SBATCH --time=0-06:00:00       

module load gocryptfs
module list

###############################################################################
cat << EOF
=====================================================
(1) Mount ENCRYPTED folder to DECRYPTED mount point
=====================================================
EOF

ENCRYPTED=/groups/${VOLUME}/private
DECRYPTED=${HOME}/public/$SLURM_JOB_NAME-$SLURMD_NODENAME-$SLURM_JOBID
mkdir -p $DECRYPTED

echo "mounting encrypted folder ($ENCRYPTED) to decrypted folder ($DECRYPTED)"
$GOCRYPTFS/decrypt.sh /groups/${VOLUME}/.${VOLUME}-pass.enc | gocryptfs -nosyslog $ENCRYPTED $DECRYPTED

mount -t fuse.gocryptfs

###############################################################################
cat << EOF
===========================================================
(2) Go into the DECRYPTED folder and run job (FIO in this case)
===========================================================
EOF

cd $DECRYPTED
echo "currently working in $PWD"
# <RUN FIO JOB>: replace this line with your own commands

cp /scratch/$USER/$SLURM_JOB_NAME-$SLURMD_NODENAME-$SLURM_JOBID.out .
cp /scratch/$USER/$SLURM_JOB_NAME-$SLURMD_NODENAME-$SLURM_JOBID.err .

###############################################################################
cat << EOF
===========================================================
(3) Leave DECRYPTED folder and unmount it
===========================================================
EOF

cd ~/
fusermount -u $DECRYPTED


###############################################################################
cat << EOF
===========================================================
Confirm decrypted filesystem is unmounted
===========================================================
EOF
mount -v -t fuse.gocryptfs

#!/bin/bash
#SBATCH --partition=normal
#SBATCH --qos=normal
#SBATCH --job-name=jGocryptfs-file-encryption
#SBATCH --output=/scratch/%u/%x-%N-%j-%a.out 
#SBATCH --error=/scratch/%u/%x-%N-%j-%a.err 
#SBATCH --chdir=/tmp
#SBATCH --nodes=1
#SBATCH --ntasks=32                    
#SBATCH --cpus-per-task=1               
#SBATCH --mem-per-cpu=3500M     
#SBATCH --array=16,32
#SBATCH --export=ALL
#SBATCH --time=0-06:00:00

module load gocryptfs
module list

###############################################################################
cat << EOF
=====================================================
(1) Mount ENCRYPTED folder to DECRYPTED mount point
=====================================================
EOF

ENCRYPTED=/groups/${VOLUME}/private
DECRYPTED=${HOME}/public/$SLURM_JOB_NAME-$SLURMD_NODENAME-$SLURM_JOBID-$SLURM_ARRAY_TASK_ID
mkdir -p $DECRYPTED

echo "mounting encrypted folder ($ENCRYPTED) to decrypted folder ($DECRYPTED)"
$GOCRYPTFS/decrypt.sh /groups/${VOLUME}/.${VOLUME}-pass.enc | gocryptfs -nosyslog $ENCRYPTED $DECRYPTED

mount -t fuse.gocryptfs

###############################################################################
cat << EOF
===========================================================
(2) Go into DECRYPTED folder and run job (FIO in this case)
===========================================================
EOF

cd $DECRYPTED
echo "currently working in $PWD"
# <RUN FIO JOB>: replace this line with your own commands

cp /scratch/$USER/$SLURM_JOB_NAME-$SLURMD_NODENAME-$SLURM_JOBID-$SLURM_ARRAY_TASK_ID.out .
cp /scratch/$USER/$SLURM_JOB_NAME-$SLURMD_NODENAME-$SLURM_JOBID-$SLURM_ARRAY_TASK_ID.err .

###############################################################################
cat << EOF
===========================================================
(3) Leave DECRYPTED directory and unmount it
===========================================================
EOF
cd ~/
fusermount -u $DECRYPTED

###############################################################################
cat << EOF
===========================================================
Confirm DECRYPTED directory is unmounted
===========================================================
EOF
mount -t fuse.gocryptfs
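
In the templates above, the unmount in step (3) is only reached if the script runs to that point. A possible refinement, sketched here as an assumption rather than as part of the provided templates, is to register the unmount in an EXIT trap near the top of the job script. The function name cleanup is hypothetical; DECRYPTED is the mount point variable used in the templates, and mountpoint is the util-linux utility.

```shell
#!/bin/bash
# Sketch: unmount the decrypted directory whenever the job script exits,
# including when an earlier command causes the script to stop.
cleanup() {
    # Leave the mount point first so it is not busy.
    cd "${HOME:-/}" 2>/dev/null || cd /
    # Only unmount if DECRYPTED is set and something is still mounted there.
    if [ -n "${DECRYPTED:-}" ] && mountpoint -q "${DECRYPTED}"; then
        fusermount -u "${DECRYPTED}"
    fi
}
trap cleanup EXIT
```

With the trap in place, the explicit fusermount -u in step (3) can stay as it is; the trap then finds nothing mounted and does nothing.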

Common questions

  1. Does it matter where I mount the decrypted directory?

    You can mount the decrypted directory anywhere, but we would recommend that it be in a secure location that you don't share with anyone else. For example, your home directory ($HOME=/home/$USER) would be a good choice. Each user mounting the decrypted directory in their personal space allows multiple users to work on the same data without needing coordination.

  2. I don't know if I am in the encrypted or decrypted directory. Is there an easy way to tell?

    By default, gocryptfs encrypts both file names and file contents. So, the file names in the encrypted directory should look like random strings of letters and numbers, while the file names in the decrypted directory should be recognizable. Make sure you only change files in the decrypted directory.

  3. How can I check if the decrypted directory is mounted?

    You can run the following command to check for the presence and location of your mounted decrypted folder.

     mount -t fuse.gocryptfs 
    
    You should see something that looks like this.

    <ENCRYPTED_DIRECTORY> on <DECRYPTED_DIRECTORY> type fuse.gocryptfs (rw,nosuid,nodev,relatime,user_id=XXX,group_id=XXX,max_read=1048576)

    Among other hints, the user_id value should match your $UID, which you can get by executing echo $UID on the command-line. You can go into the <DECRYPTED_DIRECTORY> to do your work. When finished, you can exit from that directory and unmount it using fusermount -u <DECRYPTED_DIRECTORY>

  4. I can’t unmount the public directory because someone or some application is using it. What can I do?

    $ fusermount -u  /groups/TestVol/public
    fusermount: failed to unmount /groups/TestVol/public: Device or resource busy
    

    You need to exit out of this directory before it can be unmounted. You can run lsof or ps aux and search for users and processes that are accessing the mounted directory and terminate those processes.

  5. My Slurm jobs are dying quickly without much diagnostic information. What could be the cause?

    A likely cause is that Slurm is trying to access the decrypted directory before it is mounted. When submitting interactive or batch jobs, Slurm assumes the submission directory is the current working directory. However, if that directory corresponds to the decrypted folder, it won’t be available on the compute node until gocryptfs mounts the decrypted directory. So, you should start in a directory other than the decrypted directory and change into the decrypted directory once it is mounted. You can initially set your working directory to /tmp or a similar location for interactive (salloc --chdir=/tmp) or batch (#SBATCH --chdir=/tmp) sessions.

  6. What are the most common problems users have?

    Users often forget to mount the decrypted volume and end up writing unencrypted data to the encrypted directory. That leads to a mix of encrypted and unencrypted files there. Users also often forget to unmount the decrypted directory once they are done with their work. Even though we set access control lists (ACLs) to ensure only authorized users can access the data, you certainly want to unmount decrypted data as soon as it is not needed. One potential way to prevent these issues is to mount the decrypted directory with the idle unmount option so that it will get unmounted if it hasn't been accessed for a certain amount of time. For example, gocryptfs -idle 1h0m ... will unmount the decrypted directory if it has not been accessed for an hour.

  7. How can I access or modify the encrypted storage over Globus?

    If users want to access or modify the storage over Globus, they can mount the decrypted volume on hopper1 login/head node and access it using the GMU-HOPPER1.ORC endpoint.

  8. How can I check the integrity of the encrypted data?

    You can run gocryptfs -fsck private to see if there are any problems with the encrypted data. If there aren't any, you will get a message saying "fsck summary: no problems found"

  9. Can I change the encryption password once it is set?

    Yes, you can change the password using gocryptfs -passwd private. Please encrypt and save the new password and master key as described in the password management section above.
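
For questions 3 and 6, the mount check can also be scripted, for example to refuse to write into a directory that is not currently a gocryptfs mount. The following is a sketch only: the function name is_gocryptfs_mount is hypothetical, and it assumes the Linux mount table at /proc/mounts, where the second field is the mount point and the third is the filesystem type.

```shell
#!/bin/bash
# Hypothetical helper: succeed if DIR is currently a gocryptfs mount point.
# DIR must be an absolute path without a trailing slash. The optional second
# argument names an alternative mount table file (default: /proc/mounts).
is_gocryptfs_mount() {
    local dir="$1" mounts="${2:-/proc/mounts}"
    awk -v d="$dir" \
        '$2 == d && $3 == "fuse.gocryptfs" { found = 1 } END { exit !found }' \
        "$mounts"
}

# Example:
#   is_gocryptfs_mount "${HOME}/public" || echo "decrypted directory is NOT mounted"
```

Mount points containing spaces are escaped in /proc/mounts and would need extra handling.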


  1. Please see ITS's data classification and storage requirements for details.

  2. There are many other ways to generate and store passwords, but we are presenting the following solution since it will save you from being prompted for passwords, especially when running batch jobs. 

  3. The decrypted files are accessible to anyone with file access permissions. In cases where those access permissions are not as strict as they should be, the files can accidentally become accessible to unintended users. Therefore, you should keep the decrypted files mounted only when you are working on them. One way to ensure they are unmounted when they are not being used is to use the -idle TIME flag. For example, gocryptfs -idle 1h0m unmounts the encrypted filesystem automatically if it hasn't been accessed for 1 hour.