Running Stata
STATA version 17.0 is available on Hopper. The licence allows for 15 users and up to 12 cores per run. There are two ways to run STATA on Hopper: from the Open OnDemand web server, or from the command line, either in an interactive shell session or with a SLURM script.
Using STATA from Open OnDemand
To run STATA from Open OnDemand (OOD), log onto the OOD dashboard from a web browser. Select the STATA app, and after configuring the time, you should get the STATA GUI.
Command Line Interactive Session
1- After logging into Hopper from the shell, navigate to where your files are stored in /home, /scratch or /projects and start an interactive session with salloc:
salloc --nodes=1 --ntasks-per-node=12 --mem=50GB --time=0-1:00:00
2- Load the stata module and start STATA:
module load stata
stata
[user@hopper1 ~]$ ml load stata
[user@hopper1 ~]$ stata
___ ____ ____ ____ ____ ®
/__ / ____/ / ____/ 17.0
___/ / /___/ / /___/ BE—Basic Edition
Statistics and Data Science Copyright 1985-2021 StataCorp LLC
StataCorp
4905 Lakeway Drive
College Station, Texas 77845 USA
800-STATA-PC https://www.stata.com
979-696-4600 stata@stata.com
Stata license: 15-user network, expiring 14 Sep 2022
Licensed to: George Mason University
Fairfax VA
Notes:
1. Unicode is supported; see help unicode_advice.
._
Batch mode
To run STATA in batch mode, you need to create a do-file containing the series of commands you would otherwise run interactively. With a do-file (filename.do) in hand, you can run it from the shell with:
stata -b do filename
STATA writes the output to filename.log in the current working directory.
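Because a batch run has no screen output, the log file is the place to check whether the do-file finished cleanly. The sketch below is a hypothetical helper (the name stata_log_ok is an assumption, not part of STATA or Hopper). It relies on the fact that STATA writes a return-code line such as r(198); to the log when a command fails.

```shell
# Hypothetical helper: succeed only if a Stata batch log has no error code.
# Returns 0 when clean, 1 when an error line is found, 2 when the log is missing.
stata_log_ok() {
    log="$1"
    if [ ! -f "$log" ]; then
        echo "missing log: $log" >&2
        return 2
    fi
    # Stata reports a failed command with a line such as "r(111);".
    if grep -Eq '^r\([0-9]+\);' "$log"; then
        echo "error found in $log:" >&2
        grep -En '^r\([0-9]+\);' "$log" >&2
        return 1
    fi
    return 0
}

# Example use after a batch run (assumes the stata module is loaded):
#   stata -b do filename
#   stata_log_ok filename.log && echo "run finished without errors"
```

The check is deliberately simple: it only looks for return-code lines, so warnings that do not stop the do-file are not reported.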
Once your jobs are done, you can exit from the compute node back to the head node by typing:
exit
For longer, more intensive jobs, you can use SLURM scripts to run your STATA jobs. The interactive sessions discussed above are useful for debugging. Once your do-files are ready for longer runs, you can create SLURM scripts. An example for a run on a single processor, run_stata.slurm, is shown below:
#!/bin/bash
#SBATCH --partition=normal # submit to the normal(default) partition
#SBATCH --job-name=stata-test # name the job
#SBATCH --output=stata-test-%j.out # write stdout/stderr to named file
#SBATCH --error=stata-test-%j.err
#SBATCH --time=0-02:00:00 # Run for max of 02 hrs, 00 mins, 00 secs
#SBATCH --nodes=1 # Request N nodes
#SBATCH --ntasks-per-node=1 # Request n cores per node
#SBATCH --mem-per-cpu=5GB # Request nGB RAM per core
#load modules with
module load stata
#run stata
stata -b do filename.do
To run across multiple processors (up to a maximum of 12), modify the number of tasks set in the script above and use the stata-mp executable:
#!/bin/bash
#SBATCH --partition=normal # submit to the normal(default) partition
#SBATCH --job-name=stata-parallel # name the job
#SBATCH --output=stata-parallel-%j.out # write stdout/stderr to named file
#SBATCH --error=stata-parallel-%j.err
#SBATCH --time=0-02:00:00 # Run for max of 02 hrs, 00 mins, 00 secs
#SBATCH --nodes=1 # Request N nodes
#SBATCH --ntasks-per-node=12 # Request n cores per node
#SBATCH --mem-per-cpu=5GB # Request nGB RAM per core
#load modules with
module load stata
#run stata
stata-mp -b do filename.do
This will run on the 'normal' partition, which allows up to 5 days of run time on available CPUs.
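The two scripts above differ only in the core count and the executable, so they can be generated from one template. The following is a minimal sketch, not a Hopper-provided tool: the function name make_stata_job and the job name stata-job are assumptions. It caps the request at the 12 cores the licence allows per run and switches to stata-mp only when more than one core is requested.

```shell
# Hypothetical helper: print a SLURM script for a do-file.
# Usage: make_stata_job <do-file> [cores]
make_stata_job() {
    dofile="$1"
    cores="${2:-1}"
    # Keep the request between 1 and 12 cores (licence limit per run).
    if [ "$cores" -gt 12 ]; then cores=12; fi
    if [ "$cores" -lt 1 ]; then cores=1; fi
    # Use the multiprocessor executable only for multi-core runs.
    if [ "$cores" -gt 1 ]; then exe="stata-mp"; else exe="stata"; fi
    cat <<EOF
#!/bin/bash
#SBATCH --partition=normal
#SBATCH --job-name=stata-job
#SBATCH --output=stata-job-%j.out
#SBATCH --error=stata-job-%j.err
#SBATCH --time=0-02:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=$cores
#SBATCH --mem-per-cpu=5GB
module load stata
$exe -b do $dofile
EOF
}

# Example: write a 12-core job script for filename.do
#   make_stata_job filename.do 12 > run_stata.slurm
```

The time limit and memory per core are copied from the examples above and should be adjusted to the job at hand.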
You submit your SLURM script with
sbatch run_stata.slurm
Once submitted, you may log off from your terminal, and log back in later to retrieve the output.
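To find the right output files later, it helps to note the job ID that sbatch reports at submission. The sketch below assumes the usual confirmation line "Submitted batch job <id>"; the helper name job_id_from_sbatch is hypothetical. The squeue and sacct calls in the comments are standard SLURM commands, shown here only as an illustration.

```shell
# Hypothetical helper: read sbatch's confirmation line on stdin and
# print the job ID (the fourth word of "Submitted batch job <id>").
job_id_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Example use on the cluster:
#   jobid=$(sbatch run_stata.slurm | job_id_from_sbatch)
#   squeue -j "$jobid"              # is the job still queued or running?
#   sacct -j "$jobid"               # accounting record once it has finished
#   cat "stata-test-$jobid.out"     # stdout file named in the script above
```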
Memory Considerations for STATA Jobs
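You need to make sure enough memory is available to STATA. In older releases (Stata 11 and earlier) this is done with the command 'set memory #', where # is the amount of memory you want to allocate, for example:
set memory 20000
or
set memory 20m
Both of these commands allocate roughly 20 megabytes of memory (a bare number is read as kilobytes). Stata 12 and later, including the version 17.0 installed on Hopper, manage memory automatically and ignore 'set memory', so for current runs the memory requested from SLURM or Open OnDemand is the effective limit.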
There are two important considerations when deciding how much memory to allocate.
- Make sure that you allocate more memory than the size of the file you are using. A good rule of thumb for large files is to allocate roughly 50% more memory than the size of your file. For example, if your file is 100GB, set the memory to 150GB.
- Make sure the memory you request, either with the 'mem' parameter in your SLURM script or in the Open OnDemand form when starting the STATA app, matches the amount set for STATA. For larger data files, it is recommended to run on the 'bigmem' partition.
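The 50% rule of thumb above is easy to apply mechanically. The following sketch uses a hypothetical helper, suggest_mem_mb, to print a memory request in megabytes equal to about 1.5 times the size of a data file on disk, rounded up. It assumes the on-disk size is a reasonable proxy for the in-memory size, which may not hold for every dataset.

```shell
# Hypothetical helper: suggest a memory request in MB, about 1.5 times
# the size of the given file, rounded up to a whole megabyte.
suggest_mem_mb() {
    bytes=$(wc -c < "$1") || return 1
    # Integer arithmetic for ceil(bytes * 3 / 2 / 1048576).
    echo $(( (bytes * 3 + 2 * 1048576 - 1) / (2 * 1048576) ))
}

# Example: print a value to use with the 'mem' parameter
#   echo "--mem=$(suggest_mem_mb mydata.dta)M"
```

Here mydata.dta is a placeholder file name. The printed value is a starting point only; jobs that create many new variables or merge datasets may need more.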