Deepthought2



This is an overview of key areas to get you started with tasks common to our department. For a more general discussion, please check out the DIT Deepthought Usage page.

Overview

The Deepthought2 supercomputer consists of 2 login nodes and more than 400 compute nodes. After logging into login.deepthought2.umd.edu you will be assigned to either login-1 or login-2. The login nodes are to be used to compile code, move data, launch jobs, etc., and they have all the same software as (and more than) the compute nodes. Please do not use these nodes to run a multi-core job directly. Jobs run on one or more of the compute nodes, which are accessible only from the login nodes. Each of these compute nodes has:

  • 20 cores
  • 128 GB of memory
  • 750 GB temporary hard drive space
  • FDR Infiniband network connection between nodes

There are additional nodes that have increased memory (1TB) or GPUs if needed.

Disk Space

There are several areas of disk space available on Deepthought for the users:

  • Home Directory - Everyone is given 10GB of space in their home directory. Access to the home directory is slow from the compute nodes, so try not to store data or programs needed at run time here. You will also get warnings if you launch an MPI job from here.
  • Lustre - 1 Petabyte of storage is available under /lustre/<username>. Although there are currently no user quotas for this file system, one will inevitably be imposed eventually. Code to be run should be stored here, though this file system is not backed up. Lustre can slow down considerably depending on how other people are using DT2 at the time, so to make your experiments as fast as possible, copy the bulk of the data you need to the node's /tmp folder at run time (see the example after this list).
  • Scratch Disk - Each compute node has 750GB of space available under /tmp/. It is faster than the Lustre directory, though it cannot be shared between nodes. This drive is erased after a job completes.
  • Ram Disk - Each compute node can use part of its 128GB of memory as a ram disk located under /dev/shm/. This is temporary disk space similar to /tmp; however, it resides entirely in memory and so is extremely fast.
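
A minimal sketch of that staging pattern inside a job script (mydata.nc, results.out, and YOURPROGRAM are placeholders, not real files on the system):

#!/bin/bash
#SBATCH -N 1
#SBATCH -t 1:00:00

# Stage input data from Lustre onto the node-local scratch disk
cp /lustre/$USER/mydata.nc /tmp/
cd /tmp

# Run your program (stored on Lustre) against the local copy of the data
/lustre/$USER/YOURPROGRAM mydata.nc

# Copy results back to Lustre before the job ends; /tmp is erased once the job completes
cp results.out /lustre/$USER/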

Allocations

Every job submitted to the cluster is billed to an account. You can find out the accounts you have access to by using sbalance on the command line.

Accounts

When submitting a job using sbatch -A <account> <script> or using #SBATCH -A <account> within a script, there are two main categories of pools to use:

  • High Priority - These are our main pool of hours, replenished at the beginning of each month with 610,000 CPU hours. If there are other jobs waiting to run on Deepthought, your job will be placed in the queue with a high priority. These accounts must absolutely be used first, as their hours disappear at the end of the month; moreover, using the standard accounts below means eating into next month's High Priority allocation. Select between ved-prj-hi, ved-lab-hi, schmerr-lab-hi or schmerr-prj-hi.
  • Standard - These accounts queue jobs at a lower priority. Their hours are what is left over from the previous month, plus hours borrowed from the next month, so try to limit usage once the balance drops below 610kSU. Select between ved-prj, ved-lab, schmerr-lab or schmerr-prj (examples follow this list).
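
For example (my_job.sh is a placeholder script name):

# Charge the monthly high-priority pool first:
sbatch -A ved-lab-hi my_job.sh

# Fall back to the standard pool once the high-priority hours are spent:
sbatch -A ved-lab my_job.sh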

Partitions

In addition to specifying an account to charge when running a job, you can specify the partition with sbatch -A <account> -p <partition> <script>. Normally you should only do this in two circumstances (examples follow the list):

  • scavenger - by running with sbatch -A <account> -p scavenger, your job will run only if there are no other normal jobs in the queue. Although no hours will be charged to the account, your job may be interrupted if a normal job enters the queue and no other nodes are available; your job will then be put back in the queue and will wait to run again. Your job script must therefore be able to be stopped and restarted easily. The benefit of scavenger is of course that no hours are charged to the department. This can be quite useful for the data assimilation people, whose usual DA cycles can easily be restarted.
  • debug - by running with sbatch -A <account> -p debug, your job will be placed in the queue with a high priority (regardless of the account specified), though it will only run for a maximum of 15 minutes.
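
As a quick illustration (da_cycle.sh and test_run.sh are placeholder script names):

# Scavenger: no hours charged, but the job may be preempted and requeued
sbatch -A ved-lab -p scavenger da_cycle.sh

# Debug: high priority regardless of account, but limited to 15 minutes
sbatch -A ved-lab-hi -p debug test_run.sh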

Viewing Remaining Hours

Running sbalance --all will show how many hours remain for our department in the given month (e.g. ved-lab-hi), as well as the hours left over from the previous month plus hours that can be borrowed from the next month (e.g. ved-lab). Additionally, you can see the individual usage of the members of our department.

Interactive Debugging

Interactive sessions allow you to connect to a compute node and work on that node directly. This allows you to develop how your jobs might run (e.g. test that commands run as expected before putting them in a script) and do heavy development tasks that cannot be done on the login nodes (e.g. use many cores). Interactive sessions can be started with either the sinteractive or salloc commands on Deepthought2.

Using sinteractive (preferred)

For debugging purposes, instead of running directly on a login node, it is recommended to request a node first with the sinteractive command.

[moulik@login-1 ~/Slurm]> sinteractive -h
Usage: sinteractive [-c NUMCPUS] [-J JOBNAME] [-a ACCOUNT] [ -t TIME ] \
	[ -d | -S ] [ -s SHELL ] [ -x ] \
	[ -f FEATURE_LIST ] [ -g GRES_LIST]

Optional arguments:
    -a: Account to charge.  Defaults to your default account (ved-lab-hi)
    -c: number of CPU cores to request (default: 1)
    -d: use the debug partition.  -t is ignored and Wall time is set to 15 minutes
    -J: job name (default: interactive)
    -s: shell to use.  Defaults to your default shell (/bin/tcsh)
    -S: use the scavenger partition.  Not advised.
    -t: Wall time limit in minutes (default: 60 minutes).
    -x: Reserve the nodes in exclusive mode, Exclusive mode means no other
	jobs are allowed on the node you reserve, which means it might take
	longer to allocate and your account will be charged more.  Default is
	shared mode.
    -f: Only reserve nodes matching FEATURE_LIST constraints.  See salloc man page
        for full description.
    -g: Reserve the generic consumable resources specified by GRES_LIST.

Maximum allowed walltime for sinteractive is 480 minutes.
Maximum number of CPUs for sinteractive is 20 cpus.

To request 1 core for 30 minutes: sinteractive -c 1 -t 30. To request access to a GPU-enabled node: sinteractive -c 1 -t 30 -g gpu:1

To make sure you have accessed a node with NVIDIA GPUs, you can type the following command to list the GPU configuration:

[moulik@compute-b17-4 ~/Slurm]> lspci | grep -i nvidia
03:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20m] (rev a1)
83:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20m] (rev a1)

Using salloc

Another way to request a node is with the salloc command. Please note that salloc will launch the job with the following defaults: your default project account; ntasks=1 (1 CPU core); memory=1GB; time=24 hours; and no additional resources such as a GPU card. To specify resources:

login-1:~ salloc --account=ved-lab-hi --partition=debug --time=15
salloc: Granted job allocation 5227967
salloc: Waiting for resource configuration
salloc: Nodes compute-b28-49 are ready for job

The above command successfully requested a compute node for 15 minutes (-p debug gives a higher priority in the queue but limits the time to 15 minutes; drop this option if a longer time is needed). To log in to the node: ssh -Y compute-b28-49.deepthought2.umd.edu

Closing sessions

Interactive jobs will remain active until you exit or the job is canceled. It is your responsibility to cancel any interactive session that is not being used. After you are done debugging and have entered exit to close your interactive session, make sure you are no longer using any resources by checking whether any interactive job is still running:

[moulik@login-1 ~/Slurm]> squeue -u moulik
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          10347469 high-prio interact   moulik PD       0:00      1 (Priority)
          10347471 high-prio test_spa   moulik PD       0:00      1 (Priority)
          10347475     debug     tcsh   moulik  R       5:32      1 compute-b28-49

To kill the interactive session, note the JOBID from the output above and cancel it: scancel 10347475

Running Jobs

The usual way of running a job is to create a script file that is submitted to the scheduling system with the sbatch command. Extensive details on this can be found at DIT's info on running jobs. In summary, your script will consist of at least the following lines at the top:

#!/bin/bash
#SBATCH -N 2
#SBATCH -t 2:00:00

indicating the number of nodes you want, and the maximum running time. The script is then placed in the queue with sbatch -A <account> script_name

Keep in mind that each node has 20 cores, and using any core on a node will result in being charged for the entire node, so optimize your configuration accordingly (for example, it would be wasteful to request 22 cores, since you would be charged for the full 40 cores of two nodes). Using 10 nodes for a whole day would therefore charge 10 nodes x 20 cores x 24 hours = 4,800 hours to the department's account. If you are requesting more than one core but fewer than all the cores on a node on the Deepthought clusters, you should consider using the --share flag. The default --exclusive flag will result in your account being charged for all cores on the node whether you use them or not. Not sharing also lowers our FairShare score, leading to delays in scheduling jobs.

#SBATCH --share

Most of the nodes currently have at least 30GB of scratch space, some have as much as 250GB available, and a few have as little as 1GB available. Scratch space is currently mounted as /tmp. Scratch space will be cleared once your job completes. The following example specifies a scratch space requirement of 5GB. Note however that if you do this, the scheduler will set a filesize limit of 5GB. If you then try to create a file larger than that, your job will automatically be killed, so be sure to specify a size large enough for your needs. Note that the disk space size must be given in MB.

#SBATCH --tmp=5120
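
Putting these directives together, a minimal job header might look like the following (the values are illustrative only):

#!/bin/bash
#SBATCH -n 10           # 10 cores (fewer than a full 20-core node)
#SBATCH -t 2:00:00      # maximum running time
#SBATCH --share         # do not reserve the whole node
#SBATCH --tmp=5120      # scratch space requirement in MB (5GB)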

Checking Jobs

To view a list of all jobs you have running, you can use the squeue command, for example:

      login-1:~ squeue -u moulik
       JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
     2736589 ved-lab-hi test1   moulik  R   20:14:29      1 compute-b20-4
     2736588 ved-lab-hi test2   moulik  R   20:15:45      1 compute-b20-2
 

If the jobs are taking a while to get scheduled, the Reason column in the squeue output can give you a clue:

  • If there is no reason, the scheduler hasn't attended to your submission yet.
  • Resources means your job is waiting for an appropriate compute node to open.
  • Priority indicates your priority is lower relative to others being scheduled.
  • There are other Reason codes; see the SLURM squeue documentation for full details.

Your priority is partially based on your FairShare score and determines how quickly your job is scheduled relative to others on the cluster. To see your FairShare score, enter the command sshare -u <username>. Your effective score is the value in the last column; as a rule of thumb, values below 0.5 indicate lower priority and values above 0.5 indicate higher priority.

Deleting Jobs

The scancel command is used to delete jobs. Examples:

scancel  232323				(delete job 232323)
scancel -u username			(delete all jobs belonging to user)
scancel --name=JobName			(delete job with the name JobName)
scancel --state=PENDING                 (delete all PENDING jobs)
scancel --state=RUNNING                 (delete all RUNNING jobs)
scancel --nodelist=cn0005               (delete any jobs running on node cn0005)
 

Email Notification

To receive email notification of your job finishing (or crashing) you can set the --mail-type= and --mail-user= parameters at the top of your job's batch script, for example:

#!/bin/bash
#SBATCH -n 20
#SBATCH -t 1:00:00
#SBATCH --mail-type=END
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=moulik@umd.edu

Sample SLURM Scripts

The Deepthought HPC clusters use a batch scheduling system called SLURM to handle the queuing, scheduling, and execution of jobs. This scheduler is used in many recent HPC clusters throughout the world. Below are a number of sample scripts that can be used as templates for building your own SLURM submission scripts for use on Deepthought2. If you choose to copy one of these sample scripts, please make sure you understand what each of the sbatch directives does before using the script to submit your jobs. Otherwise, you may not get the result you want and may waste valuable computing resources.

Basic, single-processor job

This script can serve as the template for many single-processor applications. The mem-per-cpu flag can be used to request the appropriate amount of memory for your job. Please make sure to test your application and set this value to a reasonable number based on actual memory use. The %j in the -o (can also use --output) line tells SLURM to substitute the job ID in the name of the output file. You can also add a -e or --error with an error file name to separate output and error logs.

Download the [{{#filelink: single_job.sh}} single_processor_job.sh] script {{#fileanchor: single_job.sh}}

#!/bin/sh
#SBATCH --job-name=serial_job_test    # Job name
#SBATCH --mail-type=ALL               # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=<email_address>   # Where to send mail	
#SBATCH --ntasks=1                    # Run on a single CPU
#SBATCH --mem=600mb                   # Memory limit
#SBATCH --time=00:05:00               # Time limit hrs:min:sec
#SBATCH --output=serial_test_%j.out   # Standard output and error log

pwd; hostname; date

module load python

echo "Running plot script on a single CPU core"

# Run your program with correct path and command line options
./YOURPROGRAM INPUT
#python /homes/moulik/plot_template.py

date

Threaded or multi-processor job

This script can serve as a template for applications that are capable of using multiple processors on a single server or physical computer. These applications are commonly referred to as threaded, OpenMP, PTHREADS, or shared memory applications. While they can use multiple processors, they cannot make use of multiple servers and all the processors must be on the same node.

These applications require shared memory and can only run on one node; as such it is important to remember the following:

  • You must set --nodes=1, and then set --cpus-per-task to the number of OpenMP threads you wish to use.
  • You must make the application aware of how many processors to use. How that is done depends on the application:
    • For some applications, set OMP_NUM_THREADS to a value less than or equal to the number of cpus-per-task you set.
    • For some applications, use a command line option when calling that application.

Download the [{{#filelink: parallel_job.sh}} multi_processor_job.sh] script {{#fileanchor: parallel_job.sh}}

#!/bin/sh
#SBATCH --job-name=parallel_job_test # Job name
#SBATCH --mail-type=ALL              # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=<email_address>  # Where to send mail	
#SBATCH --nodes=1                    # Use one node
#SBATCH --ntasks=1                   # Run a single task	
#SBATCH --cpus-per-task=4            # Number of CPU cores per task
#SBATCH --mem=600mb                  # Total memory limit
#SBATCH --time=00:05:00              # Time limit hrs:min:sec
#SBATCH --output=parallel_%j.out     # Standard output and error log

pwd; hostname; date

echo "Running prime number generator program on $SLURM_CPUS_ON_NODE CPU cores"

module load gcc/5.2.0 

# Run your program with correct path and command line options
./YOURPROGRAM INPUT

date


Another example, setting OMP_NUM_THREADS:

Download the [{{#filelink: parallel_job2.sh}} multi_processor_job2.sh] script {{#fileanchor: parallel_job2.sh}}

#!/bin/sh
#SBATCH --job-name=parallel_job_test # Job name
#SBATCH --mail-type=ALL              # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=<email_address>  # Where to send mail	
#SBATCH --nodes=1                    # Use one node
#SBATCH --ntasks=1                   # Run a single task	
#SBATCH --cpus-per-task=4            # Number of CPU cores per task
#SBATCH --mem=600mb                  # Total memory limit
#SBATCH --time=00:05:00              # Time limit hrs:min:sec
#SBATCH --output=parallel_%j.out     # Standard output and error log

export OMP_NUM_THREADS=4

# Load required modules; for example, if your program was
# compiled with Intel compiler, use the following 
module load intel

# Run your program with correct path and command line options
./YOURPROGRAM INPUT

MPI job

This script can serve as a template for MPI, or message passing interface, applications. These are applications that can use multiple processors that may, or may not, be on multiple servers.

Our testing has found that it is best to be very specific about how you want your MPI ranks laid out across nodes and even sockets (multi-core CPUs). SLURM and OpenMPI have some conflicting behavior if you leave too much to chance. Please refer to the full SLURM sbatch documentation, but the following directives are the main directives to pay attention to:

  • -c, --cpus-per-task=<ncpus>
    • Advise the Slurm controller that ensuing job steps will require ncpus number of processors per task.
  • -m, --distribution=arbitrary|<block|cyclic|plane=<options>[:block|cyclic|fcyclic]>
    • Specify alternate distribution methods for remote processes.
    • We recommend -m cyclic:cyclic, which tells SLURM to distribute tasks cyclically over nodes and sockets.
  • -N, --nodes=<minnodes[-maxnodes]>
    • Request that a minimum of minnodes nodes be allocated to this job.
  • -n, --ntasks=<number>
    • Number of tasks (MPI ranks)
  • --ntasks-per-node=<ntasks>
    • Request that ntasks be invoked on each node
  • --ntasks-per-socket=<ntasks>
    • Request the maximum ntasks be invoked on each socket

The following example requests 24 tasks, each with one core. It further specifies that these should be split evenly into 2 nodes, and within the nodes, the 12 tasks should be evenly split on the two sockets. So each CPU on the two nodes will have 6 tasks, each with its own dedicated core. The distribution option will ensure that MPI ranks are distributed cyclically on nodes and sockets.

SLURM is very flexible and allows users to be very specific about their resource requests. Thinking about your application and doing some testing will be important to determine the best request for your specific use.

Download the [{{#filelink: mpi_job.sh}} mpi_job.sh] script {{#fileanchor: mpi_job.sh}}

#!/bin/sh
#SBATCH --job-name=mpi_job_test      # Job name
#SBATCH --mail-type=ALL              # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=<email_address>  # Where to send mail	
#SBATCH --ntasks=24                  # Number of MPI ranks
#SBATCH --cpus-per-task=1            # Number of cores per MPI rank 
#SBATCH --nodes=2                    # Number of nodes
#SBATCH --ntasks-per-node=12         # How many tasks on each node
#SBATCH --ntasks-per-socket=6        # How many tasks on each CPU or socket
#SBATCH --distribution=cyclic:cyclic # Distribute tasks cyclically on nodes and sockets
#SBATCH --mem-per-cpu=600mb          # Memory per processor
#SBATCH --time=00:05:00              # Time limit hrs:min:sec
#SBATCH --output=mpi_test_%j.out     # Standard output and error log
pwd; hostname; date

echo "Running prime number generator program on $SLURM_JOB_NUM_NODES nodes with $SLURM_NTASKS tasks, each with $SLURM_CPUS_PER_TASK cores."

module load intel/2016.0.109 openmpi/1.10.2

srun --mpi=pmi2 /ufrc/data/training/SLURM/prime/prime_mpi

date

Hybrid MPI/Threaded job

This script can serve as a template for hybrid MPI/Threaded applications. These are MPI applications where each MPI rank is threaded and can use multiple processors.

Our testing has found that it is best to be very specific about how you want your MPI ranks laid out across nodes and even sockets (multi-core CPUs). SLURM and OpenMPI have some conflicting behavior if you leave too much to chance. Please refer to the full SLURM sbatch documentation, as well as the information in the MPI example above.

The following example requests 8 tasks, each with 4 cores. It further specifies that these should be split evenly into 2 nodes, and within the nodes, the 4 tasks should be evenly split on the two sockets. So each CPU on the two nodes will have 2 tasks, each with 4 cores. The distribution option will ensure that MPI ranks are distributed cyclically on nodes and sockets.

Download the [{{#filelink: hybrid_pthreads_job.sh}} hybrid_pthreads_job.sh] script {{#fileanchor: hybrid_pthreads_job.sh}}

#!/bin/sh
#SBATCH --job-name=hybrid_job_test      # Job name
#SBATCH --mail-type=ALL                 # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=<email_address>     # Where to send mail	
#SBATCH --ntasks=8                      # Number of MPI ranks
#SBATCH --cpus-per-task=4               # Number of cores per MPI rank 
#SBATCH --nodes=2                       # Number of nodes
#SBATCH --ntasks-per-node=4             # How many tasks on each node
#SBATCH --ntasks-per-socket=2           # How many tasks on each CPU or socket
#SBATCH --mem-per-cpu=100mb             # Memory per core
#SBATCH --time=00:05:00                 # Time limit hrs:min:sec
#SBATCH --output=hybrid_test_%j.out     # Standard output and error log

pwd; hostname; date
 
module load intel/2016.0.109 openmpi/1.10.2 raxml/8.2.8
 
srun --mpi=pmi2 raxmlHPC-HYBRID-SSE3 -T $SLURM_CPUS_PER_TASK \
      -f a -m GTRGAMMA -s /ufrc/data/training/SLURM/dna.phy -p $RANDOM \
      -x $RANDOM -N 500 -n dna
 
date

The following example requests 8 tasks, each with 8 cores. It further specifies that these should be split evenly over 4 nodes, and within each node, the 2 tasks should be split, one on each of the two sockets. So each CPU (socket) on the four nodes will have 1 task, each with 8 cores. The distribution option will ensure that MPI ranks are distributed cyclically on nodes and sockets.

Also note setting OMP_NUM_THREADS so that OpenMP knows how many threads to use per task.

Download the [{{#filelink: hybrid_OpenMP_job.sh}} hybrid_OpenMP_job.sh] script {{#fileanchor: hybrid_OpenMP_job.sh}}

#!/bin/bash

#SBATCH --job-name=LAMMPS
#SBATCH --output=LAMMPS_%j.out
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<email_address>
#SBATCH --nodes=4              # Number of nodes
#SBATCH --ntasks=8             # Number of MPI ranks
#SBATCH --ntasks-per-node=2    # Number of MPI ranks per node
#SBATCH --ntasks-per-socket=1  # Number of tasks per processor socket on the node
#SBATCH --cpus-per-task=8      # Number of OpenMP threads for each MPI process/rank
#SBATCH --mem-per-cpu=2000mb   # Per processor memory request
#SBATCH --time=4-00:00:00      # Walltime in hh:mm:ss or d-hh:mm:ss

date
hostname

module load intel/2016.0.109 openmpi/1.10.2

export OMP_NUM_THREADS=8

srun --mpi=pmi2 /path/to/app/lmp_gator2 < in.Cu.v.24nm.eq_xrd
  • Note that MPI gets -np from SLURM automatically.
  • Note there are many directives available to control processor layout.
    • Some to pay particular attention to are:
      • --nodes if you care exactly how many nodes are used
      • --ntasks-per-node to limit number of tasks on a node
      • --distribution one of several directives (see also --contiguous, --cores-per-socket, --mem_bind, --ntasks-per-socket, --sockets-per-node) to control how tasks, cores and memory are distributed among nodes, sockets and cores. While SLURM will generally make appropriate decisions for setting up jobs, careful use of these directives can significantly enhance job performance and users are encouraged to profile application performance under different conditions.

Array job

Note that we take the simplest 'single-threaded' process example from above and extend it to an array of jobs. Modify the following script using the parallel, MPI, or hybrid job layout as needed.

Download the [{{#filelink: array_job.sh}} array_job.sh] script {{#fileanchor: array_job.sh}}

#!/bin/sh
#SBATCH --job-name=array_job_test   # Job name
#SBATCH --mail-type=ALL             # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=<email_address> # Where to send mail	
#SBATCH --nodes=1                   # Use one node
#SBATCH --ntasks=1                  # Run a single task
#SBATCH --mem-per-cpu=1gb           # Memory per processor
#SBATCH --time=00:05:00             # Time limit hrs:min:sec
#SBATCH --output=array_%A-%a.out    # Standard output and error log
#SBATCH --array=1-5                 # Array range
pwd; hostname; date

echo This is task $SLURM_ARRAY_TASK_ID

date

Note the use of %A for the master job ID of the array, and the %a for the task ID in the output filename.

GPU job

This script can serve as a template for GPU jobs. It requests GPU resources with the --gres directive and runs a CUDA memory test on each GPU listed in $CUDA_VISIBLE_DEVICES.

#!/bin/bash
#SBATCH --job-name=gpuMemTest
#SBATCH --output=gpuMemTest.out
#SBATCH --error=gpuMemTest.err
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=1
#SBATCH --time=12:00:00
#SBATCH --mem-per-cpu=2000
#SBATCH --mail-type=ALL
#SBATCH --mail-user=moulik@umd.edu
#SBATCH --account=ved-lab
#SBATCH --gres=gpu:2           # Request both GPUs on the node (two Tesla K20m cards, as shown by lspci above)

module load cuda/8.0

cudaMemTest=/homes/moulik/CODES/cuda_memtest

cudaDevs=$(echo $CUDA_VISIBLE_DEVICES | sed -e 's/,/ /g')

for cudaDev in $cudaDevs
do
  echo cudaDev = $cudaDev
  $cudaMemTest --num_passes 1 --device $cudaDev > gpuMemTest.out.$cudaDev 2>&1 &
done
wait

Software

By default, very little software is automatically available; you have to specify the software you want using a series of module commands, both before compiling code and within your sbatch scripts. For a detailed explanation and a list of modules that *could* be available on Deepthought2, see the DIT Deepthought Software Guide. NOTE: that list shows the modules available before Deepthought2; in order to get a clean installation on the new supercomputer and remove old unused code, the IT team decided not to move modules over to the Deepthought2 compute nodes until requested by users. Because of this, a module might load on the login nodes but not on the compute nodes: the login nodes use a different filesystem and contain all of the previously available modules, whereas the compute nodes do not. Clicking on any of the modules listed in the DIT Deepthought Software Guide will tell you whether it is available on Deepthought2. Any module not available can be requested. If you have IT create new modules that may be of use to others in Geology, please update the following list.

When specifying modules to load, they should always be specified in the following order (see the example after this list):

  1. intel (or nothing if using gfortran)
  2. openmpi
  3. netcdf and/or hdf4/5
  4. netcdf-fortran
  5. other stuff
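
A minimal illustration of this ordering, using versions from the confirmed list below (adjust the versions to whatever is currently installed):

module load intel/2013.1.039        # 1. compiler (omit for gfortran)
module load openmpi/intel/1.8.1     # 2. MPI
module load netcdf/4.3.2            # 3. netcdf and/or hdf4/5
module load netcdf-fortran/4.4.1    # 4. netcdf-fortran
module load nco/4.4.6               # 5. other packages (nco shown as an example)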

Confirmed working software

The following list of software and versions is confirmed to be on the Deepthought 2 compute nodes:

  • hdf4/4.2.10
  • hdf5/1.8.13
  • intel/2013.1.039
  • netcdf/4.3.2
  • netcdf-fortran/4.4.1
  • nco/4.4.6
  • openmpi/gnu/1.6.5 and openmpi/intel/1.8.1
  • python/2.7.8

Other packages built for us can be seen here.

Sample gfortran environment

The following is an example of a configuration known to work with the gfortran compiler:

#!/bin/bash
module load openmpi/gnu/1.6.5
module load netcdf/4.3.2
module load netcdf-fortran

export NETCDF=$NETCDF_FORTRAN_ROOT
export LD_LIBRARY_PATH=$NETCDF_LIBDIR:$NETCDF_FORTRAN_LIBDIR:$LD_LIBRARY_PATH
export FC=mpif90
export F77=mpif90
export LDFLAGS="$(nc-config --flibs --libs)"
export CPFLAGS="$(nc-config --fflags)"

Sample ifort environment

The following is an example of a configuration known to work with the Intel compiler. Pay special attention to the ulimit command at the bottom; it is required to get any large program working with Intel on these computers. For those using a csh shell instead, you will have to use the command limit stacksize unlimited.

#!/bin/bash
module load intel
module load openmpi/intel/1.8.1
module load netcdf/4.3.2
module load netcdf-fortran

export NETCDF=$NETCDF_FORTRAN_ROOT
export LD_LIBRARY_PATH=$NETCDF_LIBDIR:$NETCDF_FORTRAN_LIBDIR:$LD_LIBRARY_PATH
export FC=mpif90
export F77=mpif90
export LDFLAGS="$(nc-config --flibs --libs)"
export CPFLAGS="$(nc-config --fflags)"

ulimit -s unlimited

Sample Python environment

Most modules were built using python/2.7.8 compiled with the gcc/4.6.1 compiler. The GNU compiler (gcc/4.6.1) is what you get by default if no other compiler (gcc, intel, pgi, sunsuite) is loaded.

#!/bin/tcsh
module load openmpi/1.6.5
module load python/3.5.1
module load cuda/7.5.18

python ./YOURCODE.py