Unity GPU Documentation
Graphics Processing Units (GPUs) provide a powerful tool for running code in parallel at a larger scale than traditional CPU parallel workloads. This comes at the cost of slower communication between the host and the GPU. Note that using one or more GPUs does not guarantee that code will run faster; however, many popular software packages have been modified to use GPUs for better performance.
Available GPU Resources
| Device | Arch | Caps | VRAM | Constraint(s) |
|---|---|---|---|---|
| NVIDIA GeForce GTX TITAN X | Maxwell | sm_52 | vram8 vram11 vram12 | titanx |
| Tesla M40 24GB | Maxwell | sm_52 | vram8 vram11 vram12 vram16 vram23 | m40 |
| NVIDIA GeForce GTX 1080 Ti | Pascal | sm_52 sm_61 | vram8 vram11 | 1080ti |
| Tesla V100-PCIE-16GB | Volta | sm_52 sm_61 sm_70 | vram8 vram11 vram12 vram16 | v100 |
| Tesla V100-SXM2-16GB | Volta | sm_52 sm_61 sm_70 | vram8 vram11 vram12 vram16 | v100 |
| Tesla V100-SXM2-32GB | Volta | sm_52 sm_61 sm_70 | vram8 vram11 vram12 vram16 vram23 vram32 | v100 |
| NVIDIA GeForce RTX 2080 | Turing | sm_52 sm_61 sm_70 sm_75 | vram8 | 2080 |
| NVIDIA GeForce RTX 2080 Ti | Turing | sm_52 sm_61 sm_70 sm_75 | vram8 vram11 | 2080ti |
| Quadro RTX 8000 | Turing | sm_52 sm_61 sm_70 sm_75 | vram8 vram11 vram12 vram16 vram23 vram32 vram40 vram48 | rtx8000 |
| NVIDIA A100-PCIE-40GB | Ampere | sm_52 sm_61 sm_70 sm_75 sm_80 | vram8 vram11 vram12 vram16 vram23 vram32 vram40 | a100, a100-40g |
| NVIDIA A100-SXM4-80GB | Ampere | sm_52 sm_61 sm_70 sm_75 sm_80 | vram8 vram11 vram12 vram16 vram23 vram32 vram40 vram48 vram80 | a100, a100-80g |
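The vram tags above can be used as Slurm constraints to request any GPU with at least a given amount of memory. For example, a minimal sketch requesting any GPU with 16 GB or more of VRAM, using the srun syntax described below:
srun -p gpu-preempt -t 01:00:00 --gpus=1 --constraint=vram16 --pty /bin/bash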
Requesting GPU Resources
Requesting GPU access on Unity can be done via Slurm, either for an interactive session or using a batch script. Below are minimal examples of both interactive and batch jobs.
Note
Not all software is able to use GPUs, and some software will require special options, dependencies, or alternate versions to be able to run with GPUs. Please ensure your software supports GPU use before requesting these resources.
Interactive
srun -p gpu-preempt -t 02:00:00 --gpus=1 --pty /bin/bash
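Once the interactive session starts, you can confirm that a GPU was allocated by querying the NVIDIA driver (a quick check; the output lists the visible GPUs and their current utilization):
nvidia-smi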
Batch Script
#!/bin/bash
#SBATCH -p gpu-preempt       # Submit job to the gpu-preempt partition
#SBATCH -t 02:00:00          # Set max job time to 2 hours
#SBATCH --gpus=1             # Request access to 1 GPU
#SBATCH --constraint=2080ti  # Request access to a 2080ti GPU
./myscript.sh
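Save the batch script to a file and submit it with sbatch, which prints the assigned job ID. The filename gpu-job.sh below is only an example:
sbatch gpu-job.sh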
Specific GPUs can also be selected by using the --constraint flag with Slurm, or by adding the GPU type to --gpus. The available constraints are listed below.
Note
Using --constraint allows you to select multiple possible GPUs that fulfill the requirements. You can use either --constraint=[2080|2080ti] or --constraint=sm_70&vram12. The first form is better if your job uses GPUs across more than one node, since it ensures the same model is used across the entire job. A usage sketch follows the list below.
- 2080ti
- 1080ti
- 2080
- titanx
- m40
- rtx8000
- v100
- a100
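For example, an interactive request that accepts any GPU meeting both a compute capability and a VRAM requirement might look like the following sketch. The quotes keep the shell from interpreting the & character; adjust the partition and time for your workload:
srun -p gpu-preempt -t 01:00:00 --gpus=1 --constraint="sm_70&vram12" --pty /bin/bash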
Batch Script with Specific GPU
#!/bin/bash
#SBATCH -p gpu-preempt  # Submit job to the gpu-preempt partition
#SBATCH -t 02:00:00     # Set max job time to 2 hours
#SBATCH --gpus=2080ti:1 # Request access to 1 2080ti GPU
./myscript.sh
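To request more than one GPU of the same model, increase the count after the type; for example, this sketch requests two 2080ti GPUs:
#SBATCH --gpus=2080ti:2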
Batch Script with Constraint
#!/bin/bash
#SBATCH -p gpu-preempt       # Submit job to the gpu-preempt partition
#SBATCH -t 02:00:00          # Set max job time to 2 hours
#SBATCH --gpus=1             # Request access to 1 GPU
#SBATCH --constraint=2080ti  # Require that GPU to be a 2080ti
./myscript.sh
Batch Script with Constraint specifying multiple options
#!/bin/bash
#SBATCH -p gpu-preempt  # Submit job to the gpu-preempt partition
#SBATCH -t 02:00:00     # Set max job time to 2 hours
#SBATCH --gpus=1        # Request access to 1 GPU
#SBATCH --constraint=2080ti|1080ti|2080  # Accept any of these GPU types
./myscript.sh
GPU-Enabled Software
CUDA: NVIDIA's parallel computing platform. A version of CUDA typically needs to be loaded for most GPU jobs, as it provides access to the NVIDIA compiler suite (nvcc, nvfortran) as well as the NVIDIA GPU profiling tool (nsys).
cuDNN: the CUDA Deep Neural Network library, often used to accelerate deep learning frameworks such as Keras, PyTorch, and TensorFlow.
OpenMPI: Open MPI builds compiled against the CUDA compilers. One of these is necessary for software that uses both MPI and GPU acceleration.
Note: be sure to check which version(s) of CUDA are compatible with the software being used.
| Software Name | Available Versions |
|---|---|
| cuda 11 | 11.8.0, 11.5.0, 11.4.0, 11.3.1, 11.0.3, 11.0.1 |
| cuda 10 | 10.2.89, 10.1.243, 10.0.130 |
| cuda legacy versions (<10.0) | 9.2, 9.2.88, 9.0, 8.0.61, 8.0, 7.5.18, 7.0, 6.5.14, 6.0 |
| cudnn | cuda11-8.4.1.50, cuda10-7.5.0.56, 8.2.4.15-11.4 |
| openmpi | 4.1.3+cuda11.6.2-mpirun, 4.1.3+cuda11.6.2 |
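To make one of these packages available in a job, load the corresponding module and verify the toolchain; a quick check using a version from the table above:
module load cuda/11.8.0
nvcc --version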
In addition to these, many programming languages are able to use one or more GPUs.
- Python
- Matlab
- Julia
- C++ (using CUDA or OpenACC)
- Fortran (using CUDA or OpenACC)
- C (using CUDA or OpenACC)
Setting up a TensorFlow GPU Environment
Some software, especially Python packages, requires the environment to be set up in a specific way. For Python programs that can use GPUs, such as TensorFlow, this is best done using a conda environment. The steps to set up a conda environment for TensorFlow are shown below:
- Request an interactive session on a GPU node
srun -t 01:00:00 -p gpu-preempt --gpus=1 --mem=16G --pty /bin/bash
- Load modules
module load miniconda/22.11.1-1
module load cuda/11.4.0
module load cudnn/cuda11-8.4.1.50
- Create the environment
conda create --name TensorFlow-env python=3.9
Note: TensorFlow 2 requires Python 3.9 or newer
conda activate TensorFlow-env
pip install tensorflow
pip install tensorrt
conda install ipykernel
Note: if you do not request enough memory, TensorRT will fail to install
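Before registering the environment with Jupyter, you can confirm that TensorFlow detects the GPU from within the activated environment (a quick check; it should print at least one physical GPU device):
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"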
- Add environment to Jupyter
python -m ipykernel install --user --name TensorFlow-env --display-name="TensorFlow-Env"
After completing these steps, a new kernel named "TensorFlow-Env" will be available in new Open OnDemand sessions.
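To verify that the kernel was registered, list the installed kernelspecs (assuming the jupyter command is available in the environment):
jupyter kernelspec list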
Troubleshooting with GPUs
To view ongoing GPU processes, the nvidia-smi pmon command can be used.
If you are getting error messages, add the following command to your scripts so that the job logs record which GPU was used:
nvidia-smi -L
If you encounter a CUDA_ERROR_OUT_OF_MEMORY error, a GPU with more available VRAM may be necessary (see the vram constraints above), or the code should be modified to reduce its memory usage.
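To watch VRAM usage over time while a job runs, nvidia-smi can poll the GPU memory counters; one approach, sampling every 5 seconds:
nvidia-smi --query-gpu=timestamp,memory.used,memory.total --format=csv -l 5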