[Tutorial CUDA] Nvidia GPU: CUDA Compute Capability

When you compile CUDA code for Nvidia GPUs, it is important to know the Compute Capability of the GPU that you are going to use. How many times have you gotten the error

nvcc fatal : Unsupported gpu architecture 'compute_XX'

without knowing how to solve it correctly?

The solution is relatively simple. You must add the correct flag to the “nvcc” call:

-gencode arch=compute_XX,code=[sm_XX,compute_XX]

where “XX” is the Compute Capability of the Nvidia GPU board that you are going to use.

Now you need to know the correct value to replace “XX”. Nvidia helps us with its useful “CUDA GPUs” webpage, which lists the Compute Capability of each product.
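As an alternative to looking the value up by hand, the GPU installed in the machine can usually report it. The sketch below assumes that `nvidia-smi` is on the PATH and that the installed driver is recent enough to support the `compute_cap` query field (older drivers do not). The function names `parse_compute_caps` and `query_compute_caps` are hypothetical helpers introduced only for this illustration.

```python
import subprocess
from typing import List, Tuple


def parse_compute_caps(csv_text: str) -> List[Tuple[str, str]]:
    """Parse 'name, compute_cap' lines into (name, capability) pairs."""
    pairs = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the last comma so a GPU name containing commas survives.
        name, cap = line.rsplit(",", 1)
        pairs.append((name.strip(), cap.strip()))
    return pairs


def query_compute_caps() -> List[Tuple[str, str]]:
    """Ask nvidia-smi for the name and compute capability of each GPU.

    Assumption: the driver supports the 'compute_cap' query field.
    """
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_compute_caps(result.stdout)
```

On a machine with a working driver, calling `query_compute_caps()` should return one pair per installed GPU. The capability string (for example "6.1") is the value that replaces “XX” once the dot is removed.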

For example, if your GPU is an Nvidia Titan Xp, you know that it is a “GeForce product”, so you search for it in the right table and find that its Compute Capability is 6.1. The correct flag to pass to the compiler is therefore

-gencode arch=compute_61,code=[sm_61,compute_61]

If you are instead an embedded developer and the lucky owner of an Nvidia Jetson TX2, you must search in “TEGRA/Jetson products”, where you will find that the Compute Capability of the TX2 is 6.2, so you need to use this configuration:

-gencode arch=compute_62,code=[sm_62,compute_62]

Straightforward!
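Both flags above follow the same pattern: drop the dot from the Compute Capability and substitute the result for “XX”. A minimal sketch of that substitution follows; `gencode_flag` is a hypothetical helper name, not part of any Nvidia tool.

```python
def gencode_flag(compute_capability: str) -> str:
    """Build the -gencode flag for one Compute Capability such as "6.1"."""
    major, minor = compute_capability.strip().split(".")
    # "6.1" becomes "61": the digits nvcc expects after compute_ and sm_.
    xx = f"{int(major)}{int(minor)}"
    return f"-gencode arch=compute_{xx},code=[sm_{xx},compute_{xx}]"


if __name__ == "__main__":
    print(gencode_flag("6.1"))  # Titan Xp
    print(gencode_flag("6.2"))  # Jetson TX2
```

The helper only formats text. It does not check that the installed CUDA toolkit actually supports the requested architecture, so nvcc can still reject the flag.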

If you are compiling software that will run on different machines and you do not know exactly which GPU is installed on them, you can generate an application that supports more than one Nvidia GPU architecture by passing one “-gencode” option for each Compute Capability that you want to support.
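nvcc accepts the “-gencode” option more than once, so a multi-GPU build simply repeats it for each target. The sketch below assembles such a list of flags; `gencode_flags` is a hypothetical helper, and the choice to repeat the same `code=[sm_XX,compute_XX]` form for every target is an assumption that mirrors the single-GPU flag shown earlier.

```python
def gencode_flags(compute_capabilities):
    """Return one -gencode flag per distinct Compute Capability, lowest first."""
    # Parse "6.1" into (6, 1) so sorting is numeric, and drop duplicates.
    targets = sorted(
        {tuple(int(part) for part in cc.strip().split(".")) for cc in compute_capabilities}
    )
    flags = []
    for major, minor in targets:
        xx = f"{major}{minor}"
        flags.append(f"-gencode arch=compute_{xx},code=[sm_{xx},compute_{xx}]")
    return flags


if __name__ == "__main__":
    # Titan Xp (6.1) and Jetson TX2 (6.2), the two boards used as examples.
    print(" ".join(gencode_flags(["6.2", "6.1", "6.1"])))
```

Each additional target increases compile time and binary size, so it is usually worth listing only the architectures that the application is expected to meet.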

The following is a list of the compute capabilities of the most common GPUs:

Tegra Mobile & Jetson Products

GPU Compute Capability
Jetson TX2 6.2
Jetson TX1 5.3
Jetson TK1 3.2
Tegra X1 5.3
Tegra K1 3.2

Tesla Data Center Products

GPU Compute Capability
Tesla V100 7.0
Tesla P100 6.0
Tesla P40 6.1
Tesla P4 6.1
Tesla M60 5.2
Tesla M40 5.2
Tesla K80 3.7
Tesla K40 3.5
Tesla K20 3.5
Tesla K10 3.0

Quadro Mobile Products

GPU Compute Capability
Quadro P5000 6.1
Quadro P4000 6.1
Quadro P3000 6.1
Quadro M5500M 5.2
Quadro M2200 5.2
Quadro M1200 5.0
Quadro M620 5.2
Quadro M520 5.0
Quadro K6000M 3.0
Quadro K5200M 3.0
Quadro K5100M 3.0
Quadro M5000M 5.0
Quadro K500M 3.0
Quadro K4200M 3.0
Quadro K4100M 3.0
Quadro M4000M 5.0
Quadro K3100M 3.0
Quadro M3000M 5.0
Quadro K2200M 3.0
Quadro K2100M 3.0
Quadro M2000M 5.0
Quadro K1100M 3.0
Quadro M1000M 5.0
Quadro K620M 5.0
Quadro K610M 3.5
Quadro M600M 5.0
Quadro K510M 3.5
Quadro M500M 5.0

GeForce Notebook Products

GPU Compute Capability
GeForce GTX 1080 6.1
GeForce GTX 1070 6.1
GeForce GTX 1060 6.1
GeForce GTX 980 5.2
GeForce GTX 980M 5.2
GeForce GTX 970M 5.2
GeForce GTX 965M 5.2
GeForce GTX 960M 5.0
GeForce GTX 950M 5.0
GeForce 940M 5.0
GeForce 930M 5.0
GeForce 920M 3.5
GeForce 910M 5.2
GeForce GTX 880M 3.0
GeForce GTX 870M 3.0
GeForce GTX 860M 3.0/5.0(**)
GeForce GTX 850M 5.0
GeForce 840M 5.0
GeForce 830M 5.0
GeForce 820M 2.1
GeForce 800M 2.1
GeForce GTX 780M 3.0
GeForce GTX 770M 3.0
GeForce GTX 765M 3.0
GeForce GTX 760M 3.0
GeForce GTX 680MX 3.0
GeForce GTX 680M 3.0
GeForce GTX 675MX 3.0
GeForce GTX 675M 2.1
GeForce GTX 670MX 3.0
GeForce GTX 670M 2.1
GeForce GTX 660M 3.0
GeForce GT 750M 3.0
GeForce GT 650M 3.0
GeForce GT 745M 3.0
GeForce GT 645M 3.0
GeForce GT 740M 3.0
GeForce GT 730M 3.0
GeForce GT 640M 3.0
GeForce GT 640M LE 3.0
GeForce GT 735M 3.0
GeForce GT 635M 2.1
GeForce GT 630M 2.1
GeForce GT 625M 2.1
GeForce GT 720M 2.1
GeForce GT 620M 2.1
GeForce 710M 2.1
GeForce 705M 2.1
GeForce 610M 2.1
GeForce GTX 580M 2.1
GeForce GTX 570M 2.1
GeForce GTX 560M 2.1
GeForce GT 555M 2.1
GeForce GT 550M 2.1
GeForce GT 540M 2.1
GeForce GT 525M 2.1
GeForce GT 520MX 2.1
GeForce GT 520M 2.1
GeForce GTX 485M 2.1
GeForce GTX 470M 2.1
GeForce GTX 460M 2.1
GeForce GT 445M 2.1
GeForce GT 435M 2.1
GeForce GT 420M 2.1
GeForce GT 415M 2.1
GeForce GTX 480M 2.0
GeForce 410M 2.1

Notes


(**) The GeForce GTX 860M and GTX 870M come in two versions depending on the SKU; please check with your OEM to determine which one is in your system:

  1. 1152 Kepler Cores with Compute Capability 3.0
  2. 640 Maxwell Cores with higher clocks and Compute Capability 5.0 or 5.2
