How Nvidia's NVLink Boosts GPU Performance


NVLink is a new feature for Nvidia GPUs that aims to drastically improve performance by increasing the total bandwidth between the GPU and other parts of the system.

In modern PCs, GPUs and numerous other devices are connected by PCI-E lanes to the CPU or the motherboard's chipset. For some GPUs, the available PCI-E lanes provide enough bandwidth to avoid a bottleneck, but for high-end GPUs and multi-GPU setups, the number of lanes and the total bandwidth available can fall short of what the GPU(s) need, creating a bottleneck.
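
To put rough numbers on the lane budget described above, the sketch below estimates one-direction PCI-E bandwidth for a few link widths. It assumes PCI-E 3.0 signaling (8 GT/s per lane with 128b/130b encoding), which the article does not specify, and it ignores packet and protocol overhead. The function name is illustrative only.

```python
# Rough one-direction PCI-E bandwidth estimate.
# Assumption (not stated in the article): PCI-E 3.0, i.e. 8 GT/s per lane
# with 128b/130b line encoding. Packet/protocol overhead is ignored.

GIGATRANSFERS_PER_SEC = 8.0       # per lane, PCI-E 3.0
ENCODING_EFFICIENCY = 128 / 130   # 128 payload bits per 130 bits on the wire


def pcie3_bandwidth_gb_per_s(lanes: int) -> float:
    """Approximate usable bandwidth in GB/s, one direction, for `lanes` lanes."""
    gigabits_per_sec = GIGATRANSFERS_PER_SEC * ENCODING_EFFICIENCY * lanes
    return gigabits_per_sec / 8   # 8 bits per byte


if __name__ == "__main__":
    for lanes in (4, 8, 16):
        print(f"x{lanes}: {pcie3_bandwidth_gb_per_s(lanes):.2f} GB/s")
```

Under these assumptions an x16 link tops out just under 16 GB/s, and splitting those lanes between two cards halves the figure for each.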


To improve this situation, some motherboard manufacturers use PLX switch chips, which can help make better use of the bandwidth from the PCI-E lanes coming from the CPU, but the overall bandwidth does not actually increase. Nvidia's solution to this problem is called NVLink.

According to Nvidia, NVLink is the world's first high-speed interconnect technology for GPUs, and it allows data to be transferred between the GPU and CPU five to 12 times faster than over PCI-E. Nvidia also claimed that applications using NVLink can run up to twice as fast as they do over PCI-E.


Programs that use the Fast Fourier Transform (FFT) algorithm, which is common in seismic processing, signal processing, image processing and solving partial differential equations, see the greatest performance increase. These types of applications run heavily on servers and are typically bottlenecked by the PCI-E bus.

Other applications used in various fields of research see performance increases, too. According to Nvidia, AMBER, an application used to study the behavior of matter by simulating molecular structures, gains up to 50 percent in performance using NVLink.


When two GPUs are used in the same system, they can be joined by four NVLink links, each providing 20 GB/s, for a total of 80 GB/s between the two cards. Because the cards no longer need to communicate over the scarce PCI-E bandwidth, more of that bandwidth is left for the CPU to send data to the GPUs.
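
The arithmetic in the paragraph above can be sketched as follows. The link count and per-link rate come from the article; the PCI-E 3.0 x16 figure of roughly 15.75 GB/s in one direction is an assumption used only for comparison, and the 4 GB payload and the function names are hypothetical.

```python
# Aggregate NVLink bandwidth between two GPUs, using the article's figures.
NVLINK_LINKS = 4             # links joining the two GPUs (from the article)
NVLINK_GB_PER_LINK = 20.0    # GB/s per link (from the article)

# Assumption (not from the article): PCI-E 3.0 x16, about 15.75 GB/s one way.
PCIE3_X16_GB_PER_S = 15.75


def aggregate_bandwidth(links: int, gb_per_link: float) -> float:
    """Total bandwidth in GB/s across all links."""
    return links * gb_per_link


def transfer_seconds(payload_gb: float, bandwidth_gb_per_s: float) -> float:
    """Idealized transfer time: payload size divided by bandwidth."""
    return payload_gb / bandwidth_gb_per_s


if __name__ == "__main__":
    nvlink = aggregate_bandwidth(NVLINK_LINKS, NVLINK_GB_PER_LINK)
    payload_gb = 4.0  # hypothetical buffer exchanged between the two GPUs
    print(f"NVLink aggregate: {nvlink:.0f} GB/s")
    print(f"NVLink transfer:  {transfer_seconds(payload_gb, nvlink) * 1000:.0f} ms")
    print(f"PCI-E transfer:   "
          f"{transfer_seconds(payload_gb, PCIE3_X16_GB_PER_S) * 1000:.0f} ms")
    print(f"Ratio:            {nvlink / PCIE3_X16_GB_PER_S:.1f}x")
```

These are idealized peak figures; real transfers would see lower throughput on both interconnects.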

Nvidia claimed that IBM is integrating NVLink into future POWER CPUs, and the U.S. Department of Energy announced that it will use NVLink in its next flagship supercomputer.

Follow Michael Justin Allen Sexton @LordLao74. Follow us @tomshardware, on Facebook and on Google+.

    Top Comments
  • oczdude8
    Quote:
    This doesn't make any sense. PCIE has been giving us plenty of bandwidth; it has always stayed ahead of the performance actually needed. It's also a tried and true fact that 16 lanes are unnecessary for today's graphics cards. The performance difference between using 16 and 4 lanes is only marginal. The difference between x8 and x16 is pretty much zero. In fact, some cards tested higher in x8, but within the margin of error. This is why hooking up your graphics card over Thunderbolt is a viable idea.


    You are assuming the only use GPUs have is for gaming. That's not true. The article CLEARLY states some of the applications that do benefit from this technology. GPUs are very good at computations, much, much faster than CPUs.
    14
  • digitalvampire
    This is really impressive and will be great for HPC application developers ... as long as it's not locked down for CUDA only. There have been enough devs (especially in academia) already moving away from CUDA to OpenCL due to vendor lock-in. I'd hate to see a great hardware innovation largely ignored because of this, especially with NVidia's great GPUs.
    11
  • Other Comments
  • gamebrigada
    This doesn't make any sense. PCIE has been giving us plenty of bandwidth; it has always stayed ahead of the performance actually needed. It's also a tried and true fact that 16 lanes are unnecessary for today's graphics cards. The performance difference between using 16 and 4 lanes is only marginal. The difference between x8 and x16 is pretty much zero. In fact, some cards tested higher in x8, but within the margin of error. This is why hooking up your graphics card over Thunderbolt is a viable idea.
    -10