Nvidia RTX 4060 AI Performance
GPUs are also used for professional applications, AI training and inference, and more. Along with our usual professional tests, we've added Stable Diffusion benchmarks for the various GPUs. AI is a fast-moving sector, and it seems like 95% or more of the publicly available projects are designed for Nvidia GPUs. In other words, those Tensor cores aren't just for DLSS. Let's start with our AI testing and then hit the professional apps.
We're using Automatic1111's Stable Diffusion web UI for the Nvidia cards, while for AMD we're using Nod.ai's Shark variant. These results come from Shark's automatic build 20230521.737, which is now a month out of date, launched with "--iree_vulkan_target_triple=rdna3-7900-windows" as recommended by AMD, or with the default "rdna2-unknown-windows" for the RX 6000-series.
For Intel GPUs, we used a tweaked version of Stable Diffusion OpenVINO. That hasn't been updated in a few months, and we couldn't get 768x768 image generation working, so we'll revisit the Arc results at some point in the future. (When we do, we'll update our main Stable Diffusion benchmarks page... which is also currently outdated.)
Tensor and matrix cores in modern GPUs were created specifically for this type of workload, so it's no surprise that Nvidia and Intel GPUs do quite a bit better than their AMD counterparts. The RTX 4060 can't quite match the RX 7900 XT and XTX at 512x512 image generation (those aren't shown here), but it's quite a bit faster than any other current AMD GPU, and the 768x768 results, which haven't been tuned for AMD, favor Nvidia even more.
There are some other interesting things to note. Raw memory bandwidth also appears to be a bigger factor here: the RTX 4060 has 272 GB/s, compared to 360 GB/s on the RTX 3060 12GB, and it only leads that previous-generation card by 6% at 512x512 and 9% at 768x768. Along with having less memory (8GB versus 12GB), the RTX 4060 clearly isn't going to be an AI powerhouse. It's okay for basic stuff, but that's about it.
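Peak memory bandwidth is just bus width times per-pin data rate, divided by eight bits per byte. A quick sketch using the cards' published GDDR6 specs shows how large the gap is; we're assuming the 12GB RTX 3060 here (the later 8GB model uses a narrower bus). This is a paper spec only, and it ignores Ada's much larger L2 cache, which is meant to offset some of the difference.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bus width (bits) x per-pin data rate (Gbps) / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


# Published memory configurations: (bus width in bits, GDDR6 data rate in Gbps).
RTX_4060 = (128, 17.0)
RTX_3060_12GB = (192, 15.0)

bw_4060 = peak_bandwidth_gb_s(*RTX_4060)       # 128 * 17 / 8 = 272 GB/s
bw_3060 = peak_bandwidth_gb_s(*RTX_3060_12GB)  # 192 * 15 / 8 = 360 GB/s

# How much less raw bandwidth the newer card has, as a percentage.
deficit_pct = (1 - bw_4060 / bw_3060) * 100
print(f"RTX 4060: {bw_4060:.0f} GB/s, RTX 3060 12GB: {bw_3060:.0f} GB/s")
print(f"The RTX 4060 has about {deficit_pct:.0f}% less raw bandwidth")
```

A roughly 24% bandwidth deficit is consistent with the small single-digit leads over the 3060, though it's unlikely to be the only factor.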
Intel's Arc GPUs seem to do decently as well, and perhaps a different library could further narrow the gap. The A770 16GB and A750 both deliver higher output rates than the RX 7600 and RX 6700 XT, for example.
There are other AI workloads, particularly those that use LLMs (large language models), where VRAM capacity can be more important than computational performance. Running a local chatbot, for example, can require 10GB or even 24GB of VRAM for some of the models, and there are even larger GPT-3-based models that target Nvidia's A100/H100 data center GPUs and DGX servers.
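A rough rule of thumb shows why VRAM matters so much with LLMs: just holding the weights takes the parameter count times the bytes per parameter. The sketch below assumes weights-only storage at a few common precisions; the 7B and 13B sizes are illustrative, while 175B is GPT-3's parameter count. Real usage is higher once activations, the KV cache, and framework overhead are added, so treat these figures as lower bounds rather than measured requirements.

```python
# Bytes required to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weights_gb(params_billion: float, precision: str) -> float:
    """Lower-bound VRAM (GB, 1 GB = 1e9 bytes) needed to hold the weights alone.

    Activations, KV cache, and framework overhead are not included.
    """
    # (params_billion * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billion * BYTES_PER_PARAM[precision]


# 7B and 13B are illustrative sizes; 175B is GPT-3's parameter count.
for size in (7, 13, 175):
    estimates = ", ".join(f"{p}: {weights_gb(size, p):.1f} GB" for p in ("fp16", "int8", "int4"))
    print(f"{size}B parameters -> {estimates}")
```

Even a 7B model at FP16 (about 14GB) overflows the RTX 4060's 8GB, and only 4-bit quantization brings it comfortably under; a GPT-3-sized model at FP16 needs around 350GB, which means multiple data center GPUs.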
Nvidia RTX 4060 Professional Workloads
Most true professional graphics cards cost a lot more than the RTX 4060. They come with drivers that are better tuned for professional applications, at least in some cases, as well as improved support for those applications. Still, you can get by with a consumer GPU like the RTX 4060 in a pinch. Also note that some of our professional tests only run on Nvidia GPUs, so those charts won't have the AMD or Intel cards.
SPECviewperf 2020 consists of eight different benchmarks, and we use the geometric mean from those to generate an aggregate "overall" score. Note that this is not an official score, but it gives equal weight to the individual tests and provides a nice high-level overview of performance. Few professionals use all of these programs, however, so it's typically more important to look at the results for the application(s) you plan to use.
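The geometric mean is useful here because it's scale-independent: a 2x improvement on any one test raises the aggregate by the same factor, no matter how large that test's raw score is. With an arithmetic mean, a single test with huge raw scores would dominate the result. Here's a minimal sketch using made-up scores for eight tests (these are not real SPECviewperf results).

```python
import math


def geometric_mean(scores: list[float]) -> float:
    """n-th root of the product of n scores, computed in log space to avoid overflow."""
    if not scores or any(s <= 0 for s in scores):
        raise ValueError("scores must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


# Hypothetical scores for eight tests; the last one has a much larger raw scale.
base = [20.0, 30.0, 25.0, 40.0, 15.0, 35.0, 50.0, 300.0]
overall = geometric_mean(base)

# Double the smallest score, then (separately) double the largest one.
small_doubled = base.copy()
small_doubled[4] *= 2
large_doubled = base.copy()
large_doubled[7] *= 2

# Both changes lift the geometric mean by the same factor, 2 ** (1 / 8), about 1.09.
print(geometric_mean(small_doubled) / overall)
print(geometric_mean(large_doubled) / overall)
```

Python 3.8 and later also ship `statistics.geometric_mean`, which does the same job.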
As expected, Nvidia's RTX 4060 takes its now-traditional spot behind the RTX 3060 Ti and ahead of the RTX 3060. It (barely) beats the 3060 Ti in one test, energy-03, but overall it's 21% faster than the 3060 and 8% slower than the 3060 Ti.
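Percentages quoted against different baselines don't simply add up. Normalizing to the RTX 3060, and assuming "8% slower" means the 4060 scores 92% of the 3060 Ti, the two overall figures imply the gap between the two Ampere cards. This is back-of-the-envelope arithmetic on rounded numbers, so the result is approximate.

```python
def implied_ratio(faster_pct: float, slower_pct: float) -> float:
    """Ratio of card C to card A, given B is faster_pct% faster than A
    and slower_pct% slower than C (i.e. B = (1 - slower_pct / 100) * C)."""
    b_over_a = 1 + faster_pct / 100
    b_over_c = 1 - slower_pct / 100
    return b_over_a / b_over_c


# A = RTX 3060, B = RTX 4060, C = RTX 3060 Ti (SPECviewperf overall figures from the text).
ti_over_3060 = implied_ratio(21, 8)
print(f"Implied RTX 3060 Ti lead over RTX 3060: {(ti_over_3060 - 1) * 100:.0f}%")

# The same 8% gap, expressed from the 3060 Ti's side, is a larger percentage.
print(f"RTX 3060 Ti lead over RTX 4060: {(1 / 0.92 - 1) * 100:.1f}%")
```

So the 3060 Ti works out to roughly 32% ahead of the 3060, and "8% slower" is the same thing as the 3060 Ti being about 8.7% faster.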
AMD's RX 7600, and AMD GPUs in general, do much better in SPECviewperf than Nvidia's consumer GPUs, mostly thanks to the snx-04 test, where they're about an order of magnitude faster than their Nvidia counterparts. Intel's Arc GPUs, incidentally, do okay in a couple of the tests but generally rank near the bottom of the charts.
For 3D rendering, Blender is a popular open-source application, and we're using the latest Blender Benchmark, which uses Blender 3.5.0 and three test scenes. Blender's Cycles X engine can render on AMD, Nvidia, and even Intel Arc GPUs, via AMD's HIP (Heterogeneous-computing Interface for Portability), Nvidia's OptiX API, and Intel's oneAPI, respectively. OptiX taps into the RT cores on Nvidia GPUs, which gives them a performance advantage.
This time, the RTX 4060 actually ranks ahead of the RTX 3060 Ti in overall performance. It falls behind in the Junkshop scene, but performs quite a bit better in the Monster and Classroom scenes. AMD GPUs aren't as fast in Blender, or really in any heavy ray tracing apps, and even the RTX 3050 manages to deliver better overall performance.
Our final two professional applications only have ray tracing hardware support for Nvidia's GPUs. OctaneBench puts the RTX 4060 just behind the 3060 Ti again, while V-Ray has the 4060 nearly tied with the 3060 Ti in CUDA mode and slightly ahead when using OptiX rendering. The RTX 3060 trails by a decent margin in either case.