Nvidia RTX 4060 Ti AI Performance
GPUs are also used with professional applications, AI training and inferencing, and more. Along with our usual professional tests, we've added Stable Diffusion benchmarks on the various GPUs. AI is a fast-moving sector, and it seems like 95% or more of the publicly available projects are designed for Nvidia GPUs. Those Tensor cores aren't just for DLSS, in other words. Let's start with our AI testing and then hit the professional apps.
We're using Automatic1111's Stable Diffusion version for the Nvidia cards, while for AMD we're using Nod.ai's Shark variant. For Shark, we used the automatic build version 20230521.737, launched with "--iree_vulkan_target_triple=rdna3-7900-windows" as recommended by AMD, or with "rdna2-unknown-windows" for the RX 6000-series (that's the default). The Nvidia GPUs were tested after replacing the default CUDA DLL files with newer versions, as recommended by Nvidia.
This particular sort of workload is ideally suited to the tensor cores in Nvidia's RTX GPUs. The RTX 3060 more than doubles the Stable Diffusion throughput of the RX 6800 for 512x512 images, and triples the 768x768 performance. We do need to mention that Nod.ai doesn't have "tuned" performance for 768x768, at least with the version of Shark Stable Diffusion that we used, and that's likely a factor. Still, we've been waiting to see improved 768x768 throughput for several months now.
The RTX 4060 Ti takes up its standard position in the charts otherwise, with performance just a bit below the RTX 3070 but also a bit ahead of the RTX 3060 Ti. Like many other workloads, Stable Diffusion is perfectly fine with using the larger L2 cache to overcome the reduction in memory bandwidth.
There are other AI workloads, particularly those that use LLMs (Large Language Models), where VRAM capacity can be more important than computational performance. For example, when we last poked around with running a local chatbot, some of the models required 10GB or even 24GB of VRAM just to run, and there are even larger models for Nvidia's A100/H100 data center GPUs and DGX servers.
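To give a rough sense of why VRAM capacity becomes the limiting factor, here's a minimal back-of-the-envelope sketch in Python. It assumes the weights dominate memory use and ignores activations, context cache, and framework overhead, so real requirements will be higher. The function names and the model sizes in the loop are hypothetical, chosen only for illustration. The 10GB and 24GB capacities are the ones mentioned above.

```python
def weights_size_gb(params_billions, bits_per_weight):
    """Approximate size of the model weights alone, in decimal GB.

    Assumption: memory ~= parameter count x bytes per parameter.
    Activations, context cache, and runtime overhead are NOT included.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_vram(params_billions, bits_per_weight, vram_gb):
    """True if the weights alone would fit in the given VRAM capacity."""
    return weights_size_gb(params_billions, bits_per_weight) <= vram_gb


# Hypothetical model sizes (billions of parameters) and weight precisions.
for params, bits in [(7, 16), (13, 8), (13, 16), (30, 4)]:
    size = weights_size_gb(params, bits)
    print(
        f"{params}B @ {bits}-bit: ~{size:.1f} GB of weights, "
        f"fits 10GB: {fits_in_vram(params, bits, 10)}, "
        f"fits 24GB: {fits_in_vram(params, bits, 24)}"
    )
```

Under these assumptions, halving the precision halves the footprint, which is why quantized models are usually what ends up running on consumer cards.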
Nvidia RTX 4060 Ti Professional Workloads
SPECviewperf 2020 consists of eight different benchmarks, and we use the geometric mean from those to generate an aggregate "overall" score. Note that this is not an official score, but it gives equal weight to the individual tests and provides a nice high-level overview of performance. Few professionals use all of these programs, however, so it's typically more important to look at the results for the application(s) you plan to use.
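For anyone who wants to reproduce that aggregate from their own runs, the calculation can be sketched as follows. This is a minimal Python example, not our actual test tooling: the function name is ours, and the per-test values are placeholders rather than measured results. The eight viewset names are the ones SPECviewperf 2020 reports.

```python
import math


def geometric_mean(scores):
    """Return the geometric mean of a collection of positive scores."""
    scores = list(scores)
    if not scores:
        raise ValueError("need at least one score")
    if any(s <= 0 for s in scores):
        raise ValueError("scores must be positive")
    # Averaging the logs avoids overflow from multiplying many large values.
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


# Placeholder values only (NOT measured results), keyed by viewset name.
placeholder_scores = {
    "3dsmax-07": 100.0,
    "catia-06": 50.0,
    "creo-03": 80.0,
    "energy-03": 30.0,
    "maya-06": 250.0,
    "medical-03": 40.0,
    "snx-04": 20.0,
    "solidworks-07": 120.0,
}

overall = geometric_mean(placeholder_scores.values())
print(f"Unofficial overall score: {overall:.1f}")
```

The geometric mean is what makes the weighting equal: doubling the result in any one of the eight tests raises the overall figure by the same factor, regardless of whether that test reports scores in the tens or the hundreds.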
Nvidia's RTX 4060 Ti does battle with the RTX 3060 Ti here, winning in some tests and losing in others. Overall, it lands between the 3060 Ti and 3070 once again. 3D Studio Max is one test where the older card holds its own: either it prefers the higher raw memory bandwidth of the 3060 Ti, or it's simply better tuned for the Ampere architecture.
For its part, AMD released drivers that provided a substantial boost to SPECviewperf scores last year. AMD GPUs score particularly well in snx-04 (or if you prefer, Nvidia's consumer RTX cards do very poorly). AMD also tends to score higher in catia-06, creo-03, energy-03, and medical-03, while Nvidia GPUs do better in 3dsmax-07 — with maya-06 and solidworks-07 being more neutral.
To match the RX 6000-series GPUs in some workloads, you'd need one of Nvidia's professional cards. (Also: No, I'm not sure why the RX 6600 performed so well. I tested it alongside the RX 6600 XT and RX 6650 XT within the past 24 hours, and it simply performed better for some reason.) Anyway, if you use any of these applications on a regular basis, that could be enough to sway your GPU purchasing decision.
Moving on to 3D rendering, Blender is a popular open-source rendering application, and we're using the latest Blender Benchmark, which uses Blender 3.5.0 and three tests. Blender 3.5.0 includes the Cycles X engine that leverages ray tracing hardware on AMD, Nvidia, and even Intel Arc GPUs. It does so via AMD's HIP interface (Heterogeneous-computing Interface for Portability), Nvidia's CUDA or OptiX APIs, and Intel's oneAPI. Nvidia GPUs have some performance advantages here thanks to the OptiX API.
The RTX 4060 Ti falls just behind the RTX 3060 in the Junkshop scene, but then it performs better than the RTX 3070 Ti in Monster and Classroom. AMD GPUs don't do nearly as well, and even the RTX 3060 comes out ahead of the RX 6750 XT and RX 6800. (And again, we have a case where one of AMD's GPUs seems to underperform, as normally the RX 6800 should be a bit faster.) Overall, Blender performance ends up being a strong point for the RTX 40-series GPUs.
Our final two professional applications only have ray tracing hardware support for Nvidia's GPUs. OctaneBench puts the RTX 4060 Ti roughly on par with the RTX 3070 (slightly ahead overall), while V-Ray has the 4060 Ti trailing by just a hair. The RTX 4070, meanwhile, delivers substantially higher performance than the 4060 Ti in these tests.