Graphics card flaw enables data theft in AMD, Apple, and Qualcomm chips by exploiting GPU memory

Example of the data an attacker is able to access, left behind in a GPU's VRAM
(Image credit: Trail of Bits)

A new security vulnerability called LeftoverLocals affects GPUs from some of the industry's leading names, including AMD, Apple, and Qualcomm. It enables data theft from the GPU's memory regardless of form factor or operating system. The flaw was discovered by researchers at Trail of Bits. Since these GPUs are used in a wide range of smartphones, tablets, notebooks, PCs, and purpose-built servers, the vulnerability puts a wide range of computing devices at risk.

PCs and servers are designed to let multiple users share processing resources without being able to access each other's data. However, the LeftoverLocals vulnerability undermines that protection by exposing other users' data through the GPU's memory. Once an attacker has access to a device with a vulnerable GPU, they can read residual data that remains in the GPU's local memory even after a kernel has finished executing.

The group posted its findings along with a proof of concept built on the open-source LLM tool llama.cpp, recovering data left behind by another process within seconds of it being processed and stored in the graphics processor's memory. Once the attacker has access to the system, the core of the exploit requires fewer than ten lines of code.

The researchers tested 11 GPUs from different vendors across different platforms. Impacted GPUs include the AMD Radeon RX 7900 XT and the Apple GPUs in the iPhone 12 Pro and M2 MacBook Air. The group noted that the latest iPhone 15 models do not appear to be affected.

Based on this testing, the group found GPUs made by AMD, Apple, and Qualcomm are vulnerable to the attack. Researchers did not find the flaw in Intel, Nvidia, Arm, or Imagination GPUs. The research group disclosed the security risk to the CERT Coordination Center and the Khronos Group.

Acknowledgement by GPU Vendors

AMD, Apple, and Qualcomm have now acknowledged the issue. Apple made a patch available for the affected Apple A17 and M3 series processors on January 10. However, Apple hasn't clarified the situation with other impacted devices yet, like the Apple MacBook Air 3rd Generation with its A12 processor. Qualcomm also rolled out a new firmware (v2.07) to patch some of its devices.

AMD posted a security bulletin marking the severity of the issue as 'medium.' The chipmaker listed all the affected CPUs with on-chip graphics, discrete graphics cards, and data center GPUs. AMD says it plans to create a new mode that prevents processes from running in parallel on the GPU's memory and clears the VRAM between processes. This mitigation process won't arrive until March 2024.

  • rluker5
    So AMD's mitigation will be disabled by default, but there does exist an option to turn it on.

    Better than nothing.
  • HaninTH
    Well... At least they still have to get control of your system before this exploit is possible.

    Is the exploit, in action, detectable?
  • Alvar "Miles" Udell
    AMD says it plans to create a new mode that prevents processes from running in parallel on the GPU's memory and clears the VRAM between processes. This mitigation process won't arrive until March 2024.

    Sounds like AMD GPUs are about to lose performance...
  • TJ Hooker
    Alvar Miles Udell said:
    Sounds like AMD GPUs are about to lose performance...
    AMD's statement indicates the fix they are planning will not be enabled by default. So no impact unless you go in and enable it (which a home user likely won't have reason to do).
  • Metal Messiah.
    The article actually fails to touch on the most important part. I think this particular attack is more significant for LLMs and ML models, as it underlines the "overlooked" security risks in ML development stacks.

    Although LeftoverLocals can basically be used to attack any app that uses the GPU's local memory, such as image processing or drawing, the researchers flag data leakage from large language models (LLMs) as the particular and pressing concern here.

    As you can read in the blog, the researchers particularly highlighted the effects on large language models and machine-learning applications. The vulnerability basically allows attackers to access an AI model's output by 'eavesdropping' on the kernels it uses to process user queries.

    Through a PoC, Trail of Bits showed that an LLM's output can be reconstructed with high accuracy: they were able to steal 181MB of data from an LLM running on an AMD Radeon RX 7900 XT, enough to fully reproduce the response of a 7B (7 billion parameter) model.

    So basically this data leakage permits eavesdropping on LLM sessions, and it affects ML models and applications across GPU platforms.

    Especially considering that most deep neural network (DNN) computations rely heavily on local memory, the implications could be vast, potentially impacting ML implementations across both embedded and data-center domains.

    But the good thing is that for this vulnerability to be exploited, it requires the attacker to have access to the target device with the vulnerable GPU, so for an average user/consumer, this attack vector isn't something to worry about IMO, as any attacker would need to already have established some amount of operating system access on the target’s device first.

    Escalated privileges are not required though.

    However, Apple hasn't clarified the situation with other impacted devices yet, like the Apple MacBook Air 3rd Generation with its A12 processor.

    You meant to say the A12-based iPad Air?
  • bit_user
    Metal Messiah. said:
    But the good thing is that for this vulnerability to be exploited, it requires the attacker to have access to the target device with the vulnerable GPU, so for an average user/consumer, this attack vector isn't something to worry about IMO, as any attacker would need to already have established some amount of operating system access on the target’s device first.
    If all you need to do is look at the contents of the GPU's local memory, you should be able to do that using APIs like WebGL and WebGPU, I think. Although that could probably be mitigated by the browser, it would otherwise enable code on a website to scrape data from your GPU.

    I might be wrong, but I think it's probably worth looking into.
  • TJ Hooker
    bit_user said:
    If all you need to do is look at the contents of the GPU's local memory, you should be able to do that using APIs like WebGL and WebGPU, I think. Although that could probably be mitigated by the browser, it would otherwise enable code on a website to scrape data from your GPU.

    I might be wrong, but I think it's probably worth looking into.
    They do discuss that in the original Trail of Bits article:
    "We note that it appears that browser GPU frameworks (e.g., WebGPU) are not currently impacted, as they insert dynamic memory checks into GPU kernels."
  • Metal Messiah.
    Yeah, carrying out an attack from a browser via WebGPU is difficult because this API adds dynamic array bounds checks to GPU kernels when they access local memory.


    Graphics card flaw enables data theft in AMD, Apple, and Qualcomm chips by exploiting GPU memory

    I wouldn't call it a flaw per se, actually; the vulnerability comes down to insufficient isolation of local GPU memory and a failure to clear that memory after processes on the GPU finish executing.

    That allows an attacker's process to pick up data remaining in local memory after another process has executed, or even to read data from a process that is currently running.

    Unlike CPUs, which typically isolate memory in a way that prevents exploits like this, GPUs sometimes do not.

    I mean, as we all know, local memory in a GPU is a separate, faster memory area tied to a compute unit, acting as an analogue of a processor's cache. That's why local memory is used instead of global memory to store intermediate computations.

    So the attack boils down to launching a kernel on the GPU that repeatedly copies the contents of the available local memory into global memory (VRAM), where it can be read back.

    And of course, since local memory is not cleared when switching between processes running on the GPU and is shared between different processes within the same compute unit, residual data from other processes can be found in it. Hence the possibility of this attack vector.

    It's possible to use a GPU's local memory to connect two GPU kernels together, even if the two kernels aren't part of the same application or used by the same person. The attacker can then use a GPU compute API such as OpenCL, Vulkan, or Metal to write a GPU kernel that dumps the target device's uninitialized local memory.
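
    Roughly, the "listener" side looks like the snippet below. This is just my own illustrative sketch in OpenCL C, not the actual PoC code; the array size and names are made up, and the researchers' demo pairs a listener like this with a "writer" that fills local memory with canary values so the leak is easy to spot. The host side only needs to build the kernel, launch it across many work-groups, and read the dump buffer back (e.g. with clEnqueueReadBuffer) to scan it for leftover data.

    // Hypothetical LeftoverLocals-style "listener" kernel (OpenCL C) - illustration only.
    // LOCAL_WORDS is an assumed size; a real listener would size the array to the
    // GPU's maximum local memory per work-group.
    #define LOCAL_WORDS 4096

    __kernel void listener(__global uint *dump) {
        // Deliberately left uninitialized: on a vulnerable GPU this array can still
        // hold values written by whatever kernel used this compute unit previously.
        __local uint lm[LOCAL_WORDS];

        size_t lid = get_local_id(0);
        size_t lsz = get_local_size(0);
        size_t grp = get_group_id(0);

        // Copy the leftover local memory out to global memory (VRAM), where the
        // host can read it back.
        for (size_t i = lid; i < LOCAL_WORDS; i += lsz) {
            dump[grp * LOCAL_WORDS + i] = lm[i];
        }
    }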
  • bit_user
    Metal Messiah. said:
    Yeah, carrying out an attack from a browser via WebGPU is difficult because this API adds dynamic array bounds checks to GPU kernels when they access local memory.
    It's not only bounds-checking, though. It would also need to ensure the local memory structures are initialized, either with 0's or some other data, before they can be read (a rough sketch of what that might look like is at the end of this post).

    Metal Messiah. said:
    I wouldn't call it a flaw per se, actually; the vulnerability comes down to insufficient isolation of local GPU memory and a failure to clear that memory after processes on the GPU finish executing.
    Eh, it's kind of a flaw not to wipe local memory when swapping in wavefronts or warps from another process to use a compute unit or SM. It seems like the sort of thing that might be fixable via firmware, however.

    Metal Messiah. said:
    Unlike CPUs, which typically isolate memory in a way that prevents exploits like this, GPUs sometimes do not.
    They didn't use to, but they now have MMUs, so GPU threads (normally) cannot spy on each other or on arbitrary addresses in system memory.
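
    And to follow up on Metal Messiah's WebGPU point: conceptually, the guarded code a browser runtime has to emit looks something like the sketch below (again my own rough OpenCL C illustration, not actual WebGPU implementation output). It zero-fills local memory before anyone reads it and bounds-checks every local access, which is exactly the overhead native APIs skip by default.

    #define LOCAL_WORDS 4096

    __kernel void guarded(__global uint *out) {
        __local uint lm[LOCAL_WORDS];

        size_t lid = get_local_id(0);
        size_t lsz = get_local_size(0);

        // 1) Zero-initialize local memory before any work-item reads it, so no
        //    leftover values from a previous kernel are observable.
        for (size_t i = lid; i < LOCAL_WORDS; i += lsz) {
            lm[i] = 0u;
        }
        barrier(CLK_LOCAL_MEM_FENCE);

        // 2) Bounds-check each local-memory access, so a kernel can't index
        //    outside its own declared array.
        if (lid < LOCAL_WORDS) {
            lm[lid] = (uint)get_global_id(0);
        }
        barrier(CLK_LOCAL_MEM_FENCE);

        // Any read now only observes zeros or values this kernel wrote itself.
        if (lid < LOCAL_WORDS) {
            out[get_global_id(0)] = lm[lid];
        }
    }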
  • TechyIT223
    I don't see this issue as much of a concern for us gamers though, unless you are working on LLM sessions on a network.