Google Explains Meltdown, Spectre Fix Impact On Cloud Services

Google says that it wasn’t Meltdown that had the greatest impact on its cloud services but Spectre Variant 2. To fix it, the company created Retpoline, a software-only solution that regular users unfortunately can’t benefit from.

Contrary to what many had predicted, Meltdown wasn’t the biggest headache for Google. That vulnerability, which primarily affects Intel CPUs, is fixed by isolating the kernel’s page tables from user processes, which can force the CPU to flush its TLB on transitions into and out of the kernel. In a blog post that goes into detail on the impact of Meltdown/Spectre on the Google apps’ backend, Google said that, because of how long it had known about Meltdown, “extensive performance tuning work” meant that by the time it deployed the patch in October, the “protections caused no perceptible impact in [its] cloud.”

The real headache for Google turned out to be Spectre Variant 2. The hardware fix was to outright disable some forms of speculative execution in the CPU, rather than just nullify them in the situations that matter, which is what the fix for Meltdown does. The performance impact of this was significant. Google explained:

Not only did we see considerable slowdowns for many applications, we also noticed inconsistent performance, since the speed of one application could be impacted by the behavior of other applications running on the same core. Rolling out these mitigations would have negatively impacted many customers.

With nothing to lose, Google looked into “moonshot” solutions and ended up devising Retpoline, a software-only solution that avoided any hardware change and caused “almost no performance loss.” As the obvious choice, Retpoline was deployed across Google’s infrastructure and shared with others.

Spectre Variant 2 is the one for which we need BIOS fixes. If Retpoline fixes it without requiring any hardware change, then why do we need BIOS fixes? Retpoline is a software fix, but it’s a compile-time fix. That means it’s a change implemented in the compiler that modifies the executable the compiler produces. Software doesn’t have to be rewritten, but it does have to be recompiled.
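
As a rough illustration of what "recompiled, not rewritten" means, here is a minimal C sketch. The handler names and the dispatch function are hypothetical, invented only for this example. The call through a function pointer is the kind of code that typically compiles to an indirect branch, which is what Spectre Variant 2 targets. The compiler flags in the comment are not mentioned in Google's post; they are the retpoline options that GCC and Clang document for x86, and whether your compiler version supports them is an assumption to check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handlers, named only for this illustration. */
static int double_it(int x) { return 2 * x; }
static int negate_it(int x) { return -x; }

typedef int (*handler_fn)(int);

static const handler_fn handlers[] = { double_it, negate_it };

/*
 * Calls handlers[index](value). Because the target is chosen at run time
 * through a function pointer, the compiler typically emits an indirect
 * call here.
 *
 * The source stays exactly the same with or without Retpoline. Only the
 * build command changes, for example (x86, assuming a recent compiler):
 *   gcc   -O2 -mindirect-branch=thunk -c dispatch.c
 *   clang -O2 -mretpoline             -c dispatch.c
 * With such a flag, the compiler routes the indirect call through a
 * small thunk instead of emitting it directly.
 */
int dispatch(size_t index, int value)
{
    /* Guard against an out-of-range table index. */
    assert(index < sizeof handlers / sizeof handlers[0]);
    return handlers[index](value);
}
```

Calling dispatch(0, 3) runs double_it, and dispatch(1, 5) runs negate_it. The results are the same in a Retpoline build and a regular one, because only the generated machine code for the call differs.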

For closed systems running proprietary software, which is what the Google apps’ backend is, Retpoline is the ultimate solution. It doesn’t require rewriting high-level code in individual programs. Just recompile them and it’s done. However, that’s not going to work for regular users’ systems because it’s impossible to ensure that every program that everyone runs has been recompiled with Retpoline. As a result, to secure against Spectre Variant 2, user systems have to be patched at the hardware level. We don’t know specifically what the BIOS fixes that Intel, and now AMD, are pushing out do, but, from what Google says, we assume that they disable some forms of speculative execution.

The embargo period for Meltdown/Spectre and Retpoline is why we didn’t see a massive impact on cloud service providers, which many predicted would be hit the hardest by the fixes. Many datacenter workloads are proprietary software, so their creators have full control over what is being run. Since we regular users don’t have this visibility into our software, we might end up being the hardest hit by the Meltdown/Spectre fixes.

  • 2Be_or_Not2Be
    "Since we regular users don’t have this visibility into our software, we might end up being the hardest hit by the Meltdown/Spectre fixes."

    There have been a number of tests that show desktop performance will not be affected significantly. In doing some tests, the biggest difference in one test for a heavy Photoshop load only amounted to a tenth of a second difference. So almost all desktop users won't see much of a difference.

    However, the biggest hit is in all those disk benchmarks. In their synthetic tests, they are seeing 30-40% drops. Now that is going to be interesting to see how they figure out how to get accurate results. I feel a little sorry for the ones doing those benchmarks; perhaps they will need to keep an unpatched system (non-Internet connected, of course) to keep running them for a while.
  • alextheblue
    20588946 said:
    However, the biggest hit is in all those disk benchmarks. In their synthetic tests, they are seeing 30-40% drops. Now that is going to be interesting to see how they figure out how to get accurate results. I feel a little sorry for the ones doing those benchmarks; perhaps they will need to keep an unpatched system (non-Internet connected, of course) to keep running them for a while.
    Those synthetic tests aren't worth a fart, especially for gauging the performance hit for consumer workloads post-MD/Spectre. TH already tested storage and the hit was negligible. This is why they ran that article in the first place... in response to all the web outcry quoting synthetic storage bench results.

    http://www.tomshardware.com/news/microsoft-meltdown-patch-storage-performance,36236.html

    Obviously commercial workloads are going to be a different animal.
  • alextheblue
    Also, I know this should be obvious... but since Retpoline requires a recompile... I can't help but think gee that's not exactly a silver bullet. How is that superior to a microcode update??
  • bit_user
    20589846 said:
    Also, I know this should be obvious... but since Retpoline requires a recompile... I can't help but think gee that's not exactly a silver bullet. How is that superior to a microcode update??
    Branch prediction, itself, is so fundamental that it's probably not implemented in microcode. The performance/efficiency benefit of hard-wiring it is probably too great.

    Maybe some CPUs have the ability to configure its behavior from BIOS, however. Or perhaps there are hacks you could do in the microcode for jump instructions, but maybe the microcode doesn't have enough flexibility or visibility to implement the same technique used by the compiler.