Google says that it wasn’t Meltdown that had the greatest impact on its cloud services but Spectre Variant 2. To fix it, the company created Retpoline, a software-only solution that regular users unfortunately can’t benefit from.
Contrary to what many had predicted, Meltdown, the vulnerability mostly associated with Intel CPUs and fixed by isolating the kernel’s page tables from user processes (at the cost of extra TLB reloads on transitions into and out of the kernel), wasn’t the biggest headache for Google. In a blog post detailing the impact of Meltdown/Spectre on the backend of Google’s apps, the company said that, because it had known about Meltdown for so long, “extensive performance tuning work” meant that by the time it deployed the patch in October, the “protections caused no perceptible impact in [its] cloud.”
The real headache for Google turned out to be Spectre Variant 2. The hardware-level fix was to disable some forms of speculative execution in the CPU outright, rather than neutralize them only in the situations that matter, which is what the fix for Meltdown does. The performance impact was significant. Google explained:
“Not only did we see considerable slowdowns for many applications, we also noticed inconsistent performance, since the speed of one application could be impacted by the behavior of other applications running on the same core. Rolling out these mitigations would have negatively impacted many customers.”
With nothing to lose, Google looked into “moonshot” solutions and ended up devising Retpoline, a software-only solution that avoided any hardware change and caused “almost no performance loss.” As the clearly better option, Retpoline was deployed across Google’s infrastructure and shared with others.
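Spectre Variant 2 is the one for which we need BIOS fixes. If Retpoline fixes it without requiring any hardware change, then why do we need BIOS fixes? Retpoline is a software fix, but it’s a compile-time fix. That means it’s a change implemented in the compiler that modifies the final executable that comes out. Specifically, the compiler replaces indirect branches (jumps and calls whose target is computed at run time) with a return-based sequence, the “return trampoline” that gives Retpoline its name, so the indirect branch predictor that Variant 2 poisons is never consulted. It doesn’t mean that software has to be rewritten, but it does mean that it has to be recompiled.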
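To make the compile-time point concrete, here is a minimal C sketch of the kind of code Retpoline affects. The handler names and the dispatch table are hypothetical, invented purely for illustration; the only thing that matters is the call through a function pointer, which a compiler would typically emit as an indirect call on x86.

```c
#include <assert.h>

/* Hypothetical handlers, named only for this illustration. */
static int add(int a, int b) { return a + b; }
static int sub(int a, int b) { return a - b; }

typedef int (*binop_fn)(int, int);

enum op { OP_ADD, OP_SUB, OP_COUNT };

/* A table of function pointers indexed by operation. */
static const binop_fn handlers[OP_COUNT] = { add, sub };

/*
 * The target of this call is only known at run time, so the compiler
 * would normally emit an indirect call here. That is the kind of
 * branch a Retpoline-enabled build rewrites into a return-based
 * sequence. Nothing in this source file changes either way.
 */
int dispatch(enum op which, int a, int b)
{
    assert((unsigned)which < OP_COUNT); /* reject out-of-range indexes */
    return handlers[which](a, b);
}
```

As far as we know, recent x86 toolchains expose this as a build flag (`-mindirect-branch=thunk` in GCC, `-mretpoline` in Clang), though the exact options depend on the compiler version, so check your compiler’s documentation. Calling `dispatch(OP_ADD, 2, 3)` gives the same result with or without the flag; only the generated machine code differs.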
For closed systems running proprietary software, which is what the backend of Google’s apps is, Retpoline is the ultimate solution. It doesn’t require rewriting high-level code in individual programs. Just recompile them and it’s done. However, that’s not going to work for regular users’ systems, because it’s impossible to ensure that every program that everyone runs has been recompiled with Retpoline. As a result, to secure against Spectre Variant 2, user systems have to be patched at the hardware level. We don’t know specifically what the BIOS fixes that Intel, and now AMD, are pushing out do, but from what Google says, we assume that they disable some forms of speculative execution.
The embargo period for Meltdown/Spectre, together with Retpoline, is why we didn’t see a massive impact on cloud service providers, which many predicted would be hit the hardest by the fixes. Many datacenter workloads are proprietary software, so their creators have full control over what is being run. Since we regular users don’t have that kind of control over our software, we might end up being the hardest hit by the Meltdown/Spectre fixes.