If anyone cares more about millisecond-long delays than gamers, it's developers. They know a millisecond can make a big difference in how a game plays. That's bad news for Google Stadia, because devs recently claimed a quirk in the Linux kernel scheduler can cause stalls in games ported to the platform.
A developer named Malte Skarupke publicized the problem on Monday. Skarupke explained how he became aware of the issue, and his efforts to address it, in a blog post (shout-out to Phoronix for spotting the post).
This is the high-level overview Skarupke provided before offering more technical details about the issue:
"I overheard somebody at work complaining about mysterious stalls while porting Rage 2 to Stadia. The only thing those mysterious stalls had in common was that they were all using spinlocks. I was curious about that because I happened to be the person who wrote the spinlock we were using. The problem was that there was a thread that spent several milliseconds trying to acquire a spinlock at a time when no other thread was holding the spinlock. Let me repeat that: The spinlock was free to take, yet a thread took multiple milliseconds to acquire it. In a video game, where you have to get a picture on the screen every 16ms or 33ms (depending on if you’re running at 60Hz or 30Hz), a stall that takes more than a millisecond is terrible. Especially if you’re literally stalling all threads."
Skarupke said he spent months investigating the issue before concluding that "most mutex implementations are really good, that most spinlock implementations are pretty bad and that the Linux scheduler is OK but far from ideal." He eventually decided to apply the band-aid solution of switching from a spinlock to a mutex.
More information is available in Skarupke's blog post, which is worth a read for anyone curious about how much difference a few milliseconds of latency can make while playing a game (especially on a streaming platform like Stadia) and how developers try to solve those problems. Hopefully it becomes less of an issue in the future.
The original developer's rant simply amounts to "spinlocks work the way I want on Windows, but not on Linux." From what I can find, that behavior would seem to be a bug, not a feature, of the Windows kernel.
Seems consistent with what the original author concluded, which is sound advice. I never use spinlocks.
Um, not really. He was debugging a real problem, which was stuttering that he didn't see on Windows. Even if his measurement methodology was flawed, I think he still reached the best conclusion, and had some interesting observations about different Linux schedulers that he tried (not the timing data, which Torvalds debunked, but the more casual observations) and about realtime thread scheduling (in short: don't).
Edit: I do think it's pretty sad to see how much time that developer spent, trying to optimize a fundamentally bad approach. However, that ultimately serves to reinforce his conclusion & Torvalds' message. Had his analysis not been so involved, I doubt this would've gotten quite so much attention.