Linus Torvalds: Linux Scheduler Not To Blame For Google Stadia Port Issues


A few days ago, developer Malte Skarupke detailed in a blog post how the Linux kernel might be responsible for performance issues in titles being ported to the Google Stadia game streaming platform. On Sunday, however, Phoronix reported that Linux creator Linus Torvalds had looked into the matter and, with his characteristic bluntness, called Skarupke's report "pure garbage."

The short of it had to do with how the scheduler deals with spinlocks: using them caused long scheduling hangups, which in turn delayed frame rendering. The longest stalls on spinlocks exceeded 100 ms, far too long when frames need to be displayed at 30 or 60 fps.
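For scale, the time available to render each frame is 1000 ms divided by the frame rate f, so a single 100 ms stall covers several consecutive frame intervals:

```latex
\begin{aligned}
t_{\text{frame}} &= \frac{1000\ \text{ms}}{f} \\
t_{60} &= \frac{1000\ \text{ms}}{60} \approx 16.7\ \text{ms}, \qquad
t_{30} = \frac{1000\ \text{ms}}{30} \approx 33.3\ \text{ms} \\
\frac{100\ \text{ms}}{t_{60}} &= 6\ \text{frames}, \qquad
\frac{100\ \text{ms}}{t_{30}} = 3\ \text{frames}
\end{aligned}
```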

Skarupke noted that "most mutex implementations are really good, that most spinlock implementations are pretty bad and that the Linux scheduler is OK but far from ideal", and decided to use mutex locks instead of spinlocks to solve the issue. 
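To make the distinction concrete, here is a minimal C++ sketch (not Skarupke's code) of the two approaches. A naive userspace spinlock busy-waits on an atomic flag, so the kernel never learns the thread is waiting, whereas `std::mutex` (built on the kernel's futex mechanism on Linux with glibc) lets a contended thread sleep until it is woken. The names `NaiveSpinLock` and `count_with` are hypothetical, chosen for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical naive userspace spinlock: a waiting thread loops on an atomic
// flag and never tells the kernel what it is waiting for.
class NaiveSpinLock {
    std::atomic<bool> locked_{false};

public:
    void lock() {
        // exchange() returns the previous value; keep trying until it was false.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            // Busy-wait. If the holder has been preempted, this thread can burn
            // its whole time slice here without the lock ever becoming free.
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }
};

// Hypothetical helper: several threads increment a shared counter under any
// lock type that provides lock()/unlock(), e.g. NaiveSpinLock or std::mutex.
template <typename Lock>
long count_with(int threads, int iterations) {
    Lock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<Lock> guard(lock);  // lock() now, unlock() at scope exit
                ++counter;
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Called as `count_with<NaiveSpinLock>(4, 10000)` or `count_with<std::mutex>(4, 10000)`, both return 40000. The difference lies in what a blocked thread does while it waits, which is exactly where the scheduler comes in.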

Torvalds' comments appeared in a thread on the Real World Technologies forum and read, in part, as follows.

"The whole post seems to be just wrong, and is measuring something completely different than what the author thinks and claims it is measuring. First off, spinlocks can only be used if you actually know you're not being scheduled while using them...It basically reads the time before releasing the lock, and then it reads it after acquiring the lock again, and claims that the time difference is the time when no lock was held. Which is just inane and pointless and completely wrong. That's pure garbage."
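Torvalds' objection is about what the benchmark's timestamps actually capture. The pattern he describes looks roughly like the hypothetical C++ sketch below, an illustration rather than the original benchmark code: a thread stamps the time just before releasing the lock, and whichever thread acquires it next treats "now minus that stamp" as time the lock sat free. The function name `longest_reported_gap_us` is an assumption for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical reconstruction of the measurement pattern being criticized.
// Returns the largest "idle gap" (in microseconds) the method would report.
template <typename Lock>
double longest_reported_gap_us(int threads, int iterations) {
    Lock lock;
    Clock::time_point last_release = Clock::now();  // guarded by `lock`
    double longest_us = 0.0;                        // guarded by `lock`
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                // This interval mixes together: time the lock was still held
                // (if the previous holder was preempted after its timestamp
                // but before unlock()), time it was genuinely free, and this
                // thread's own scheduling delay. It is not "time with no lock held".
                double gap = std::chrono::duration<double, std::micro>(
                                 Clock::now() - last_release).count();
                longest_us = std::max(longest_us, gap);
                last_release = Clock::now();  // stamped while still holding the lock
                lock.unlock();                // preemption can strike between these two lines
            }
        });
    }
    for (auto& th : pool) th.join();
    return longest_us;
}
```

Nothing in the returned number distinguishes those causes, which is the heart of his complaint: a long "gap" can simply mean the previous holder was descheduled while it still owned the lock.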

Torvalds' suggested fix for developers: "Use a lock where you tell the system that you're waiting for the lock, and where the unlocking thread will let you know when it's done, so that the scheduler can actually work with you, instead of (randomly) working against you...I repeat: do not use spinlocks in user space, unless you actually know what you're doing. And be aware that the likelihood that you know what you are doing is basically nil."
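A lock where waiters tell the system they are waiting, and the unlocking thread wakes them, is what Linux's futex system call provides; glibc's pthread mutexes, and therefore `std::mutex`, are built on it. The Linux-only C++ sketch below follows a classic three-state futex mutex design. The class `FutexLock` and helper `futex_counter_demo` are hypothetical names, and in real code you would simply use `std::mutex`.

```cpp
#include <atomic>
#include <cassert>
#include <linux/futex.h>
#include <mutex>
#include <sys/syscall.h>
#include <thread>
#include <unistd.h>
#include <vector>

// Hypothetical futex-based lock (Linux only). State values:
//   0 = unlocked, 1 = locked with no waiters, 2 = locked, waiters possible.
class FutexLock {
    std::atomic<int> state_{0};
    static_assert(sizeof(std::atomic<int>) == sizeof(int), "futex needs a plain 32-bit word");

    int* addr() { return reinterpret_cast<int*>(&state_); }

    // Ask the kernel to put this thread to sleep while state_ still equals `expected`.
    void futex_wait(int expected) {
        syscall(SYS_futex, addr(), FUTEX_WAIT_PRIVATE, expected, nullptr, nullptr, 0);
    }
    // Ask the kernel to wake at most one thread sleeping on state_.
    void futex_wake_one() {
        syscall(SYS_futex, addr(), FUTEX_WAKE_PRIVATE, 1, nullptr, nullptr, 0);
    }

public:
    void lock() {
        int c = 0;
        // Fast path: uncontended 0 -> 1, no system call at all.
        if (state_.compare_exchange_strong(c, 1, std::memory_order_acquire)) return;
        // Contended: mark "waiters possible" by setting the state to 2.
        if (c != 2) c = state_.exchange(2, std::memory_order_acquire);
        while (c != 0) {
            futex_wait(2);  // sleep in the kernel instead of spinning
            c = state_.exchange(2, std::memory_order_acquire);
        }
    }
    void unlock() {
        // Old state 1: nobody waiting. Old state 2: reset to 0 and wake one sleeper.
        if (state_.fetch_sub(1, std::memory_order_release) != 1) {
            state_.store(0, std::memory_order_release);
            futex_wake_one();
        }
    }
};

// Hypothetical demo: several threads increment a shared counter under FutexLock.
long futex_counter_demo(int threads, int iterations) {
    FutexLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<FutexLock> guard(lock);
                ++counter;
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

The key difference from a spinlock is the `futex_wait` call: a contended thread hands control back to the kernel along with the address it is waiting on, and `unlock()` wakes a sleeper only when one might exist.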

This essentially means that Skarupke's finding, that spinlocks aren't ideal in this scenario, is for all intents and purposes correct. The catch is that developers shouldn't have been using spinlocks in user space in the first place. As far as Linus Torvalds is concerned, then, the Linux scheduler isn't to blame; developers' use of spinlocks is.

Of course, we'll have to see if this new information actually helps developers sort out the issues they've been having with their Stadia ports. 

Niels Broekhuijsen

Niels Broekhuijsen is a Contributing Writer for Tom's Hardware US. He reviews cases, water cooling and PC builds.

  • bit_user
    Nice Torvalds quotes.

    I read his posts and agree with this summary. It's kinda sad to see someone trying so hard to optimize a fundamentally wrong-headed approach, as that original developer had done.

    Anyone who has a clue what a mutex or a spinlock is would probably benefit from reading Torvalds' posts:
    https://www.realworldtech.com/forum/?threadid=189711&curpostid=189723
    https://www.realworldtech.com/forum/?threadid=189711&curpostid=189755
    https://www.realworldtech.com/forum/?threadid=189711&curpostid=189759
  • alextheblue
    Why didn't these issues crop up on all the other platforms?
  • bit_user
    alextheblue said:
    Why didn't these issues crop up on all the other platforms?
    Different schedulers are optimized for different use cases. Linus makes this point several times. His posts are really worth reading, if you can follow the arguments.

    He supposes that the developers originally did some profiling and tuned their code for their main platform (i.e. either Windows or games consoles). If you think about it, games consoles' schedulers were probably modeled on the Windows schedulers of the day, in order to smooth over these kinds of issues.

    And the only other analogous cases would be Mac & mobile ports. But Mac gaming is pretty niche, and mobile ports probably have even bigger issues to deal with.
  • alextheblue
    bit_user said:
    He supposes that the developers originally did some profiling and tuned their code for their main platform (i.e. either Windows or games consoles). If you think about it, games consoles' schedulers were probably modeled on the Windows schedulers of the day, in order to smooth over these kinds of issues.
    I'm pretty sure PS4's stock OS is FreeBSD based, and it seems like the Rage 2 codebase was running on fairly diverse hardware and software combinations without issue. I'm not saying these issues aren't caused/exacerbated by the game code... but rather I agree with your "Different schedulers are optimized for different use cases" statement. MS has tuned and retuned their scheduler multiple times in recent history, for example. I don't think the Linux scheduler is perfect in all cases, and the attitude of "you're coding it wrong" makes me chuckle a bit when other schedulers aren't having any such issues with the same implementation. Maybe the truth lies somewhere in the middle... the code is flawed (well, what complex codebase isn't actually), but maybe Linux could handle it better.
  • bit_user
    alextheblue said:
    I'm pretty sure PS4's stock OS is FreeBSD based,
    That says nothing about its scheduler, though, which they certainly at least tweaked and possibly completely replaced.

    I'd certainly expect that a scheduler designed for general-purpose computing wouldn't be ideal for latency-sensitive or soft-realtime workloads. There's a basic tradeoff, which Linus repeatedly touches upon, between latency and throughput. For desktop computing, you'd probably try to balance the two. For server or HPC workloads, you'd typically want throughput-oriented behavior. The other extreme is hard-realtime systems, where you make significant tradeoffs in throughput and efficiency in order to get low & deterministic response times.

    alextheblue said:
    seems like the Rage 2 codebase was running on fairly diverse hardware and software combinations without issue.
    It might not utilize any userspace spinlocks. While I get the impression they're not uncommon among games, they have various downsides and aren't exactly a "best practice". They're best characterized as a fragile optimization that sometimes isn't.

    alextheblue said:
    MS has tuned and retuned their scheduler multiple times in recent history, for example. I don't think the Linux scheduler is perfect in all cases,
    I don't know the history of Linux' default thread scheduler, but Linus implied that it's been the subject of much tweaking and experimentation over the years. I'd be surprised if that weren't the case, considering how much both workloads and computer architectures have evolved.

    alextheblue said:
    the attitude of "you're coding it wrong" makes me chuckle a bit when other schedulers aren't having any such issues with the same implementation.
    You can chuckle all you want, but if you really dig into the argument, I think that's the only logical conclusion. I won't try to summarize his argument - you can read it if you care. But, I'd caution you against taking a position without understanding the case for/against.

    In the previous post, I was suggesting that games tuned their spinlocks against Windows' scheduler's behavior, and that consoles probably didn't deviate much from that, for the sake of improving portability. I think that adequately explains why "other schedulers aren't having any such issues with the same implementation". Still, we don't really know whether, or how much, locking typically changes with ports to and from consoles. I've heard that the original game developers aren't typically the ones who do ports, so maybe swapping out the locks is a standard thing these porting houses are just used to doing. So it's also risky to generalize from very little information about the subject.

    alextheblue said:
    Maybe the truth lies somewhere in the middle...
    That's a common response to complex issues, but I doubt it holds true any more often than in simpler disputes.

    alextheblue said:
    the code is flawed (well, what complex codebase isn't actually), but maybe Linux could handle it better.
    No, I disagree with that. The userspace code is simply operating on assumptions about kernel scheduling behavior that it has no business making, and not actually communicating with the kernel to let it know how the code wants to be scheduled. When you're waiting on something, you should tell the kernel what you're waiting for. The fundamental problem with userspace spinlocks is that they don't do this.

    But here I go, doing what I said I wouldn't. Just read Linus' posts. They're longish, but to the point and pretty straightforward.
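bit_user's point that a waiting thread should tell the kernel what it is waiting for applies beyond locks. In the hypothetical C++ sketch below (the `FrameSignal` class, `handoff_demo` helper and frame-handoff scenario are illustrative assumptions), one thread waits for another to publish a frame. Instead of spinning on a flag, it blocks on a `std::condition_variable`, so the kernel can park it and the producer's notification tells the system exactly which thread to wake.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-slot handoff between a producer and a consumer thread.
class FrameSignal {
    std::mutex m_;
    std::condition_variable cv_;
    bool ready_ = false;
    int frame_ = 0;

public:
    // Producer: publish a frame number, then wake the waiting consumer.
    void publish(int frame) {
        {
            std::lock_guard<std::mutex> guard(m_);
            frame_ = frame;
            ready_ = true;
        }
        cv_.notify_one();
    }
    // Consumer: sleep until a frame is published, instead of polling a flag.
    int wait_for_frame() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return ready_; });  // predicate also handles spurious wakeups
        ready_ = false;
        return frame_;
    }
};

// Hypothetical demo: the calling thread blocks until the producer publishes.
int handoff_demo(int frame) {
    FrameSignal signal;
    std::thread producer([&] { signal.publish(frame); });
    int received = signal.wait_for_frame();
    producer.join();
    return received;
}
```

If the producer publishes before the consumer starts waiting, the predicate is already true and `wait` returns immediately, so no wakeup is lost.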