New virtual memory tools, devices may speed up Windows Vista I/O

Redmond (WA) - When Microsoft group vice president Jim Allchin - who has since been promoted to co-president - demonstrated a handful of Vista-related projects emerging from the company's Core OS Group laboratory, perhaps too few details were provided for anyone, including Tom's Hardware Guide, to get a clear and complete picture of what these technologies are and what they'll do.

So when the Core OS Group's software architect, Rob Reinauer, offered to provide us with more in-depth information, we weren't too proud to accept. One of Reinauer's key projects is a technology called Superfetch, which appears to speed up system boot times by a factor of roughly three and a half. Superfetch is part of an effort by the Core OS Group to extend and improve Vista's virtual memory system, which supplies Windows with data gathered both from disk and from the output of running applications.

"We in the Core OS Group have put a lot of time on Vista," said Reinauer, "into trying to optimize system responsiveness - trying to make the system feel responsive." To that end, Reinauer's team has been working on Superfetch and two other technologies: External Memory Devices (EMDs) and hybrid hard drives, which use flash memory as a cache for most reads and some writes.

Superfetch is an extension of the demand paging system that has been in place in Windows since the mid-1990s. Windows' implementation descends from the design engineer David Cutler helped create for Digital Equipment Corp.'s VMS, and Microsoft acquired that expertise when it hired Cutler. Demand paging is the mechanism the operating system uses to bring data into physical memory only when it is actually needed. All memory addressable through Windows is managed as a virtual memory (VM) pool, which includes the contents of disk-based files as well as data that programs allocate for their private use. Rather than being addressed as files, memory contents are addressed as fixed-size pages. A Windows application may "think" it's streaming in the contents of a file; at least, that's how the source code appears. In reality, the demand paging system brings in portions of a data file, or even an executable code file, one page at a time as needed, rather than all at once.
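To make the idea concrete, here is a minimal Python sketch of demand paging, assuming 4 KB pages and using an in-memory byte string to stand in for a file on disk. The class `DemandPagedFile` and its methods are hypothetical illustrations, not Windows APIs; the point is only that a call that looks like an ordinary file read pulls in just the pages it touches.

```python
PAGE_SIZE = 4096  # 4 KB pages, the common x86 page size (assumed for this sketch)


class DemandPagedFile:
    """Hypothetical wrapper that loads fixed-size pages from a backing store on first touch."""

    def __init__(self, backing: bytes, page_size: int = PAGE_SIZE):
        self.backing = backing      # stands in for the file on disk
        self.page_size = page_size
        self.resident = {}          # page number -> bytes currently "in RAM"
        self.page_faults = 0        # how many times we had to go to the backing store

    def _page(self, page_no: int) -> bytes:
        if page_no not in self.resident:
            # Page fault: the page is not resident, so fetch just this one page.
            self.page_faults += 1
            start = page_no * self.page_size
            self.resident[page_no] = self.backing[start:start + self.page_size]
        return self.resident[page_no]

    def read(self, offset: int, length: int) -> bytes:
        """Looks like an ordinary file read, but only touches the pages it needs."""
        out = bytearray()
        end = min(offset + length, len(self.backing))
        while offset < end:
            page_no, in_page = divmod(offset, self.page_size)
            chunk = self._page(page_no)[in_page:in_page + (end - offset)]
            out += chunk
            offset += len(chunk)
        return bytes(out)


f = DemandPagedFile(bytes(range(256)) * 160)  # 40,960 bytes, i.e. 10 pages
f.read(5000, 10)                              # touches only page 1
print(f.page_faults, sorted(f.resident))      # 1 [1]
```

Reading 10 bytes at offset 5,000 faults in a single page; the other nine pages of the "file" are never loaded unless something asks for them.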

When a computer is shut down, the contents of RAM are lost, so after the next boot memory must be refilled from disk as applications request their pages. Superfetch improves on this model so that, once the computer is booted up again, it can retrieve the memory pages that applications are most likely to use. This way, the most commonly used pages can be preloaded, even before a user's usual applications are started. "Superfetch is a memory management strategy that works to maintain the correct content in memory," Reinauer explained. "It is a methodology for keeping track of several weeks' worth of history, of all of the page references within memory, and as a result, being able to prioritize which pages should stay in memory." Occasional operations such as disk defragmentation or virus scans, for example, are not among the most commonly loaded. Superfetch has its own heuristics for recognizing when occasional processes such as these would tend to "perturb" memory, pushing out more valuable pages that a user's applications are more likely to call upon. The system judges "more likely" based on observed historical usage patterns. "Superfetch now has a much broader set of history to understand which pages are important and should stay in memory," Reinauer added, "and which pages appear to be moving through temporarily. As a result, when push comes to shove, those are the ones that should get pushed out as opposed to the valuable pages."
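Microsoft has not published Superfetch's heuristics, so the following is only a sketch of the general idea Reinauer describes: keep weeks of page-reference history and, when memory is tight, evict pages that show no recurring use before pages that are used day after day. The `PageHistory` class, the 42-day window, and the "distinct days used" score are all assumptions made for illustration.

```python
class PageHistory:
    """Hypothetical sketch: rank pages by how many distinct days they were used on.

    Pages touched every day (a user's regular applications) score high; pages
    touched once by an occasional job such as a virus scan score low and are
    pushed out first.
    """

    def __init__(self, window_days: int = 42):  # "several weeks" of history; 42 is an assumed value
        self.window_days = window_days
        self.days_seen = {}                      # page id -> set of day numbers it was referenced on

    def record(self, page: str, day: int) -> None:
        self.days_seen.setdefault(page, set()).add(day)

    def score(self, page: str, today: int) -> int:
        # Count only the days that still fall inside the history window.
        return sum(1 for d in self.days_seen.get(page, ()) if 0 <= today - d < self.window_days)

    def eviction_order(self, resident: list, today: int) -> list:
        """Resident pages sorted so the least valuable (lowest score) come first."""
        return sorted(resident, key=lambda p: self.score(p, today))


h = PageHistory()
for day in range(30):          # Outlook pages used every day for a month
    h.record("outlook:p1", day)
    h.record("outlook:p2", day)
for i in range(3):             # a one-off virus scan on day 29
    h.record(f"scan:{i}", 29)

print(h.eviction_order(["outlook:p1", "scan:0", "outlook:p2", "scan:1", "scan:2"], today=29))
# ['scan:0', 'scan:1', 'scan:2', 'outlook:p1', 'outlook:p2']
```

A one-off scan leaves each page it touches with a score of 1, so it cannot push out pages the user's applications have needed every day for weeks. A production policy would presumably weigh more signals than this single score.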

Results of a head-to-head comparison of boot times; the rightmost column shows the result with Superfetch running.

During the first-day keynote session at Microsoft's Professional Developers Conference (PDC), Allchin demonstrated Superfetch in action with a computer set to automatically load a series of applications at start-up: Outlook, Visio, PowerPoint, Adobe Acrobat, Access, OneNote, and Publisher. Without Superfetch running, the computer took 36.8 seconds to load and run these applications in sequence. With Superfetch, the same applications were loaded and launched in 10.6 seconds.
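For perspective, the two timings from the demonstration can be turned into a speed-up factor and a percentage of time saved. This is plain arithmetic on the figures Allchin showed, nothing more; the `speedup` helper is ours.

```python
def speedup(before_s: float, after_s: float) -> tuple:
    """Return (speed-up factor, percentage of time saved) for two timings in seconds."""
    factor = before_s / after_s
    saved_pct = 100.0 * (before_s - after_s) / before_s
    return factor, saved_pct


factor, saved = speedup(36.8, 10.6)   # PDC demo timings: without vs. with Superfetch
print(f"{factor:.2f}x faster, {saved:.1f}% less time")  # 3.47x faster, 71.2% less time
```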

Throughout all three days of PDC keynotes, semi-related or even unrelated announcements were often crammed into a single segment or set of slides, with the wording adjusted so that several technologies could be announced at once. As a result, many attendees, and even a few reporters for technical publications, came away with the impression that these clustered technologies were more closely related than they actually were. The most notable example came on Day 3, when Bob Muglia, senior vice president of Microsoft's Server division, introduced Windows Server 2003 Compute Cluster Edition, Monad (the next-generation command-line shell), and parts of the WinFX programming library in the same paragraph and even on the same slide. On Day 1, Allchin demonstrated the Core OS Group's EMD technology but appeared to introduce it as part of Superfetch.

Microsoft's Jim Allchin prepares to demonstrate Superfetch to a keynote audience at PDC 2005.

"Superfetch works great if you have a reasonable amount of memory," Allchin told the audience, "and it works fantastic if you have boatloads of memory. But what if you don't have boatloads of memory?...You know, a lot of people have these USB memory sticks. I wonder if we could take advantage of those, to make them part of the virtual memory system?" Allchin then plugged a USB memory stick into a test computer, and after checking a running histogram of memory activity, reported, "We just got another 500 megs of memory on this machine...And as you can see, Superfetch is just taking advantage of all of it."

As Rob Reinauer explained to us, the EMD in Allchin's demonstration was serving as a cache, giving the VM system fast access both to data being read from disk and to data that has already been written to disk. That last point is important: because the EMD holds only copies of data that is already safely on disk, it does not act as a conventional write cache, which is why the USB stick can be unplugged at any time without losing data.
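The distinction between a read cache and a write cache is what makes surprise removal safe. Here is a minimal sketch, assuming a simple write-through policy in which every write reaches the disk before the flash copy is updated; the `RemovableReadCache` class and the dictionaries standing in for the disk and the USB stick are hypothetical, not the actual EMD design.

```python
class RemovableReadCache:
    """Hypothetical sketch: a flash cache that never holds the only copy of any data."""

    def __init__(self, disk: dict):
        self.disk = disk      # authoritative store, standing in for the hard drive
        self.flash = {}       # cached copies on the USB stick; may disappear at any time

    def read(self, block: int) -> bytes:
        if block in self.flash:
            return self.flash[block]      # hit: served from flash
        data = self.disk[block]           # miss: go to the (slow) disk
        self.flash[block] = data          # keep a copy for next time
        return data

    def write(self, block: int, data: bytes) -> None:
        self.disk[block] = data           # commit to disk first (write-through)...
        self.flash[block] = data          # ...and only then refresh the cached copy

    def unplug(self) -> None:
        """Yanking the stick loses only redundant copies; the disk still has everything."""
        self.flash.clear()


disk = {0: b"boot", 1: b"old"}
cache = RemovableReadCache(disk)
cache.read(0)
cache.write(1, b"new")
cache.unplug()
print(cache.read(1))  # b'new'
```

Because every block on the stick is a copy of something already on disk, clearing the cache costs only performance, never data.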

"At the end of the day, a random disk I/O - especially a random disk read - is one of the prime causes of problems in system responsiveness," remarked Reinauer. "That's not surprising; it's between 7 and 12 ms. In today's modern measurements, that's a really large time. So what we are using external memory devices for is a caching mechanism." Windows makes this cache simpler to manage by enrolling it in the larger VM pool.
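Reinauer's numbers suggest a simple way to estimate the benefit. If a fraction h of random reads hits the flash cache, the average read time is roughly h * t_flash + (1 - h) * t_disk. The sketch below evaluates that expression for the 7 and 12 ms disk figures he quotes; the 1 ms flash latency and the hit rates are assumptions chosen for illustration, and the model ignores queuing and transfer-size effects.

```python
def effective_latency_ms(hit_rate: float, flash_ms: float, disk_ms: float) -> float:
    """Average read latency when a fraction hit_rate of random reads is served from flash."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * flash_ms + (1.0 - hit_rate) * disk_ms


# 7 and 12 ms bracket Reinauer's random-read figure; 1 ms flash latency and the hit rates are assumptions.
for disk_ms in (7.0, 12.0):
    for hit in (0.0, 0.5, 0.9):
        print(f"disk {disk_ms:4.1f} ms, hit rate {hit:.0%}: {effective_latency_ms(hit, 1.0, disk_ms):.1f} ms")
```

At a 90 percent hit rate, for example, a 12 ms disk read averages out to about 2.1 ms.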

As Reinauer pointed out to us, some Tom's Hardware Guide readers responded to last week's initial Superfetch story by citing reports that USB 2.0 devices are generally slower than ATA hard drives for sequential I/O. "As a result, the External Memory subsystem allows reasonably large read requests to be serviced by the hard drive," said Reinauer. "Logic is also in place to recognize sequential access patterns, and allow those to also be serviced by the hard drive."
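The routing policy Reinauer describes can be sketched as a small decision function: large requests and requests that continue a sequential run go to the hard drive, while small random reads go to flash. The `ReadRouter` class, the 128 KB size cutoff, and the rule that two back-to-back contiguous follow-on reads mark a stream as sequential are all hypothetical values, not Microsoft's.

```python
class ReadRouter:
    """Hypothetical policy: large or sequential reads go to the hard drive, small random reads to flash."""

    def __init__(self, large_read_bytes: int = 128 * 1024, seq_run: int = 2):
        self.large_read_bytes = large_read_bytes  # assumed cutoff for a "reasonably large" read
        self.seq_run = seq_run                    # contiguous follow-on reads that mark a stream as sequential
        self.next_offset = None                   # offset where the previous read ended
        self.run_length = 0                       # follow-on reads that started exactly there

    def route(self, offset: int, length: int) -> str:
        # Extend the sequential run if this read picks up where the last one stopped.
        if offset == self.next_offset:
            self.run_length += 1
        else:
            self.run_length = 0
        self.next_offset = offset + length

        if length >= self.large_read_bytes or self.run_length >= self.seq_run:
            return "disk"    # the hard drive handles big and sequential transfers well
        return "flash"       # small random read: avoid a 7-12 ms seek


r = ReadRouter()
print([r.route(0, 4096), r.route(4096, 4096), r.route(8192, 4096),
       r.route(10_000_000, 4096), r.route(0, 1 << 20)])
# ['flash', 'flash', 'disk', 'flash', 'disk']
```

The third contiguous 4 KB read is recognized as part of a sequential stream and handed to the disk, and a 1 MB read goes to the disk regardless of where it starts.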

"We can service I/Os typically less than a millisecond out of a USB flash device," he added. To keep data on a disconnected USB device from being pilfered, its contents are protected by a one-time encryption system. Reinauer points out that the system is not PGP-based and does not rely on public keys, which could otherwise be obtained from a system and used to decrypt the contents.
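The article gives few details of the encryption beyond the fact that it does not use PGP or public keys. One plausible reading of "one-time" is a symmetric key generated fresh for each session and kept only in RAM, so whatever is left on an unplugged stick is unreadable; that interpretation is our assumption. The toy sketch below illustrates the idea with Python's standard library only. Its SHA-256 counter-mode keystream is a stand-in chosen for brevity: it is not authenticated, it is not a vetted cipher, and it should not be taken as the scheme Windows uses.

```python
import hashlib
import secrets


class EphemeralCacheCipher:
    """Toy sketch: encrypt cache contents with a random key that exists only in RAM."""

    def __init__(self):
        self._key = secrets.token_bytes(32)   # generated per session, never written to the device

    def _keystream(self, nonce: bytes, length: int) -> bytes:
        # SHA-256 in counter mode as a stand-in keystream generator (illustrative only).
        out = bytearray()
        counter = 0
        while len(out) < length:
            out += hashlib.sha256(self._key + nonce + counter.to_bytes(8, "big")).digest()
            counter += 1
        return bytes(out[:length])

    def encrypt(self, plaintext: bytes) -> bytes:
        nonce = secrets.token_bytes(16)       # fresh per write, stored alongside the ciphertext
        ks = self._keystream(nonce, len(plaintext))
        return nonce + bytes(a ^ b for a, b in zip(plaintext, ks))

    def decrypt(self, blob: bytes) -> bytes:
        nonce, body = blob[:16], blob[16:]
        ks = self._keystream(nonce, len(body))
        return bytes(a ^ b for a, b in zip(body, ks))


cipher = EphemeralCacheCipher()
blob = cipher.encrypt(b"cached page contents")
print(cipher.decrypt(blob))                                        # b'cached page contents'
print(EphemeralCacheCipher().decrypt(blob) == b"cached page contents")  # a new session key cannot recover it
```

Once the session ends and the in-memory key is discarded, the ciphertext left on the stick is just noise to anyone who finds it.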

The third technology Reinauer's Core OS Group is working on, which got little air time during the PDC keynotes, is the hybrid hard drive, which combines flash memory with a conventional disk mechanism. We'll have more to say about these devices, and how Windows Vista may put them to use, in a future report. Stay in touch with Tom's Hardware Guide.