WinFS: Microsoft's Data Management Vision
1. Introduction

Over the past year, Microsoft has managed to create a perfect smokescreen around its new WinFS file system. It has spent this time touting a new, database-supported filing system to replace NTFS and FAT. Compatibility doubts were not long in bubbling to the surface. During the PDC (Professional Developers Conference) held in Los Angeles at the end of October, we spoke with Microsoft brass to gain an exclusive insight into the planned technological advance.

NTFS Holds On Into Longhorn

NTFS forms the basis for WinFS. Communication between WinFS and NTFS is controlled by the WinFS core, which also provides administrative and security-related functions and interacts with services, schemas and data models.

So Microsoft will not be building an entirely new file system after all. Windows Future Storage is integrated into Longhorn as a modular extension to file management. This highly complex indexing technology is based on a relational database. The main requirement is still the use of NTFS, however, which is less prone to error than FAT32 and can cope with data volumes in the terabytes. The task of physically storing and managing files remains in the hands of NTFS. In layman's terms, WinFS mirrors folders and files, which lets it speed up searching for and retrieving data based on suitable XML parameters and, above all, according to variable criteria. That said, anyone who installs Longhorn on a FAT32 partition will not experience the joy of futuristic file administration, but nor will they suffer any noticeable loss of functionality.
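The division of labor described here, with the file system holding the files and a relational database holding a searchable mirror of their metadata, can be sketched in a few lines. This is only an illustration using Python's built-in SQLite module: the table layout, the column names and the sample paths are assumptions made for the example, not anything Microsoft has published about WinFS.

```python
import sqlite3

def build_index(files):
    """Create an in-memory metadata index; file contents stay on the file system."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE item (path TEXT PRIMARY KEY, kind TEXT, author TEXT)")
    db.executemany("INSERT INTO item VALUES (?, ?, ?)", files)
    return db

def find_paths(db, kind, author):
    """Answer a query from the index alone, without walking any folders."""
    rows = db.execute(
        "SELECT path FROM item WHERE kind = ? AND author = ? ORDER BY path",
        (kind, author),
    )
    return [path for (path,) in rows]

# Hypothetical sample records: (path, kind, author)
SAMPLE_FILES = [
    (r"D:\work\report.doc", "document", "Alice"),
    (r"D:\misc\budget.xls", "spreadsheet", "Bob"),
    (r"E:\archive\notes.doc", "document", "Alice"),
]
```

Calling `find_paths(build_index(SAMPLE_FILES), "document", "Alice")` returns both of Alice's documents even though they sit on different drives, which is the point of querying an index rather than a folder tree.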

That is a reasonable compromise between the technological possibilities and the requirements of millions of developers and users throughout the world who feared incompatibility between the old and new systems. Longhorn will thus be able to run older applications, too. Having several operating systems on one hard drive continues to be an option. Shared data access by Linux and Longhorn on a common partition is also possible - admittedly without WinFS. To enable Linux to load Windows data in the first place, the Microsoft system should be booted from a FAT partition.

Technologically, Longhorn is in any case not tied to WinFS. The OS works just as well on FAT32 volumes. Conversely, WinFS can be used in other systems, too, for example in Microsoft's upcoming server generations. Implementation in Windows XP is also feasible, given that virtually all the necessary system interfaces (APIs) exist in XP. Microsoft may be crafting a wholly new successor to the Win32 environment, called WinFX, but it also promises seamless backward compatibility.

2. Cross-Format File Administration


File-dependent context menu: in the future, the system will serve up functions that match the data type.

The prime tasks of the NTFS add-on are administration, organization and file retrieval. WinFS is also slated to take care of synchronizing and protecting data. However, Microsoft, it has to be said, is staying tight-lipped about all this.

Up to now, all Windows versions, including Windows XP, have saved files based on storage location and such descriptive parameters as date and format. That suffices to find files again in the system, so long as they're Microsoft's own or belong to applications that are properly integrated into the system.

Here's an instructive example that exposes current weaknesses and future benefits: Windows XP shows inexplicable weaknesses when it comes to unknown files - and that even applies to files from its own applications. Fill a new text file in Notepad with the term "Windows" and you can be sure that the find function will locate the file in a jiffy via the full-text search. But try changing the extension to a new, made-up format and you won't get anywhere. Windows will just overlook this file with the same criteria in the search, even though it's physically the same file in the same location with the self-same content.
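The effect is easy to reproduce in a toy model. The sketch below is not how Windows XP's search is implemented; it simply assumes a search routine that opens only files whose extension is on a known list (the `KNOWN_EXTENSIONS` set is invented for the example) and compares it with one that reads every file.

```python
import os
import tempfile

# Hypothetical allowlist: the toy search only opens files with these extensions.
KNOWN_EXTENSIONS = {".txt", ".doc", ".htm"}

def extension_filtered_search(folder, term):
    """Full-text search that only reads files with a known extension."""
    hits = []
    for name in sorted(os.listdir(folder)):
        extension = os.path.splitext(name)[1].lower()
        if extension not in KNOWN_EXTENSIONS:
            continue  # unknown format: the content is never inspected
        with open(os.path.join(folder, name), encoding="utf-8") as handle:
            if term in handle.read():
                hits.append(name)
    return hits

def content_search(folder, term):
    """Full-text search that reads every file regardless of extension."""
    hits = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        with open(path, encoding="utf-8", errors="ignore") as handle:
            if term in handle.read():
                hits.append(name)
    return hits

def demo():
    """Write the same text under a known and a made-up extension, then search."""
    with tempfile.TemporaryDirectory() as folder:
        for name in ("note.txt", "note.xyz"):
            with open(os.path.join(folder, name), "w", encoding="utf-8") as handle:
                handle.write("Windows")
        filtered = extension_filtered_search(folder, "Windows")
        everything = content_search(folder, "Windows")
    return filtered, everything
```

In this model, `demo()` finds only `note.txt` with the extension-filtered search, while the content-based search finds both files, even though the two have identical content.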

3. XML Metadata Take On The Clipboard

This shortcoming isn't only obvious when hunting for data from non-Windows applications, such as CRM solutions. Windows XP searches are invariably file-based, as structured organization of data(sets) is currently not possible. As a result, the user - regardless of the number of proprietary applications - has access to a raft of different, sometimes competing search technologies. That, at least, is the status quo.

WinFS seeks to unify the organization and retrieval of required data. The first solution to this is a multidimensional data index based on flexible criteria. It relies on XML metadata, which Windows adds to each file. These metadata go way beyond what is currently understood by indexing. Which information is supplied on a file besides storage location, size, user and creation date is determined by the system and the user. Criteria that mark out data according to content are the real boon here. Windows will also reportedly be able to recognize semantic relationships and thus will no longer treat files as isolated elements. For example, the next generation of Windows will not only list the hits for a full-text search, but will also group them according to relevance and provide document previews. And, ideally, the user will be presented with corresponding contacts, tables, network and Internet links all in one go.

The vision is an operating system in the shape of a relational database that considers all stored information as records. This brave new digital world doesn't come without a few limitations, however. For incorporation into the progressive indexing procedure, Windows can only use metadata on those file formats that the system recognizes and can handle. To return to the example above: non-Windows file formats will not automatically be referenced via metadata. The old failings of the clipboard and the search have metamorphosed into new ones - so long as developers and administrators don't turn to their own solutions for succor.

4. WinFS Inside

The brain of WinFS is what's called the data model. The term conceals a mechanism that uniformly administers and structures digital elements. Microsoft talks about "items" in this context. The word is a good choice, since items add a range of further descriptive attributes to each file. These parameters are not stored in a file's header, though, but are administered solely by WinFS. So nothing changes as far as the physical data structure goes, which is still looked after by NTFS. Under this scheme, not only files but also contacts, favorites and e-mail messages are registered as items.

From the user's standpoint, items degrade the files' physical storage location to the point of insignificance. Instead, Windows organizes the data according to content in virtual folders. In searching for these data, user-based criteria such as "All vacation pics of the last two years" replace details such as file format, author and storage location.
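A virtual folder of this kind is, in essence, a stored query over item metadata. The following sketch assumes a simple list of metadata records (the field names `kind`, `tags` and `created` are invented for the example) and shows how a request like "all vacation pics of the last two years" might be expressed without any reference to where the files are stored.

```python
from datetime import date

# Hypothetical item metadata; in the article's model WinFS keeps this, not the files.
ITEMS = [
    {"name": "beach.jpg", "kind": "picture", "tags": {"vacation"}, "created": date(2003, 7, 14)},
    {"name": "skiing.jpg", "kind": "picture", "tags": {"vacation"}, "created": date(2000, 1, 5)},
    {"name": "invoice.doc", "kind": "document", "tags": {"work"}, "created": date(2003, 9, 1)},
    {"name": "office.jpg", "kind": "picture", "tags": {"work"}, "created": date(2003, 3, 3)},
]

def years_before(day, years):
    """Return the same calendar day `years` earlier (Feb 29 falls back to Feb 28)."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)

def virtual_folder(items, kind, tag, today, years):
    """Select items by content criteria; storage location plays no role."""
    cutoff = years_before(today, years)
    return [
        item["name"]
        for item in items
        if item["kind"] == kind and tag in item["tags"] and item["created"] >= cutoff
    ]
```

With the sample data, `virtual_folder(ITEMS, "picture", "vacation", date(2003, 10, 31), 2)` yields only `beach.jpg`: the older vacation picture falls outside the two-year window, and the office picture carries the wrong tag.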

Microsoft has gone with a variable item model for WinFS. Developers can define further items using XML metadata and determine relationships between items. This enables, say, all documents by a certain author to be automatically displayed together with address data and contextually related images.
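To make the idea of related items more concrete, here is a minimal sketch of items connected by named relationships. The class names, the relation names `authored_by` and `illustrated_by`, and the sample data are all assumptions for illustration; Microsoft has not disclosed the actual programming interface.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    """One WinFS-style item: a type name plus free-form metadata."""
    name: str
    kind: str
    metadata: dict = field(default_factory=dict)

class ItemGraph:
    """Holds named, directed relationships between items."""

    def __init__(self):
        self.links = []  # list of (source, relation, target) tuples

    def relate(self, source, relation, target):
        self.links.append((source, relation, target))

    def targets(self, source, relation):
        """Items that `source` points to via `relation`."""
        return [t for s, r, t in self.links if s is source and r == relation]

    def sources(self, target, relation):
        """Items that point to `target` via `relation`."""
        return [s for s, r, t in self.links if t is target and r == relation]

def author_view(graph, author):
    """Collect an author's address data, documents, and images tied to those documents."""
    documents = graph.sources(author, "authored_by")
    images = [img for doc in documents for img in graph.targets(doc, "illustrated_by")]
    return {
        "address": author.metadata.get("address"),
        "documents": [doc.name for doc in documents],
        "images": [img.name for img in images],
    }
```

Given a contact item, a document related to it via `authored_by` and a picture related to the document via `illustrated_by`, `author_view` returns all three pieces of information in one result, which is the kind of combined display the paragraph above describes.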

The view options in the file explorer and the commands linked to certain file types are variable in design, too. Developers can stipulate which tasks in the context menu are available for which items and which thumbnails are selected, for example. Hence, the file explorer in Longhorn can take on a whole new range of tasks. It will mean, for instance, that developers will also be able to automatically display or execute commands linked to items located by a specific search. If the user searches for e-mail messages, for example, the Explorer can be used to get Outlook to prepare and send a standard reply - all at the click of a mouse.

It seems equally probable that a link will run from Microsoft's envisaged Rights Management to the Next-Generation Secure Computing Base (NGSCB), a component that is already contained in rudimentary form in the alpha version of Longhorn. Quite possibly, the system will be able at some point to sort files that don't meet certain security criteria.

5. Data Tags: XML Schemas

XML schemas define how files of a certain type are treated - in this case address data.

The pool of item definitions is stored in the corresponding schema. In simple terms, schemas sit one level above items in the hierarchy. Exactly what Windows interprets as a document, contacts, video or audio data and what contextual links exist between item types is governed by XML schemas. Developers are able to define new schemas to record certain information structured as items - for example, particular mail attachments.
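How a schema might govern item types can be sketched as follows. The XML vocabulary used here (`itemType`, `property`, `relationship`) is invented purely for illustration and is not the real WinFS schema syntax; the point is only that type definitions live in XML and that items are checked against them.

```python
import xml.etree.ElementTree as ET

# Invented schema syntax for illustration only; not the real WinFS schema format.
SCHEMA_XML = """
<schema>
  <itemType name="Contact">
    <property name="displayName" required="true"/>
    <property name="email" required="false"/>
    <relationship name="authorOf" target="Document"/>
  </itemType>
  <itemType name="Document">
    <property name="title" required="true"/>
  </itemType>
</schema>
"""

def load_schema(xml_text):
    """Parse item type definitions into a dictionary keyed by type name."""
    schema = {}
    for item_type in ET.fromstring(xml_text).findall("itemType"):
        required, optional = set(), set()
        for prop in item_type.findall("property"):
            bucket = required if prop.get("required") == "true" else optional
            bucket.add(prop.get("name"))
        relationships = {
            rel.get("name"): rel.get("target")
            for rel in item_type.findall("relationship")
        }
        schema[item_type.get("name")] = {
            "required": required,
            "optional": optional,
            "relationships": relationships,
        }
    return schema

def validate(schema, type_name, properties):
    """Return a list of problems; an empty list means the item fits its type."""
    definition = schema.get(type_name)
    if definition is None:
        return [f"unknown item type: {type_name}"]
    supplied = set(properties)
    allowed = definition["required"] | definition["optional"]
    problems = [f"missing property: {name}" for name in sorted(definition["required"] - supplied)]
    problems += [f"unexpected property: {name}" for name in sorted(supplied - allowed)]
    return problems
```

A developer adding a new item type would, in this model, only extend the XML; `validate` then accepts or rejects items of that type without any change to the code.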

The advantage of this approach is that WinFS uses metadata to link items with related content, such as contact data with documents. File formats are not about to lose their significance altogether, however, but will serve alongside other useful functions to allocate new file types (extensions) to the existing stock of items.

6. WinFS Services

The SQL database stores the metadata relevant to indexing.

Besides item management, WinFS takes on other services. One of them is called Info Agent and automates file tasks. For example, Info Agents can independently recognize which incoming mail attachments may carry viruses and can ask how to proceed, or they can automatically task a virus scanner to make an inspection. In the same way, Info Agents can automatically remove temporary files less than 1 kB in size.

Microsoft sees the Info Agents as a variable instrument capable of unifying recurring tasks. The range of Info Agent duties is designed to be variable and adjustable using guidelines. An analogy to batch programming is tempting, with the difference that Windows not only lines up command chains but also allows causal links. The user defines which system tasks are automatically executed whenever certain events occur.
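The difference from a plain batch file, namely that actions are tied to events and conditions, can be shown with a small rule engine. Everything in this sketch is hypothetical: the event names, the list of risky extensions and the rule structure are assumptions, and the actions merely return a description of what would be done instead of touching any files.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    """If `event` occurs and `condition` holds for its payload, run `action`."""
    event: str
    condition: Callable[[dict], bool]
    action: Callable[[dict], str]

class InfoAgent:
    """Toy agent that dispatches events to user-defined rules."""

    def __init__(self, rules):
        self.rules = rules

    def handle(self, event, payload):
        """Run every matching rule and return the action results in rule order."""
        return [
            rule.action(payload)
            for rule in self.rules
            if rule.event == event and rule.condition(payload)
        ]

# Hypothetical policy values, chosen to mirror the article's two examples.
RISKY_EXTENSIONS = (".exe", ".vbs", ".scr")

RULES = [
    Rule(
        event="mail_attachment_received",
        condition=lambda p: p["name"].lower().endswith(RISKY_EXTENSIONS),
        action=lambda p: f"scan {p['name']}",
    ),
    Rule(
        event="temp_file_found",
        condition=lambda p: p["size_bytes"] < 1024,
        action=lambda p: f"delete {p['name']}",
    ),
]
```

An agent built with `InfoAgent(RULES)` would ask for a scan of an attachment named `Invoice.EXE` but ignore `photo.jpg`, and would mark a 512-byte temporary file for deletion while leaving a 4 kB one alone.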

Synchronization is another service, used to reconcile a selected set of data (contacts, documents, etc.) across several partitions, storage disks or external computers (peer-to-peer network). The clever thing about synchronization is that, aided by the corresponding schemas, it is also billed as functioning between formerly incompatible applications such as Outlook and CRM solutions. Before this can happen, however, developers first have to adapt the service to their own needs.

Virtual Folders

The possibilities offered by cross-referencing will make themselves felt in future Windows versions in a number of areas. It is feasible that even home users will be encouraged by the variable structure to adapt file management functions such as searches or the file explorer to their own specifications, for example by standardizing search paths. But that all depends on the development tools that Microsoft makes available. Only the route was sketched at the PDC in October. Complete implementation of the system, on the other hand, is likely to take a year or more to come to fruition.
