Exclusive: Microsoft Patents The Search Engine

While it is clear that this patent is the foundation for Bing, it is somewhat stunning to see how this patent collides in its details with what could also be claimed to be the foundation of Google's search engine.

The technology field is littered with silly patents that arguably should never have been awarded in the first place and that do not serve their intended purpose of fostering innovation and protecting the intellectual property of ingenious minds. One of those recent patents may have been Microsoft's Eureka! moment that the idea of the operating system shutdown should be generally owned by the company.
  
Lately, we are seeing much more activity in patents that cover a wide range of technologies, many of which are critical to the future of IT. For example, I have recently stressed that Google patented cloud browser sync as well as threaded email management, while Microsoft patented GPU-accelerated video encoding and voice search through a search engine. It is especially the patent battle between Microsoft and Google that highlights much of the IP that is likely to determine key technology of the future Internet platform.

Microsoft has now bagged a patent that is labeled as "a search engine platform", which - by its general nature - is sure to raise some interest across the industry. Exactly what search engine has Microsoft patented here?

Patent filings always have a general and a detailed description, and as you dive deeper, this patent gets much more interesting. Microsoft's general claim is:
      
"Systems and methods to perform efficient searching for web content using a search engine are provided. In an illustrative implementation, a computing environment comprises a search engine computing application having an essential pages module operative to execute one or more selected selection algorithms to select content from a cooperating data store. In an illustrative operation, the exemplary search engine executes on a received search query to generate search results. Operatively, the retrieved results can be generated based upon their joint coverage of the submitted search query by deploying a selected sequential forward floating selection (SFFS) algorithm executing on the essential pages module. In the illustrative operation, the SFFS algorithm can operate to iteratively add one and delete one element from the set to improve a coverage score until no further improvement can be attained. The resultant processed search results can be considered essential pages."

As I read through the patent, I learned that what Microsoft describes is a search engine technology that aims to increase the likelihood of finding certain content with fewer mouse clicks. This idea is based on traditional search engine spidering techniques, a ranking system, as well as secondary information from neighboring search results to retrieve relevant information for a re-ranking of a search result. In Microsoft's words:

"In addition to relevance, existing practices also consider diversity of Web-search results as an additional factor for ordering documents. A re-ranking technique based on maximum marginal relevance criterion to reduce redundancy from search results as well as presented document summarizations has been considered. Additionally, an affinity ranking scheme to re-rank search results by optimizing diversity and information richness of the topic and query results has been developed. Such practices model the variance of topics in groups of documents.

The herein described systems and methods provide a modeling of the overall knowledge space for a specific query and improving the coverage of this space by a set of documents. In an illustrative implementation a "bag-of-words" model for representing knowledge spaces is provided. Additionally, in the illustrative implementation, a formal notion of coverage over the "bag-of-words" is provided and a simple but systematic algorithm to select documents that maximize coverage is derived to allow relevance to the search topic."
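To make the "bag-of-words" idea concrete, here is a minimal Python sketch. It is not taken from the patent: the function name `bag_of_words` and the tokenization rule (lowercase runs of letters and digits) are assumptions chosen purely for illustration.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Reduce a text to word counts, discarding order and formatting.

    Hypothetical helper: the tokenization rule below is an assumption,
    not something the patent specifies.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)


page = "Search engines rank pages. Pages rank search results!"
print(bag_of_words(page))
```

The sentence structure and punctuation are gone; all that remains is how often each word occurs.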

Microsoft treats a web page as a "bag-of-words" where keywords are filtered, extracted and counted to achieve a certain valuation of that document. The result is basically a document that lists keywords. Microsoft's patented search engine platform relies on a bag-of-words approach in which "a document is processed as a collection of statistics over a set (i.e., bag) of words used in it, without explicit semantic constructions such as sentences, formatting, etc." This bag-of-words document provides the foundation for what Microsoft calls "essential pages", which relate to the bag-of-words and are said to eliminate certain less relevant search results from a search query so that a user needs fewer mouse clicks.
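The patent abstract quoted earlier says the selection algorithm iteratively adds one and deletes one element to improve a coverage score until no further improvement is possible. The Python sketch below shows a simplified loop in that spirit; it is not the patent's actual SFFS procedure. The scoring rule (number of query terms covered minus a per-page penalty), the `penalty` value, and all function names are assumptions made for illustration.

```python
def coverage_score(selected, pages, query_terms, penalty=0.5):
    """Query terms jointly covered by the selected pages, minus a size penalty.

    The penalty is an assumption; it makes redundant pages cost something.
    """
    covered = set()
    for i in selected:
        covered |= pages[i] & query_terms
    return len(covered) - penalty * len(selected)


def select_essential_pages(pages, query_terms, penalty=0.5):
    """Simplified add-one/delete-one loop (not the patent's exact SFFS)."""
    selected = []
    best = coverage_score(selected, pages, query_terms, penalty)
    improved = True
    while improved:
        improved = False
        # Forward step: add the single page that raises the score the most.
        candidates = [i for i in range(len(pages)) if i not in selected]
        if candidates:
            add = max(candidates, key=lambda i: coverage_score(
                selected + [i], pages, query_terms, penalty))
            score = coverage_score(selected + [add], pages, query_terms, penalty)
            if score > best:
                selected.append(add)
                best, improved = score, True
        # Backward step: drop a page if the set scores better without it.
        if selected:
            drop = max(selected, key=lambda i: coverage_score(
                [j for j in selected if j != i], pages, query_terms, penalty))
            score = coverage_score(
                [j for j in selected if j != drop], pages, query_terms, penalty)
            if score > best:
                selected.remove(drop)
                best, improved = score, True
    return sorted(selected), best


# Each page is represented only by the set of words it contains.
pages = [{"a", "b", "c"}, {"a", "d"}, {"b", "c", "e"}]
query = {"a", "b", "c", "d", "e"}
print(select_essential_pages(pages, query))
```

In this toy input, page 0 is picked first but later becomes redundant once pages 1 and 2 together cover every query term, so the backward step removes it. That is the point of allowing deletions alongside additions.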

The indexing and processing of the bag-of-words is a highly complex process and involves interpreting and processing each word, including identifying the root of the word (word stemming). For example, Microsoft removes word endings as well as words that do not describe context semantics, such as "as," "is," or "be." According to Microsoft, this process will provide more "pertinent search results."
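A rough Python sketch of that normalization step follows. The stop words are the three examples named above; the suffix list and the minimum stem length are assumptions, and the suffix stripper is deliberately crude rather than a real stemming algorithm.

```python
# Stop words named in the article; a real list would be far longer.
STOP_WORDS = {"as", "is", "be"}

# Assumed suffixes for illustration only; checked in this order.
SUFFIXES = ("ing", "ed", "s")


def strip_suffix(word):
    """Crude stemmer: drop the first matching suffix if at least 3 letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(tokens):
    """Remove stop words, then reduce the remaining words to a rough root."""
    return [strip_suffix(t) for t in tokens if t not in STOP_WORDS]


print(normalize(["searching", "is", "ranked", "as", "pages"]))
```

After this step, "searching" and "search" count as the same keyword, which is what lets a keyword count reflect a topic rather than a particular word form.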

So, how is this patent different from what Google does today? Microsoft applied for this patent in March 2008, about one year before the company provided a first glimpse of its search engine. The concept comes down to keyword generation, extraction and storage, as well as how the keywords are applied to a search query. The description largely matches what Google has been doing for several years, as well as a keyword practice that basic search engine optimization efforts have implemented for several years. And even if Microsoft's patent differs in certain details from Google's approach, it is somewhat surprising that this idea has made it through the U.S. Patent and Trademark Office in the record time of not even three years. Legally, Microsoft may have some leverage against Google, even if it is questionable whether Microsoft would really try to go after Google at this time in a critical technology area such as keyword extraction and interpretation.

Google's lawyers, on the other hand, may want to look at this patent more closely and figure out whether or not Microsoft has invaded Google territory with it. Somehow I feel that this is not the last we have heard of this patent.

  • Netherscourge
    Good luck having that patent upheld in court.
  • zorky9
    Wow.. Now that's a patent ogre.
  • Trizomik
    They really want us to hate them...
  • bdaonion
    These things keep getting more and more petty...
  • IzzyCraft
    Netherscourge: "Good luck having that patent upheld in court."
    Obviously you've never seen some of the frankly ludicrous cases that have flown out of US patent courts.
    Trizomik: "They really want us to hate them..."
    Who, Microsoft or Google? Both gobble up tons of patents for things people would think should be allowed for everyone, some of which were listed in the article if you read it, and Google has moved to gobbling up upstarts that compete with them on some level.

    They are both large corps =p
  • drakefyre
    First off, awesome article. It really reads like one of the older Tom's articles.

    Second, wow Microsoft is stupid, as is the whole patent system. This is ridiculous.
  • tntom
    Wow! I'm going to patent "Systems and methods to perform efficient mode of transportation for passengers of a powered vehicle." Maybe I'll just patent the most efficient of everything. That wouldn't be too vague, would it?
  • JohnnyLucky
    Looks like they are following in IBM's footsteps.
  • diablob
    Unpatentable: A*X=B
    Patentable: Puppies x Cuteness = Adorability-Factor

    The more I read about software patents, the more angry I get. When are we going to fix the system and formally pass a law that stops software from being patentable?
  • verbalizer
    Things will only continue to get worse and worse in the future.
    Judgment day as in Terminator is coming and MS will be the source.. lol