HP Pushes Petabytes
When we asked HP about its server solution for Web 2.0 and how it compared to iDataPlex, the company put us off until the end of May on the server side, but brought up the other side of the Web 2.0 story: storage.
The original definition of Web 2.0 is services that add value to user-generated data, like Flickr photo tags and eBay rankings. The tags and rankings don’t take up much space, but the photos and auctions they’re attached to do. HP’s own Snapfish photo service has over 5 billion images, taking up 4.5 petabytes of storage with room for more; by 2010 that’s going to be more like 20 petabytes. Yet it has just two people looking after the hundreds of disks that make up its storage. Michael Callahan, chief technologist of HP’s StorageWorks NAS division and co-founder of PolyServe, says that’s down to the PolyServe software incorporated in HP’s new Extreme Data Storage System, the ExDS9100, and to how simple the system is to maintain. The ExDS9100 will be on sale at the end of this year, but Snapfish is already using a prototype version.
When you think about storage, HP doesn’t always come to mind. But in the last quarter of 2007, HP sold 46% of all the disk drives manufactured worldwide, adding up to 123,000 terabytes of disk space. A lot of those drives were in desktop and notebook PCs, but many businesses, from the largest retail bank in the United States to a number of well-known photo sites other than Snapfish, already use HP’s PolyServe software to manage their storage. At the moment that means each site has to design and integrate a custom storage system based on PolyServe running on Linux servers; the ExDS9100 is a rack they can buy, plug in and start using straight away.
The PolyServe clustering software runs on the ExDS9100’s four BL460 blade servers; they’re in a 10U BladeSystem c7000 chassis that can accommodate up to 16 blades, giving you up to 12.8 cores per U and 3.2 GB/s of raw performance.
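The cores-per-U figure can be checked with a few lines of arithmetic. This is only a sketch: the eight cores per blade is an assumption inferred from the 12.8 figure (128 cores across 10U), not a number HP gave us, and the names are purely illustrative.

```python
CHASSIS_HEIGHT_U = 10   # c7000 chassis height, from the article
MAX_BLADES = 16         # blade slots in the chassis, from the article
STANDARD_BLADES = 4     # blades shipped in the ExDS9100, from the article
CORES_PER_BLADE = 8     # ASSUMPTION: inferred from 12.8 cores/U, not stated by HP


def cores_per_u(blades: int) -> float:
    """Average number of CPU cores per rack unit of chassis height."""
    return blades * CORES_PER_BLADE / CHASSIS_HEIGHT_U


full_chassis_density = cores_per_u(MAX_BLADES)       # fully populated chassis
standard_density = cores_per_u(STANDARD_BLADES)      # as shipped with four blades

print(f"Full chassis: {full_chassis_density:.1f} cores per U")
print(f"Standard four blades: {standard_density:.1f} cores per U")
```

On that assumption, the 12.8 cores per U applies only to a fully populated chassis; the standard four-blade configuration works out to a quarter of that.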
The difference from any other storage array, or from a network attached storage system you’d buy for home, is the scale. The ExDS9100 starts at a quarter of a petabyte in three 7U blocks, each holding 82 individual 3.5” SATA 1TB drives. You can add up to seven more 82TB blocks for a total of 820TB of storage in two connected racks; you get four storage blocks in one rack, along with the blades, and six in the second. That’s an average of 12TB per U, which is twice as dense as the 6TB/U Google manages, according to Callahan. And you can link multiple Extreme Data Storage systems together for a multi-petabyte system.
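The capacity figures follow directly from the block size. As a rough check, using only the numbers quoted above (the function and constant names are illustrative, and the density is calculated per storage block, ignoring the 10U blade chassis):

```python
DRIVE_TB = 1            # capacity of each SATA drive, in TB
DRIVES_PER_BLOCK = 82   # drives in one storage block
BLOCK_HEIGHT_U = 7      # rack units occupied by one storage block
BASE_BLOCKS = 3         # blocks in the starting configuration
MAX_EXTRA_BLOCKS = 7    # additional blocks that can be added


def capacity_tb(blocks: int) -> int:
    """Raw capacity in TB for a given number of storage blocks."""
    return blocks * DRIVES_PER_BLOCK * DRIVE_TB


base_capacity = capacity_tb(BASE_BLOCKS)
max_capacity = capacity_tb(BASE_BLOCKS + MAX_EXTRA_BLOCKS)
block_density = DRIVES_PER_BLOCK * DRIVE_TB / BLOCK_HEIGHT_U

print(f"Starting configuration: {base_capacity}TB")
print(f"Fully expanded: {max_capacity}TB")
print(f"Density per storage block: {block_density:.1f}TB per U")
```

The starting configuration comes to 246TB, the "quarter of a petabyte" HP quotes, and each block works out to just under 12TB per U.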
Of the 82 drives in each storage block, 12 are mounted in a section with the redundant RAID controllers and the other 70 are densely packed into two pull-out shelves. The drives are mounted sideways rather than vertically. This doesn’t take up any more space but it means you don’t have to drag a ladder into the data center to replace drives at the top of the unit. Pull out a shelf of 35 drives to replace a faulty drive and the other 69 continue to serve up data while you work.
The PolyServe software means every drive can be accessed directly by every blade and every file can be served up by every blade as well. And you can access the data from other systems via NFS, HTTP and PolyServe’s Direct IO. Link several Extreme Data Storage systems together and you can access all the petabytes of storage directly. Because the operating system running on the blades in the ExDS9100 is Linux, you can run your own applications on it, and if you want to improve the performance of those applications, you can add extra blades. Blades and storage scale independently of each other, though; you don’t need more than the four standard blades to run the full 820TB of storage.
That’s very different from the usual methods for building large storage systems, Callahan says. “A lot of approaches in the past have been a server with direct attached drives, which meant that in order to have a complete switched extreme connection between the server and drives, it had to be fibre channel.”
Callahan says the competition takes the classic Google approach by offering individual servers with direct attached disks, so you get maybe a dozen disks on each server. “That means for 820TB you have dozens of servers that you are powering up and paying for,” Callahan said. “With McKinsey saying that data centers could be responsible for more CO2 than the airline industry by 2020, the notion that it makes sense to throw in another server for every dozen disks is increasingly untenable if there are alternatives.”
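Callahan’s "dozens of servers" can be put into rough numbers. The sketch below assumes exactly 12 disks per server (his "maybe a dozen") and the same 1TB drives the ExDS9100 uses; both are assumptions for illustration, not figures from HP or its competitors.

```python
import math

TARGET_TB = 820         # full ExDS9100 capacity, from the article
EXDS_BLADES = 4         # standard blades needed to run that capacity
DISKS_PER_SERVER = 12   # ASSUMPTION: "maybe a dozen" taken as exactly 12
DISK_TB = 1             # ASSUMPTION: same 1TB drives as the ExDS9100


def servers_needed(total_tb: int,
                   disks_per_server: int = DISKS_PER_SERVER,
                   disk_tb: int = DISK_TB) -> int:
    """Servers with direct attached disks required to reach total_tb."""
    return math.ceil(total_tb / (disks_per_server * disk_tb))


direct_attached_servers = servers_needed(TARGET_TB)

print(f"Direct attached approach: {direct_attached_servers} servers")
print(f"ExDS9100: {EXDS_BLADES} blades")
```

On those assumptions the direct attached approach needs 69 servers to match what the ExDS9100 runs on four blades; larger drives or more disks per server would narrow the gap.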