OpenStack and Kinetic: Next-Generation Datacenter Storage Starts Now
For over five decades, computer systems have relied on hierarchical file systems and block storage for storing data. This approach has worked, and continues to work, well for structured data: information that fits neatly into a prescribed field within a record, as in a relational database. But that’s not where most of the growth in today’s data boom comes from. In the 21st century, most data is unstructured, varying in size, format, location, and so on. Everything from emails to wikis to streaming videos to server logs is unstructured. File systems can handle unstructured data well enough, but they're far from optimal for it; more to the point, they can be decidedly inefficient at scale.
Cisco mobile data forecast through 2017, from the Cisco Visual Networking Index.
This inefficiency started to become glaring and tangible with the rise of cloud computing. On a desktop, or even a single server, the metadata required to make a file system operate seems slight. At the cluster level and beyond, though, that metadata becomes very cumbersome. Imagine a bottled water delivery truck on the highway, filled with jugs that are all the same size, per the organization's specifications. That truck is perfect for its job, but it would be a poor choice for moving one's home, just as a moving van is ill-suited to running water deliveries around town. Similarly, consider a mass-scale warehouse such as Amazon's versus a more traditional warehouse, where all inventory might be organized by a strict, regimented method akin to the Dewey Decimal System. Not to overgeneralize, but Amazon places items wherever they fit best for optimal efficiency, and the company's tracking technology lets workers find items faster than simply knowing that such-and-such a product is located "over in that section."
Part of this has to do with organizing by frequency of access: when the most-sought products are placed closer to the door, fewer man-hours are spent moving things. Bringing this back to storage, Seagate found in one cluster test that, according to Ali Fenn, the company's Senior Director of Advanced Storage, “92% of the I/O operations moved only 0.5% of the data. So, effectively, there was a ton of metadata that was causing the arm to move all around and thrash the drive.”
Even if this represents an extreme example, there’s no getting around the fact that metadata creates overhead, and overhead is the enemy of massive scaling. Who needs massive scaling? Cloud providers. Whether that cloud is public or private, conventional file system and block storage methods incur penalties that, until recently, were unfortunate but unavoidable. Over the past three years, though, the open source community and a relative handful of companies have stepped up to offer a better path to scalable cloud storage.