Samsung Develops Industry’s First Standards-Based ‘Key Value’ SSD Prototype

Samsung announced it had developed the industry’s first prototype “Key Value” (KV) SSD that is compliant with the corresponding new open standard. KV SSDs move storage work that is handled by the host CPU today into the SSD itself, leading to much-improved software and hardware efficiency. This paves the way for a new generation of high-performance and scalable storage architectures.

For some background, data is stored in SSDs in fixed-sized blocks (the smallest erasable amount of data in an SSD), each with its own ID. But most real-world data is unstructured, like music, photo and zip files. So SSDs normally convert object data (which can widely vary in size) into data fragments with the size of these blocks. When some data is changed, the whole block is erased and reprogrammed.
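As a rough illustration of the fragmentation described above, the following Python sketch splits a variable-sized object into fixed-size blocks. The 4096-byte block size and the helper name split_into_blocks are assumptions chosen for illustration, not details from Samsung or any SSD specification.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes, for illustration only


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Split data into fixed-size blocks, zero-padding the last one."""
    blocks = []
    for offset in range(0, len(data), block_size):
        chunk = data[offset:offset + block_size]
        # Pad the final fragment so every block has the same size.
        blocks.append(chunk.ljust(block_size, b"\x00"))
    return blocks


photo = b"x" * 10_000  # stand-in for a 10,000-byte object such as a photo
blocks = split_into_blocks(photo)

print(len(blocks))                            # blocks needed for the object
print(len(blocks) * BLOCK_SIZE - len(photo))  # bytes of padding in the last block
```

The object does not fit a whole number of blocks, so part of the last block holds padding rather than data.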

To cater to real-world usage models, most real-world data center software (e.g. LevelDB, Amazon DynamoDB) uses key value storage, where a variable-sized key refers to a variable-sized collection of data. In programming this is called a key-value pair. In other words, a key is submitted, followed by either putting or retrieving the data associated with that key.
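The put and get pattern can be sketched in a few lines of Python. This is a toy in-memory store with hypothetical names (InMemoryKVStore, put, get, delete); it is not the interface of LevelDB, DynamoDB, or the SNIA API, only an illustration of keys and values that both vary in size.

```python
from typing import Dict, Optional


class InMemoryKVStore:
    """Toy key-value store: variable-sized keys map to variable-sized values."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        # Store (or overwrite) the value associated with the key.
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        # Return the stored value, or None if the key is unknown.
        return self._data.get(key)

    def delete(self, key: bytes) -> None:
        # Remove the key if present; do nothing otherwise.
        self._data.pop(key, None)


store = InMemoryKVStore()
store.put(b"user:42:avatar", b"\x89PNG...")  # short key, arbitrary-length value
print(store.get(b"user:42:avatar"))
print(store.get(b"missing-key"))
```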

However, key value software places a considerable burden on the host processor, consuming valuable CPU resources. The industry’s concern was that as SSDs continued to get faster, system-level performance would plateau, bottlenecked by the CPU overhead of managing blocks and operations. The idea behind a Key Value SSD, then, is to support key value storage (KVS) natively: each key enables direct addressing of a data location. This eliminates processes such as logical and physical block addressing. Instead of a block device driver, a KV SSD has a KV device driver, and it is accessed through a KV library. Through this software and hardware co-design, work can be moved from the host CPU to the SSD.

While Samsung already made a proprietary KV SSD prototype two years ago (and submitted its proposal for a standard), in April this year the Storage Networking Industry Association (SNIA) released the new open standard Key Value Storage API v1 that serves to provide a vendor-independent Key Value Storage (KVS) programming interface. Samsung on Wednesday announced that it has developed the first SSD prototype that is compliant with the open standard, but did not provide further specifications.

Michael Oros, SNIA Executive Director, sees Key Value SSDs becoming widely used: “The SNIA KV API specification, which provides an industry-wide interface between an application and a Key Value SSD, paves the way for widespread industry adoption of a standardized KV API protocol.”

Samsung claims numerous benefits associated with KV storage technology. Moving storage operations to the SSD itself – in a standardized manner – frees the host CPU from computational work such as block operations and storage-level garbage collection (which is needed with software-based KVS), resulting in greater system-level performance and higher software efficiency. The reduced CPU overhead also provides substantially improved scalability in the number of interlinked SSDs. Write amplification is also said to be much reduced. (Write amplification is a phenomenon whereby more data is written than intended, due to the much larger size of blocks that can be erased compared to the smaller pages that can be written.) Lastly, each SSD experiences less wear, prolonging its lifetime.
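Write amplification is commonly expressed as the ratio of data physically written to flash to data the host asked to write. The Python sketch below works through a simplified worst case in which changing a single page forces a whole block to be rewritten. The 4 KiB page and 256 KiB block sizes are assumed values for illustration, not figures from Samsung.

```python
def write_amplification(host_bytes: int, flash_bytes: int) -> float:
    """Ratio of bytes written to flash to bytes the host requested."""
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    return flash_bytes / host_bytes


PAGE_SIZE = 4 * 1024      # assumed writable page size (4 KiB)
BLOCK_SIZE = 256 * 1024   # assumed erasable block size (256 KiB)

# Simplified worst case: the host changes one page, the device rewrites the block.
waf = write_amplification(host_bytes=PAGE_SIZE, flash_bytes=BLOCK_SIZE)
print(waf)
```

A ratio of 1.0 would mean no amplification; real drives fall somewhere in between, depending on workload and garbage collection.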

In a 2017 presentation, Samsung showed that its KV prototype handled 8x more queries per second (QPS) in a random put benchmark and reduced traffic to the device by over 90%. Furthermore, as the number of SSDs increased to 18, the number of queries per second increased practically linearly, resulting in 15x higher QPS than a standard block SSD. Similar results were achieved with a sequential benchmark and with a scale-out test to multiple clients. In terms of CPU utilization, the regular SSD achieved up to 400k QPS at 80% utilization, compared to the KV implementation that yielded 2.1M QPS at 30% utilization.
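To put the CPU utilization figures in perspective, the short Python calculation below divides the quoted QPS by the quoted utilization. Treating utilization as a linear cost per query is an assumption made here for comparison only; Samsung's presentation, as reported, does not state it.

```python
# Figures quoted above from Samsung's 2017 presentation.
block_qps, block_cpu_pct = 400_000, 80   # regular block SSD
kv_qps, kv_cpu_pct = 2_100_000, 30       # KV SSD prototype

# QPS delivered per percentage point of CPU utilization (assumes linear scaling).
block_qps_per_pct = block_qps / block_cpu_pct
kv_qps_per_pct = kv_qps / kv_cpu_pct

throughput_ratio = kv_qps / block_qps
efficiency_ratio = kv_qps_per_pct / block_qps_per_pct

print(block_qps_per_pct, kv_qps_per_pct)
print(throughput_ratio, efficiency_ratio)
```

Under that assumption, the KV prototype delivers about 5x the raw throughput while doing roughly 14x more work per unit of CPU.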

Samsung said it is working with several companies to build an ecosystem for the technology. Its KV SSD prototype is also “sufficiently advanced” that it is making it available to companies for application development. Samsung did not say when it expects to commercialize this new technology. AnandTech notes that this won't happen until a key-value extension for NVMe is finalized.

  • digitalgriffin
    admin said:
    Samsung announced it had developed the industry's first Key Value SSD that is compliant with the new open standard for KV SSDs, which can significantly reduce CPU overhead for storage workloads

    Interesting approach. Assuming they are using conventional Prime number sized key-value maps, they can be effectively used up to 2/3rd of the prime size before double hits occur. (Per studies by Knuth and others) If it's a 64 bit map, the largest prime would be 9,223,372,036,854,775,807 . Multiply that by .66 and you get a reasonable amount of file entries.
    Reply
  • bit_user
    digitalgriffin said:
    Assuming they are using conventional Prime number sized key-value maps, they can be effectively used up to 2/3rd of the prime size before double hits occur. (Per studies by Knuth and others) If it's a 64 bit map, the largest prime would be 9,223,372,036,854,775,807 . Multiply that by .66 and you get a reasonable amount of file entries.
    Huh? The keys would tend to be from a DB schema - probably not mere hashes of the values or whatever. Whether there are collisions and what those actually mean is domain-dependent. The keys could even be sequential, such as ROW IDs, making them densely-packed.

    According to Anandtech:
    Those drives support key lengths from 4 to 255 bytes and value lengths up to 2MB

    So, that's a key space of anywhere from 32 bits to 2040 bits.
    Reply
  • digitalgriffin
    bit_user said:
    Huh? The keys would tend to be from a DB schema - probably not mere hashes of the values or whatever. Whether there are collisions and what those actually mean is domain-dependent. The keys could even be sequential, such as ROW IDs, making them densely-packed.

    According to Anandtech:
    Those drives support key lengths from 4 to 255 bytes and value lengths up to 2MB

    So, that's a key space of anywhere from 32 bits to 2040 bits.

    A variable length key based on a b tree db schema?

    Great now I have something else to learn how it works. :P Paper link?
    Reply
  • bit_user
    digitalgriffin said:
    A variable length key based on a b tree db schema?
    I'm not sure I follow, unless you're just being cute.

    digitalgriffin said:
    Great now I have something else to learn how it works. :p Paper link?
    You mean this?

    https://www.snia.org/tech_activities/standards/curr_standards/kvsapi

    Did you read the NVMe specifications or SATA/SAS, for that matter?

    That said, there's potentially some value in reading about NVMe's features. Some cool stuff in there, actually:

    https://www.anandtech.com/show/14543/nvme-14-specification-published
    Reply
  • digitalgriffin
    Another list of papers to add to my list. Thanks. (And for the record I wasn't being sarcastic) I know how B Trees work for Databases but didn't know if it was a standard B Tree Parse. I thought they might be using like a RLE hash to get a Left Branch Right Branch Right Branch Mid Branch short cut going which would avoid expensive compares. But I didn't know. It was just a guess. I'll be sure to do some reading.
    Reply