3D XPoint: A Guide To The Future Of Storage-Class Memory

Endurance, Power, And Materials

Endurance And Thermal Concerns

Manufacturers use error correction techniques, such as BCH and LDPC (Low-Density Parity Check) ECC, to deal with the inevitable errors that flow from any media. Modern low-endurance TLC NAND requires LDPC, which boosts error recovery but incurs significant performance overhead. LDPC can create severe I/O outliers: data requests that take much longer to complete than a normal operation.

LDPC performs well during normal "hard decision" error processing when the errors are easy to correct, but it has a performance impact when the code transitions into "soft decision" mode for difficult-to-recover bits. Soft-decision decoding re-reads the cell and surrounding areas to determine the contents of the cell, which causes latency to unpredictably skyrocket into the hundreds of microseconds for some operations. The soft-decision error correction mode becomes more prevalent as the media wears. This isn't ideal for storage, but it is particularly problematic for DIMM usage models.
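
The tail-latency effect of that soft-decision fallback is easy to model. The sketch below is purely illustrative: the latency figures and fallback rates are assumptions chosen to show the shape of the problem, not measured LDPC numbers.

```python
import random

# Illustrative latency model (all figures are assumptions, not vendor specs):
# hard-decision LDPC decode resolves most reads quickly, but a small fraction
# of reads falls back to soft-decision decode, which re-reads the cell and
# its neighbors and takes far longer to complete.
HARD_DECODE_US = 10    # assumed fast-path read latency, microseconds
SOFT_DECODE_US = 300   # assumed soft-decision fallback latency, microseconds

def read_latency_us(soft_decision_rate):
    """Latency of one read, given the probability of a soft-decision fallback."""
    if random.random() < soft_decision_rate:
        return SOFT_DECODE_US
    return HARD_DECODE_US

def tail_latency(soft_decision_rate, n=100_000, percentile=0.999):
    """The 99.9th-percentile latency over n simulated reads."""
    samples = sorted(read_latency_us(soft_decision_rate) for _ in range(n))
    return samples[int(n * percentile)]

# As the media wears and the fallback rate climbs past the tail percentile,
# the 99.9th-percentile latency jumps from the fast path to the slow path.
for rate in (0.0001, 0.001, 0.01):
    print(f"fallback rate {rate:.2%}: p99.9 = {tail_latency(rate)}us")
```

Even a fallback rate well under one percent is enough to push the high percentiles onto the slow path, which is exactly the outlier behavior described above.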

BCH ECC doesn't have as much overhead, and 3D XPoint has an inherently lower bit error rate, so Micron uses a proprietary lightweight BCH error correction with QuantX devices. The proprietary ECC codes are the big motivator to use CNEX controllers, which we will cover shortly.

QuantX's relatively uninspiring 25DWPD endurance threshold is the result of tuning the media for high performance, rather than high endurance. Micron tuned its ECC to avoid interfering with the performance of the low latency media. The company could increase ECC capabilities to provide more endurance, but it would come at the cost of performance.
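
To put a DWPD rating in concrete terms, it converts directly into total write volume over the warranty period. The capacity and warranty length below are hypothetical values for illustration, not QuantX specifications.

```python
# Back-of-the-envelope endurance math for a 25 DWPD rating (drive writes
# per day, sustained over the warranty period).
def total_writes_pb(capacity_gb, dwpd, warranty_years):
    """Total writable volume over the warranty, in petabytes (decimal units)."""
    return capacity_gb * dwpd * 365 * warranty_years / 1_000_000

# e.g. a hypothetical 1,400GB device at 25 DWPD over a 5-year warranty:
print(total_writes_pb(1400, 25, 5))  # 63.875, i.e. roughly 64 PB written
```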

While the blend of performance and endurance makes sense for storage, this appears to throw the suitability of Intel's DIMMs (which we've only seen once in public) into question.

An Optane DIMM would require much more than 25DWPD of endurance, and if dialing up the ECC imposes too much performance overhead in a slower storage medium, it might put memory performance out of reach. Of course, Intel might have other intentions for error correction, but the two companies are using the same media, and Optane DIMMs will have an FPGA to manage errors. There have been many reports that Intel will not bring DIMMs to market as soon as expected, and if those unconfirmed reports are proven true, the bit error rate, endurance, and thermal guidelines are potentially to blame. IMFT expects 3D XPoint endurance to increase with each new generation, whereas NAND endurance declines. It is possible that Intel could rectify any DIMM delay with the next generation of 3D XPoint products.

Both the storage and memory products will have to conform to JEDEC's thermal standards, which isn't a problem on the storage side. However, the DIMM thermal envelopes are challenging. The ability to meet critical temperature thresholds will also be a key requirement for on-package, and maybe even on-die, implementations.

Power Isn't A Big "Adder"

Intel has stated that 3D XPoint can offer 30% lower average power than other solutions, but this is likely due to its speed and measurements with an extended workload. The ability to respond to requests quickly, and then fall into an idle or sleep state almost immediately, will save on power consumption. As with many of Intel's claims, there isn't much hard data to quantify the measurement.

Power consumption is a priority in the data center, which is Micron's intended target, but the company says that 3D XPoint doesn't deliver reduced power consumption because it is easy to "light up" a lot of cells at once. Instead of reduced power consumption, you just receive more performance. Micron expects to have a power envelope similar to NAND-based SSDs, which are already much better than many competing technologies. The increased performance within a similar power envelope will provide much better IOPS-per-Watt efficiency metrics, especially in mixed workloads, which we will explore a bit later.
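
The efficiency metric itself is simple division. The IOPS and power figures below are hypothetical, chosen only to show how more performance inside the same power envelope improves IOPS-per-Watt.

```python
# IOPS-per-Watt is throughput divided by power draw. Both data points below
# are assumed numbers for illustration, not measured drive specifications.
def iops_per_watt(iops, watts):
    return iops / watts

nand_ssd   = iops_per_watt(300_000, 12)  # assumed NAND SSD: 300K IOPS at 12W
xpoint_ssd = iops_per_watt(900_000, 12)  # assumed 3D XPoint SSD: same 12W envelope
print(nand_ssd, xpoint_ssd)  # 25000.0 75000.0
```

With the power envelope held constant, the efficiency metric scales directly with performance, which is the trade Micron describes.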

Proprietary Interconnects Start Small

3D XPoint die are stacked into normal packages that feature BGA mounting, but they are not utilizing the ONFI 4 specification. The ONFI (Open NAND Flash Interface) Workgroup consists of more than 100 member companies that define a standardized interface for NAND. ONFI allows NAND packages to connect through standardized PCB connections and communicate with the SSD controller via a standardized interface. The standardized approach is critical to enhancing industry-wide interoperability for SSD components.

IMFT builds ONFI-compliant NAND, but will not use an ONFI standard for 3D XPoint devices. The workgroup designed the ONFI spec with NAND-based devices in mind, and Micron insists the spec adds too much latency to use with 3D XPoint. Micron developed an optimized proprietary interconnect for 3D XPoint chips, which it tentatively refers to as the QuantX Media Interface, but we aren't sure if it is a cooperative effort with Intel, or separate. The new media interface is similar to DDR4, and Micron claims it is much faster than ONFI (its speed was at 800MHz at the time of FMS 2016, but that isn't final). In either case, the use of proprietary interconnects at the lowest levels of the design is just the beginning of the proprietary tools that are fueling industry concern.
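
For a rough sense of what the 800MHz figure could mean, here is a peak-bandwidth sketch assuming a DDR-style (two transfers per clock) interface. Both the double-data-rate assumption and the 8-bit bus width are our guesses for illustration; Micron has not disclosed the QuantX Media Interface's actual width or signaling.

```python
# Peak per-package bandwidth for an assumed DDR-style interface at the
# quoted 800MHz clock. The 8-bit bus width is an assumption, not a spec.
def ddr_bandwidth_mb_s(clock_mhz, bus_width_bits):
    """Peak transfer rate in MB/s: two transfers per clock (double data rate)."""
    return clock_mhz * 2 * bus_width_bits / 8

print(ddr_bandwidth_mb_s(800, 8))  # 1600.0 MB/s on the assumed 8-bit bus
```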

Materials Science And Sourcing

The world of leading-edge semiconductor development is incredibly reliant upon materials science, which also comes into play heavily in 3D XPoint development. IMFT has stated that 3D XPoint requires 100 new materials, some of which it hasn't used in its manufacturing processes before. Where these materials fit into the equation, or what they are, is a mystery. However, we know that this creates serious supply chain issues. Enterprise OEMs, in particular, are very insistent upon dual sourcing, never wanting exposure to a single link in a chain. The IMFT fabs in Lehi, UT, and Singapore, along with Intel's fab in Dalian, China, defray those concerns to some extent, as both companies will have independent, geographically disparate fabs producing the memory, but that does not address all of the supply chain issues.

A good example of these challenges occurred during HGST's transition to helium HDDs. The HDD industry learned several hard lessons from the Thailand floods, which led to an extended industry-wide HDD shortage. To assuage customers, HGST had to build a supply chain of geographically distributed helium suppliers to avoid a catastrophic loss of supply. However, a single, geographically distributed supplier doesn't count as dual-source.

The same principles apply to 3D XPoint production, but instead of one material (helium), IMFT and Intel have to source over 100. It's a fair bet that many of them are of the exotic variety, such as rare earth metals. Constructing the necessary geographically distributed dual sourcing is likely quite the operation, but it will be a requirement for some of the more staunch OEMs. Hyperscale cloud service providers, such as Amazon, Google, and Facebook, aren't as prone to require strict dual sourcing as more traditional OEMs are, but for long-term market viability, IMFT has to build a robust supply chain.


MORE: Data Center M.2 SSD 101


MORE: SMR (Shingled Magnetic Recording) 101

  • coolitic
    the 3 months for data center and 1 year for consumer nand is an old statistic, and even then it's supposed to apply to drives that have surpassed their endurance rating.
    Reply
  • PaulAlcorn
    double post

    Reply
  • PaulAlcorn
    18916331 said:
    the 3 months for data center and 1 year for consumer nand is an old statistic, and even then it's supposed to apply to drives that have surpassed their endurance rating.

    Yes, that is data retention after the endurance rating is expired, and it is also contingent upon the temperature that the SSD was used at, and the temp during the power-off storage window (40C enterprise, 30C Client). These are the basic rules by which retention is measured (the definition of SSD data retention, as it were), but admittedly, most readers will not know the nitty gritty details.

    However, I was unaware that JEDEC specification for data retention has changed, do you have a source for the new JEDEC specification?

    Reply
  • stairmand
Replacing RAM with permanent storage would simply revolutionise computing. No more loading an OS, no more booting, no loading data, instant searches of your entire PC for any type of data, no paging. Could easily be the biggest advance in 30 years.
    Reply
  • InvalidError
    18917236 said:
    Replacing RAM with a permanent storage would simply revolutionise computing. No more loading an OS, no more booting, no loading data
    You don't need X-point to do that: since Windows 95 and ATX, you can simply put your PC in Standby. I haven't had to reboot my PC more often than every couple of months for updates in ~20 years.
    Reply
  • Kewlx25
    18918642 said:
    18917236 said:
    Replacing RAM with a permanent storage would simply revolutionise computing. No more loading an OS, no more booting, no loading data
    You don't need X-point to do that: since Windows 95 and ATX, you can simply put your PC in Standby. I haven't had to reboot my PC more often than every couple of months for updates in ~20 years.

Remove your hard drive and let me know how that goes. The notion of "loading" is a concept of reading from your HD into your memory and initializing a program. So goodbye to all forms of "loading".
    Reply
  • hannibal
The main thing with this technology is that we cannot afford it until many years have passed from the time it comes to market. But, yes, an interesting product that can change many things.
    Reply
  • _TheD0ct0r_
    10 years later... still unavailable/costs 10k
    Reply
  • TerryLaze
    18922543 said:
    10 years later... still unavailable/costs 10k
Sure, you won't be able to afford a 3TB+ drive even in 10 years, but a 128/256GB one just for Windows and a few games will be affordable, if expensive, even in a couple of years.
    Reply
  • zodiacfml
I don't understand the need to make it work as a DRAM replacement. It doesn't have to. A system might only need a small amount of RAM and then a large 3D XPoint pool.

The bottleneck is the interface. There is no faster interface available except DIMM. We use the DIMM interface but make it appear as storage to the OS. Simple.

It will require a new chipset and board, though, where Intel has the control. We should see two DIMM groups next to each other; they differ mechanically but have the same pin count.
    Reply