Endurance, Power, And Materials
Endurance And Thermal Concerns
Manufacturers use error correction techniques, such as BCH and LDPC (Low-Density Parity Check) ECC, to deal with the errors inherent to any storage media. Modern low-endurance TLC NAND requires LDPC, which boosts error recovery, but it also incurs significant performance overhead. LDPC can create severe I/O outliers: requests that take far longer to complete than a typical operation.
LDPC performs well during normal "hard decision" error processing when the errors are easy to correct, but it exacts a performance toll when the controller transitions into "soft decision" mode for difficult-to-recover bits. Soft-decision decoding re-reads the cell and its surrounding areas to infer the cell's contents, which causes latency to spike unpredictably into the hundreds of microseconds for some operations. Soft-decision error correction becomes more prevalent as the media wears. That isn't ideal for storage, but it is particularly problematic for DIMM usage models.
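The tail-latency effect described above can be sketched with a toy model. The latency figures and soft-decision rates below are illustrative assumptions, not measured values; the point is that even a small fraction of slow soft-decision decodes dominates the 99th-percentile latency as media wears.

```python
import random

# Assumed, illustrative latencies (not measured values):
HARD_DECODE_US = 70    # typical read completing with hard-decision LDPC
SOFT_DECODE_US = 400   # read that falls back to soft-decision re-reads

def read_latency(soft_rate, rng):
    """Return one simulated read latency in microseconds."""
    return SOFT_DECODE_US if rng.random() < soft_rate else HARD_DECODE_US

def percentile(samples, pct):
    """Nearest-rank percentile of a list of samples."""
    s = sorted(samples)
    return s[int(pct / 100 * (len(s) - 1))]

rng = random.Random(42)
# Soft-decision decodes become more frequent as the media wears:
for soft_rate in (0.001, 0.01, 0.05):
    lat = [read_latency(soft_rate, rng) for _ in range(100_000)]
    avg = sum(lat) / len(lat)
    print(f"soft rate {soft_rate:>5.1%}: avg {avg:6.1f} us, "
          f"p99 {percentile(lat, 99)} us")
```

The average barely moves, but once the soft-decision rate exceeds one percent, the p99 latency jumps from the hard-decode figure to the soft-decode figure, which is exactly the outlier behavior that worries DIMM designers.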
BCH ECC doesn't carry as much overhead, and 3D XPoint has an inherently lower bit error rate, so Micron uses a proprietary lightweight BCH error correction scheme in QuantX devices. These proprietary ECC codes are the big motivator for using CNEX controllers, which we will cover shortly.
QuantX's relatively uninspiring 25 DWPD (drive writes per day) endurance threshold is the result of tuning the media for high performance rather than high endurance. Micron tuned its ECC to avoid interfering with the performance of the low-latency media. The company could increase ECC capabilities to provide more endurance, but it would come at the cost of performance.
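For a sense of scale, a DWPD rating converts to total bytes written over the warranty period. The capacity and warranty figures below are illustrative assumptions, not Micron specifications:

```python
def total_writes_pb(capacity_tb, dwpd, warranty_years):
    """Total endurance in petabytes written implied by a DWPD rating."""
    return capacity_tb * dwpd * 365 * warranty_years / 1000

# Illustrative assumption: a 1.6TB drive with a 5-year warranty.
print(total_writes_pb(1.6, 25, 5))  # 25 DWPD -> 73.0 PB written
print(total_writes_pb(1.6, 3, 5))   # a typical high-endurance NAND SSD
```

A 25 DWPD drive of that size could absorb roughly 73PB of writes over five years, far beyond high-endurance NAND, yet still orders of magnitude short of what a DRAM replacement would see.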
While the blend of performance and endurance makes sense for storage, this appears to throw the suitability of Intel's DIMMs (which we've only seen once in public) into question.
An Optane DIMM would require much more than 25 DWPD of endurance, and if dialing up the ECC imposes too much performance overhead in a slower storage medium, it might put memory performance out of reach. Of course, Intel might have other intentions for error correction, but the two companies are using the same media, and Optane DIMMs will have an FPGA to manage errors. There have been many reports that Intel will not bring DIMMs to market as soon as expected, and if those unconfirmed reports are proven true, the bit error rate, endurance, and thermal guidelines are potentially to blame. IMFT expects 3D XPoint endurance to increase with each new generation, whereas NAND endurance declines. It is possible that Intel could rectify any DIMM delay with the next generation of 3D XPoint products.
Both the storage and memory products will have to conform to JEDEC's thermal standards, which isn't a problem on the storage side. However, the DIMM thermal envelopes are challenging. The ability to meet critical temperature thresholds will also be a key requirement for on-package, and maybe even on-die, implementations.
Power Isn't A Big "Adder"
Intel has stated that 3D XPoint can offer 30% lower average power than other solutions, but this is likely due to its speed and measurements with an extended workload. The ability to respond to requests quickly, and then fall into an idle or sleep state almost immediately, will save on power consumption. As with many of Intel's claims, there isn't much hard data to quantify the measurement.
Power consumption is a priority in the data center, which is Micron's intended target, but the company says that 3D XPoint doesn't deliver reduced power consumption because it is easy to "light up" a lot of cells at once. Instead of reduced power consumption, you just receive more performance. Micron expects to have a power envelope similar to NAND-based SSDs, which are already much better than many competing technologies. The increased performance within a similar power envelope will provide much better IOPS-per-Watt efficiency metrics, especially in mixed workloads, which we will explore a bit later.
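Micron's efficiency argument reduces to simple arithmetic: if the power envelope stays roughly flat while performance rises, IOPS-per-Watt scales with the IOPS. The figures below are illustrative assumptions, not vendor measurements:

```python
def iops_per_watt(iops, watts):
    """Efficiency metric commonly used to compare data center SSDs."""
    return iops / watts

# Illustrative assumption: both drives draw ~12W under load, with the
# 3D XPoint drive delivering far higher mixed-workload IOPS.
nand_efficiency = iops_per_watt(200_000, 12)
xpoint_efficiency = iops_per_watt(900_000, 12)
print(xpoint_efficiency / nand_efficiency)  # same power, ~4.5x efficiency
```

Under these assumed numbers, the efficiency advantage is simply the performance ratio, which is why Micron frames the benefit as IOPS-per-Watt rather than lower absolute power draw.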
Proprietary Interconnects Start Small
3D XPoint die are stacked into normal packages that feature BGA mounting, but they are not utilizing the ONFI 4 specification. The ONFI (Open NAND Flash Interface) Workgroup consists of more than 100 member companies that define a standardized interface for NAND. ONFI allows NAND packages to connect through standardized PCB connections and communicate with the SSD controller via a standardized interface. The standardized approach is critical to enhancing industry-wide interoperability for SSD components.
IMFT builds ONFI-compliant NAND, but will not use an ONFI standard for 3D XPoint devices. The workgroup designed the ONFI spec with NAND-based devices in mind, and Micron insists the spec imposes too much latency for 3D XPoint. Micron developed an optimized proprietary interconnect for 3D XPoint chips, which it tentatively refers to as the QuantX Media Interface, but we aren't sure if it is a cooperative effort with Intel, or separate. The new media interface is similar to DDR4, and Micron claims it is much faster than ONFI (it ran at 800MHz at the time of FMS 2016, though that speed isn't final). In either case, the use of proprietary interconnects at the lowest levels of the design is just the beginning of the proprietary tools that are fueling industry concern.
Materials Science And Sourcing
The world of leading-edge semiconductor development is incredibly reliant upon materials science, which also comes into play heavily in 3D XPoint development. IMFT has stated that 3D XPoint requires 100 new materials, some of which it hasn't used in its manufacturing processes before. Where these materials fit into the equation, or what they are, is a mystery. However, we know that this creates serious supply chain issues. Enterprise OEMs, in particular, are very insistent upon dual sourcing, never wanting exposure to a single link in a chain. The IMFT fabs in Lehi, UT, and Singapore, along with Intel's fab in Dalian, China, allay those concerns to some extent, as both companies will have independent, geographically disparate fabs producing the memory, but that does not address all of the supply chain issues.
A good example of these challenges occurred during HGST's transition to helium HDDs. The HDD industry learned several hard lessons from the Thailand floods, which led to an extended industry-wide HDD shortage. To assuage customers, HGST had to build a supply chain of geographically distributed helium suppliers to avoid a catastrophic loss of supply. However, a single, geographically distributed supplier doesn't count as dual-source.
The same principles apply to 3D XPoint production, but instead of one material (helium), IMFT and Intel have to source over 100. It's a fair bet that many of them are of the exotic variety, such as rare earth metals. Constructing the necessary geographically distributed dual sourcing is likely quite the operation, but it will be a requirement for some of the more staunch OEMs. Hyperscale cloud service providers, such as Amazon, Google, and Facebook, aren't as prone to require strict dual sourcing as more traditional OEMs are, but for long-term market viability, IMFT has to build a robust supply chain.