Paper Title

Bit-Exact ECC Recovery (BEER): Determining DRAM On-Die ECC Functions by Exploiting DRAM Data Retention Characteristics

Authors

Minesh Patel, Jeremie S. Kim, Taha Shahroodi, Hasan Hassan, Onur Mutlu

Abstract

Increasing single-cell DRAM error rates have pushed DRAM manufacturers to adopt on-die error-correction coding (ECC), which operates entirely within a DRAM chip to improve factory yield. The on-die ECC function and its effects on DRAM reliability are considered trade secrets, so only the manufacturer knows precisely how on-die ECC alters the externally-visible reliability characteristics. Consequently, on-die ECC obstructs third-party DRAM customers (e.g., test engineers, experimental researchers), who typically design, test, and validate systems based on these characteristics. To give third parties insight into precisely how on-die ECC transforms DRAM error patterns during error correction, we introduce Bit-Exact ECC Recovery (BEER), a new methodology for determining the full DRAM on-die ECC function (i.e., its parity-check matrix) without hardware tools, prerequisite knowledge about the DRAM chip or on-die ECC mechanism, or access to ECC metadata (e.g., error syndromes, parity information). BEER exploits the key insight that non-intrusively inducing data-retention errors with carefully-crafted test patterns reveals behavior that is unique to a specific ECC function. We use BEER to identify the ECC functions of 80 real LPDDR4 DRAM chips with on-die ECC from three major DRAM manufacturers. We evaluate BEER's correctness in simulation and performance on a real system to show that BEER is effective and practical across a wide range of on-die ECC functions. To demonstrate BEER's value, we propose and discuss several ways that third parties can use BEER to improve their design and testing practices. As a concrete example, we introduce and evaluate BEEP, the first error profiling methodology that uses the known on-die ECC function to recover the number and bit-exact locations of unobservable raw bit errors responsible for observable post-correction errors.
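The key insight in the abstract, that errors induced under a chosen test pattern expose behavior specific to one ECC function, can be illustrated with a toy model. The sketch below is not the BEER methodology itself. It assumes a (7,4) systematic Hamming code, two hypothetical parity-check matrices `H_A` and `H_B`, and a simplified retention-error model in which a charged cell (stored 1) decays to 0. All function names are illustrative.

```python
# Toy illustration only: a (7,4) systematic Hamming code with H = [P | I].
# H_A and H_B are hypothetical matrices, not those of any real DRAM chip.

H_A = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]
H_B = [
    [1, 0, 1, 1, 1, 0, 0],
    [1, 1, 0, 1, 0, 1, 0],
    [1, 1, 1, 0, 0, 0, 1],
]


def encode(H, data):
    """Append parity bits so that H * codeword = 0 (mod 2)."""
    k = len(data)
    parity = [sum(row[i] & data[i] for i in range(k)) % 2 for row in H]
    return list(data) + parity


def syndrome(H, word):
    """Compute H * word (mod 2) as a tuple."""
    return tuple(sum(h & w for h, w in zip(row, word)) % 2 for row in H)


def decode(H, word):
    """Single-error syndrome decoding.

    Returns (data bits after correction, index of the flipped bit or None).
    """
    s = syndrome(H, word)
    flipped = None
    if any(s):
        for j in range(len(word)):
            if tuple(row[j] for row in H) == s:
                flipped = j
                break
    corrected = list(word)
    if flipped is not None:
        corrected[flipped] ^= 1
    k = len(word) - len(H)
    return corrected[:k], flipped


def observe(H, data, decayed):
    """Write `data`, let the listed cells decay to 0, then read through ECC."""
    word = encode(H, data)
    for j in decayed:
        word[j] = 0  # assumed retention-error model: charged cell reads as 0
    return decode(H, word)


if __name__ == "__main__":
    pattern = [1, 1, 0, 0]  # test pattern: only data bits 0 and 1 are charged
    for name, H in (("H_A", H_A), ("H_B", H_B)):
        data_out, flipped = observe(H, pattern, decayed=[0, 1])
        print(name, "observed data:", data_out, "decoder flipped bit:", flipped)
```

Under these assumptions, the same test pattern and the same two-bit raw error produce different post-correction data for the two matrices: with `H_A` the decoder miscorrects data bit 2, while with `H_B` it flips a parity bit that is never visible outside the chip. Differences of this kind are what make it possible, in principle, to distinguish ECC functions from externally observable reads alone.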
