Center for Reliable Computing
TECHNICAL REPORT 00-7
Diversity Techniques for Concurrent Error Detection
Subhasish Mitra
June 2000

Center for Reliable Computing
Gates Building 2A, Room 236
Computer Systems Laboratory
Dept. of Electrical Engineering and Computer Science
Stanford, California 94305
This technical report contains the text of Subhasish Mitra’s PhD thesis “Diversity Techniques for Concurrent Error Detection.”
This research was supported by the Advanced Research Projects Agency under Contract No. DABT63-97-C-0024.
Copyright © 2000 by Subhasish Mitra.
All rights reserved, including the right to reproduce this report, or portions thereof, in any form.
DIVERSITY TECHNIQUES FOR
CONCURRENT ERROR DETECTION
SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
I certify that I have read this dissertation and that in my opinion it is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Edward J. McCluskey (Principal Advisor)

I certify that I have read this dissertation and that in my opinion it is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Giovanni De Micheli

I certify that I have read this dissertation and that in my opinion it is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Bernard Widrow

Approved for the University Committee on Graduate Studies
Concurrent error detection (CED) techniques are widely used to ensure data integrity in digital systems. Data integrity means that the system either produces correct outputs or indicates an error whenever its outputs are incorrect. This dissertation presents the results of theoretical and simulation studies of various CED techniques. The CED schemes studied are based on diverse duplication, simple duplication of identical implementations, and error-detection techniques such as parity checking. The study had two aims: (1) to compare quantitatively the effectiveness of different CED schemes, and (2) to develop design techniques for efficient concurrent error detection.
A CED scheme based on diverse duplication compares the outputs of two different implementations of the same function and indicates an error when a mismatch occurs. The idea of such a CED technique is derived from the general concept of design diversity. The conventional notion of design diversity is qualitative and relies on independent generation of different implementations. In this dissertation, a metric to quantify design diversity is presented and used for analyzing CED schemes based on diverse duplication.
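To make the comparison mechanism concrete, the following is a minimal Python sketch; it is not taken from the dissertation. It assumes two hypothetical gate-level implementations of the function y = ab + ac, models a fault as a single stuck-at value on a named node (the node names n1, n2 and m1 are illustrative), and assumes a per-fault-pair diversity measure equal to one minus the fraction of the 2^n input combinations, taken as equally likely, on which both faulty implementations produce identical erroneous outputs.

```python
from itertools import product

N_INPUTS = 3  # inputs a, b, c

# Hypothetical fault model for this sketch: a fault is a (node_name, stuck_value)
# pair that forces one named node to a constant 0 or 1.
def node(name, value, fault):
    """Return the fault-free value, or the stuck-at value if this node is faulty."""
    if fault is not None and fault[0] == name:
        return fault[1]
    return value

def impl1(a, b, c, fault=None):
    # Implementation 1: y = (a AND b) OR (a AND c)
    n1 = node("n1", a & b, fault)
    n2 = node("n2", a & c, fault)
    return node("y", n1 | n2, fault)

def impl2(a, b, c, fault=None):
    # Implementation 2 (diverse): y = a AND (b OR c), same function, different structure
    m1 = node("m1", b | c, fault)
    return node("y", a & m1, fault)

def reference(a, b, c):
    # Fault-free specification used to decide whether an output is erroneous
    return (a & b) | (a & c)

def duplex_check(f1, f2, fault1=None, fault2=None):
    """Exhaustively simulate a duplex system; return (detected, undetected) error counts."""
    detected = undetected = 0
    for a, b, c in product((0, 1), repeat=N_INPUTS):
        y1 = f1(a, b, c, fault1)
        y2 = f2(a, b, c, fault2)
        if y1 != y2:
            detected += 1    # comparator sees a mismatch and flags an error
        elif y1 != reference(a, b, c):
            undetected += 1  # identical wrong outputs: data integrity is lost
    return detected, undetected

def diversity(f1, f2, fault1, fault2):
    """Assumed metric: 1 - (#inputs with identical errors) / 2**n, inputs equally likely."""
    _, identical_errors = duplex_check(f1, f2, fault1, fault2)
    return 1 - identical_errors / 2 ** N_INPUTS

# Simple duplication: the same fault in both identical copies
print(duplex_check(impl1, impl1, ("n2", 1), ("n2", 1)), diversity(impl1, impl1, ("n2", 1), ("n2", 1)))
# -> (0, 5) 0.375
# Diverse duplication: a stuck-at-1 fault on an internal node of each implementation
print(duplex_check(impl1, impl2, ("n2", 1), ("m1", 1)), diversity(impl1, impl2, ("n2", 1), ("m1", 1)))
# -> (4, 1) 0.875
```

With the same stuck-at-1 fault on n2 in both identical copies, every erroneous output is produced by both modules, so all five errors escape the comparator. Pairing that fault with a stuck-at-1 fault on m1 in the structurally different implementation leaves only one input on which the errors coincide. The particular fault pairing is chosen purely for illustration.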
A comparative study of different CED schemes, based on simulation experiments and theoretical analysis, concludes that, in the worst case, diverse duplication provides significantly better data integrity against multiple failures than other CED schemes. This result is especially significant for Common-Mode Failures (CMFs). CMFs undermine the data integrity of any system with CED and belong to a special class of multiple failures whose probability of occurrence can be as high as that of single failures.
New techniques and synthesis algorithms have been developed for the first time to efficiently design systems based on diverse duplication. New fault models for CMFs are proposed and the possible failure mechanisms for the modeled CMFs are analyzed. In addition, techniques for designing CED-based systems with guaranteed data integrity in the presence of modeled CMFs are described.
I am deeply grateful to my advisor, Prof. Edward J. McCluskey, for his constant guidance, support, and encouragement throughout my years at Stanford. He modeled the qualities of an ideal teacher and an outstanding researcher that I aspire to emulate in my career. I would also like to thank Mrs. Lois Thornhill McCluskey.
I would like to thank Prof. Bruce Wooley, my associate advisor, and Prof. Bernard Widrow, my committee chairman, for reading my dissertation, and Prof. Giovanni De Micheli for many interesting discussions and for agreeing to be the fourth member of my committee.
I would like to thank my colleagues (RATs) at the Center for Reliable Computing (CRC) for helpful discussions and companionship through the years: Mehmet Apaydin, LaNae Avra, Jonathan Chang, Santiago Fernandez-Gomez, Vlad Friedman, Robert Huang, Rajesh Kamath, Chat Khunpitiluck, Moon-jung Kim, Wern-Yan Koe, James Li, Siyad Ma, Samy Makar, Rob Norwood, Nahmsuk Oh, Nirmal Saxena, Philip Shirvani, Rudy Tan, Nur Touba, Chao-wen Tseng, Sanjay Wattal, Yoonjin Yoon and Catherine Yu. Special thanks to Dr. Nirmal Saxena for his advice and guidance. I also thank Siegrid Munda for her administrative support. Thanks to my professors at Stanford and to the CRC visitors Prof. Jacob Abraham, Prof. Bella Bose, Prof. Miroslaw Malek, Prof. Mohammed Niamat and Prof. Dhiraj Pradhan for many interesting discussions.
I wish to thank my professors (especially Prof. P. Bhattacharya, Prof. T. K. Dey, Prof. A. K. Majumdar, Prof. P. Pal Chaudhuri, Prof. D. Sarkar of Indian Institute of Technology, Kharagpur and Prof. D. Ghosh Dastidar and Prof. R. Dattagupta of Jadavpur University, Calcutta) and my teachers in India. I also wish to thank my friends for their support over the years, which ranged from useful advice to frequent weekend invitations. Thanks to Prof. P. Banerjee and Dr. R. K. Roy for their support during the application process for my graduate studies in the United States.
I am very grateful to my parents for their continued love, support and encouragement. I dedicate this thesis to them. I also thank my relatives and family friends in India for their continuing encouragement and support.
My research work at Stanford was supported by Defense Advanced Research Projects Agency (DARPA) under Contract No. DABT63-94-C-0045 and DABT63-97-C
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
1.2 A Brief Background on Concurrent Error Detection
1.3 Concurrent Error Detection and Diversity
Chapter 2: A Design Diversity Metric and Analysis of Redundant Systems
2.1 D: A Design Diversity Metric
2.2 Reliability Analysis
2.3 Results Demonstrating the Effectiveness of Diversity
Chapter 3: Comparison of Various Concurrent Error Detection Techniques
3.1 Concurrent Error Detection
3.2 An Overview of Various Concurrent Error Detection Techniques
3.2.1 Concurrent Error Detection using Duplex Systems
3.2.2 Concurrent Error Detection through Parity Prediction
3.2.3 Concurrent Error Detection using Unidirectional Error Detecting Codes
3.3 Vulnerability to Multiple Failures and CMFs: Simulation Results
Chapter 4: Self-Testability of Duplex Systems
4.2 Identification of Non-Self-Testable Fault Pairs
4.3 Self-Testability Enhancement Using Test Points
4.3.1 Control Test Points
4.3.2 Observation Test Points
4.4 Simulation Results
Chapter 5: Combinational Logic Synthesis Techniques for Diversity
5.1 Problem Formulation
5.2 Two-level Logic Synthesis
5.3 Multi-level Logic Synthesis
5.3.1 Single-Cube Extraction
5.3.2 Double-Cube Extraction
Chapter 6: Common-Mode Failure Models and Redundant System Design
6.1 Common-Mode Fault Models
6.2 Redundant System Design
6.2.1 Redundant Systems Protected Against IR-CMF-1
6.2.2 Redundant Systems Protected Against IR-CMF-2
Chapter 7: Concluding Remarks
Publications from this Dissertation
Appendix A: Common-Mode Failures in Redundant VLSI Systems: A Survey
(To appear in the IEEE Transactions on Reliability, 2000)
Appendix B: A Design Diversity Metric and Analysis of Redundant Systems
(Technical Report, CRC-TR-99-4, Center for Reliable Computing, Stanford University: http://crc.stanford.edu. An extended version of "A Design Diversity Metric and Reliability Analysis of Redundant Systems," Proceedings of International Test Conference, pp. 662-671, 1999)
Appendix C: Which Concurrent Error Detection Scheme to Choose?
(To appear in the Proceedings of International Test Conference, 2000)
Appendix D: Fault Escapes in Duplex Systems
(Technical Report, CRC-TR-00-1, Center for Reliable Computing, Stanford University: http://crc.stanford.edu. An extended version of "Fault Escapes in Duplex Systems," Proceedings of VLSI Test Symposium, pp. 453-458, 2000)
Appendix E: Combinational Logic Synthesis for Diversity in Duplex Systems
(To appear in the Proceedings of International Test Conference, 2000)
Appendix F: Word-Voter: A New Voter Design for Triple Modular Redundant Systems
(An extended version of "Word-Voter: A New Voter Design for Triple Modular Redundant Systems," Proceedings of VLSI Test Symposium, pp. 465-470, 2000)
Appendix G: Design of Redundant Systems Protected Against Common-Mode Failures
(Technical Report, CRC-TR-00-2, Center for Reliable Computing, Stanford University, 2000: http://crc.stanford.edu)
LIST OF FIGURES
Figure 1.1. General architecture of a concurrent error detection scheme
A Duplex System for Concurrent Error Detection
Example of diversity
A discrete-time model of the system
Common-Mode failure affecting both modules of a duplex system
Figure 2.4. Data integrity of a duplex system against common-mode failures
Figure 2.5. Effect of diversity vs. time (for common-mode failures)
General architecture of concurrent error detection
Figure 3.2. A Duplex System
A Concurrent Error Detection technique using a single parity bit
Figure 3.4. Multiple parity bits for concurrent error detection
Concurrent Error Detection Using Berger Codes
Concurrent error detection using Bose-Lin codes
Venn diagram showing $y_{i,j}$ and $z_{i,j}$
Systems with CED
Control Test Points
Applications with testing phases
Illustration of Single-Cube extraction
Illustration of double-cube extraction
TMR implementation protected against IR-CMF-1
The basic scheme for the second module in a duplex system
LIST OF TABLES
Comparison of area overhead of different CED schemes
Detection probability of erroneous outputs for different CED schemes
Table 3.3. Improvement of detection probability of incorrect outputs using diverse duplication over other schemes
Self-testing properties of duplex systems
Comparison of control and observation test points
Test points for 100% self-testability
Execution time using different techniques on Sun Ultra-Sparc-2
Table 6.1. Truth tables: (a) Network N, (b) Network N1, (c) Network N2
An example logic function
Specification of N2