Design and Analysis of Dynamic Redundancy Networks

Academic Article

Abstract

  • Most previous work on the fault-tolerant design of multistage interconnection networks (MIN’s) has focused on improving the reliability of the networks themselves. For parallel systems containing a large number of processing elements (PE’s), the capability to recover from a PE fault is also important. This paper investigates the dynamic redundancy (DR) network. The DR network can tolerate faults in the network itself and, by adding spare PE’s, allows a system to tolerate PE faults without degradation, while retaining the full capability of a multistage cube network. The DR network can also be controlled by the same routing tags used for the multistage cube. Hence, with a recovery procedure added to the operating system, programs that execute on a system based on a multistage cube can execute without modification on a system based on the DR network, both before and after a fault. A variation of the DR network, the reduced DR network, is also considered; it can be implemented more cost-effectively than the DR network while retaining most of the DR network’s advantages. The reliabilities of DR-based systems with one spare PE are estimated and compared with those of systems with no spare PE’s, and the effect of adding multiple spare PE’s is analyzed. It is shown that no matter how much redundancy is added to an MIN, the system reliability cannot exceed a certain bound; however, by using the DR network and spare PE’s, this bound can be exceeded. © 1988 IEEE
  • Authors: Jeng M; Siegel HJ
  • Pages: 1019-1029
  • Volume: 37
  • Issue: 9
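The abstract's claim that added network redundancy alone cannot push system reliability past a fixed bound, while spare PE’s can, is illustrated below with a simplified model. This is a minimal sketch and not the authors’ reliability analysis. It assumes that PE’s fail independently with identical reliability r, that the network either works (probability r_net) or brings the whole system down, and that a DR-style network can always reconfigure around up to s failed PE’s when s spares are present. Under these assumptions a system with no spares needs all N PE’s working, so its reliability is r_net * r^N, which can never exceed r^N however reliable the network is made. A system with N + s PE’s survives any s PE failures. The function names and parameter values are hypothetical.

```python
from math import comb


def no_spare_reliability(n_pes: int, r_pe: float, r_net: float = 1.0) -> float:
    """Reliability when all n_pes PEs must work, in series with the network.

    Assumption (hypothetical model): independent PE failures and a network
    that is either fully working (probability r_net) or fatal. Since
    r_net <= 1, the result is bounded above by r_pe ** n_pes.
    """
    return r_net * r_pe ** n_pes


def spare_reliability(n_pes: int, spares: int, r_pe: float, r_net: float = 1.0) -> float:
    """Reliability when n_pes + spares PEs are present and any n_pes suffice.

    Assumption (hypothetical model): the network can always reconfigure
    around up to `spares` failed PEs, so the system survives if at most
    `spares` PEs fail (a k-out-of-n model).
    """
    total = n_pes + spares
    # Sum the probabilities of exactly k PE failures for k = 0 .. spares.
    p_enough_pes = sum(
        comb(total, k) * (1 - r_pe) ** k * r_pe ** (total - k)
        for k in range(spares + 1)
    )
    return r_net * p_enough_pes


if __name__ == "__main__":
    n, r = 64, 0.99  # hypothetical system size and per-PE reliability
    bound = no_spare_reliability(n, r)  # ceiling for any no-spare system
    print(f"no-spare bound (perfect network): {bound:.4f}")
    for s in (1, 2):
        rel = spare_reliability(n, s, r, r_net=0.999)
        print(f"{s} spare(s), network reliability 0.999: {rel:.4f}")
```

With s = 0 the spare model reduces to the no-spare case. With N = 64 and r = 0.99, a single spare and a network reliability of 0.999 already place the system well above the r^N ceiling, which mirrors the qualitative result stated in the abstract under these simplifying assumptions.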