Characterizing the parallel performance and soft error resilience of probabilistic inference algorithms

Mark Horowitz; Vicky Wong

Abstract

Probabilistic reasoning has become a popular approach for modeling systems with uncertainty and solving for the most likely solution given the available data. It has been applied successfully in many fields, and its applications are expanding. However, there has been little work on how these algorithms map to modern and future computing systems. Continued scaling of VLSI circuit technology is driving processor design towards explicitly parallel machines, as energy constraints and diminishing returns from instruction-level parallelism limit the performance gain possible with monolithic processors. Using a reconfigurable chip multiprocessor (CMP) architecture as our evaluation platform, we characterize the performance of a set of probabilistic inference algorithms. Our results show that probabilistic inference applications have plenty of data parallelism, which is easily extractable in most cases. For some applications, hardware-supported multi-context processors and fine-grain synchronization are necessary to achieve efficient parallel execution. Unlike parallel scientific benchmarks, these applications have a lower compute-to-memory ratio as well as large working sets. Thus, high memory bandwidth is required, and multi-context processors are desirable for hiding some of the memory latency. Besides the shift to CMPs, technology scaling is also fueling a growing concern about the reliability of future processor chips, as shrinking feature sizes and lower voltages make devices more susceptible to upsets from transient errors. Since probabilistic reasoning is designed to handle noisy or incomplete data, it raises the question of whether these algorithms are more robust than traditional programs, and whether that warrants a different approach to soft error protection. Our fault injection experiments confirm that the robustness of approximate inference algorithms makes them more resilient against transient errors than traditional benchmarks. In addition, the approximate nature of the computation enables low-cost fault recovery. With simple modifications to the software, we can further improve the percentage of soft errors masked. The errors that the algorithm cannot naturally recover from often point to critical sections of the program, and we find that algorithm-specific, software-level error protection can be very effective in increasing robustness while incurring little additional overhead.
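
To make the idea of fault injection and low-cost software recovery concrete, the following is a minimal sketch and not the experimental setup used in this work. It assumes a toy iterative computation (power iteration for the stationary distribution of a small Markov chain) as a stand-in for an approximate inference algorithm, models a soft error as a single bit flip in one stored double, and uses a hypothetical `sanitize` check as the software-level protection. The names `flip_bit`, `sanitize`, `run_inference`, and the matrix `T` are all illustrative assumptions.

```python
import math
import struct

# Hypothetical 3-state transition matrix; each row sums to 1.
T = [[0.9, 0.1, 0.0],
     [0.2, 0.7, 0.1],
     [0.1, 0.3, 0.6]]


def flip_bit(x, bit):
    """Return x with one bit of its IEEE-754 double encoding flipped."""
    (as_int,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


def sanitize(p):
    """Assumed low-cost check: drop entries that cannot be probabilities."""
    cleaned = [v if (math.isfinite(v) and 0.0 <= v <= 1.0) else 0.0 for v in p]
    total = sum(cleaned)
    if total == 0.0:
        # Nothing usable is left, so restart from a uniform guess.
        return [1.0 / len(p)] * len(p)
    return [v / total for v in cleaned]


def run_inference(T, iters=200, fault=None, protect=False):
    """Iterate p <- normalize(p T), starting from a point mass on state 0.

    fault is None or a tuple (iteration, index, bit) naming the single
    bit flip to inject into the state vector.
    """
    n = len(T)
    p = [1.0] + [0.0] * (n - 1)
    for it in range(iters):
        if fault is not None and fault[0] == it:
            p[fault[1]] = flip_bit(p[fault[1]], fault[2])
        if protect:
            p = sanitize(p)
        p = [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]
        total = sum(p)
        p = [v / total for v in p]
    return p


clean = run_inference(T)
# A low mantissa bit flip is a small perturbation that later iterations absorb.
masked = run_inference(T, fault=(5, 1, 30))
# Flipping bit 62 of 1.0 yields infinity, which poisons every later iterate.
broken = run_inference(T, fault=(0, 0, 62))
# The same fault with the range check enabled.
recovered = run_inference(T, fault=(0, 0, 62), protect=True)
```

In this toy setting the mantissa flip is masked by the iteration itself, while the exponent flip turns the whole state into NaN unless the range check is enabled. That mirrors, in a very simplified way, the distinction drawn above between errors the algorithm recovers from naturally and errors that need algorithm-specific protection.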
