Abstract
Probabilistic reasoning has become a popular approach for modeling systems with uncertainty and for finding the most likely solution given the available data. It has been applied successfully to many fields, and its range of applications continues to grow. However, there has been little work on how these applications map to modern and future computing systems. Continued scaling of VLSI circuit technology is driving processor design toward explicitly parallel machines, as energy constraints and diminishing returns from instruction-level parallelism limit the performance gains achievable with monolithic processors. Using a reconfigurable chip multiprocessor (CMP) architecture as our evaluation platform, we characterize the performance of a set of probabilistic inference algorithms.
Our results show that probabilistic inference applications have abundant data parallelism that is easy to extract in most cases. For some applications, hardware-supported multi-context processors and fine-grain synchronization are necessary for efficient parallel execution. Unlike parallel scientific benchmarks, these applications have a lower compute-to-memory ratio and large working sets. As a result, they require high memory bandwidth, and multi-context processors are desirable for hiding part of the memory latency.
Besides driving the shift to CMPs, technology scaling is also fueling growing concern about the reliability of future processor chips, as shrinking feature sizes and lower supply voltages make devices more susceptible to transient-error upsets. Since probabilistic reasoning is designed to handle noisy or incomplete data, this raises two questions: are these applications more robust than traditional programs, and does that robustness warrant a different approach to soft-error protection?
Our fault-injection experiments confirm that the robustness of approximate inference algorithms makes them more resilient to transient errors than traditional benchmarks. In addition, the approximate nature of the computation enables low-cost fault recovery. With simple software modifications, we can further increase the percentage of soft errors that are masked. The errors from which the algorithm cannot naturally recover often point to critical sections of the program, and we find that algorithm-specific, software-level error protection can be very effective at increasing robustness while incurring little additional overhead.