This paper investigates the time lost in a parallel computation due to sequential and duplicated work, communication, and blocking, and proposes characterizations of parallel algorithms based upon the communication complexity and the blocking model. It discusses the impact of the processor′s architecture upon the measured speedup. It shows that a large speedup may be due to an inefficient sequential computation, rather than to an efficient parallel computation. A model of parallel computation which takes into account sequential and duplicated work, communication and control, and blocking is presented. It parametrizes scalability using three functions of the number P of processors: E(P) = the number of communication events, ƒ(P) = the fraction of sequential and duplicated work plus the algorithmic blocking, I(P) = the instruction execution rate. The characteristic function E(P) is the most important as in many computations ƒ(P) and I(P) are nearly constant. The model is used to predict the asymptotic behavior, the maximum speedup, and the optimal number of processors. A 3-D FFT algorithm and a Chebyshev iterative algorithm are used to illustrate the concepts introduced.
Discussion(0)
No comments yet. Be the first to comment.