Energy-Efficient Floating-Point Unit Design

Sameh Galal; Mark Horowitz

doi:10.1109/tc.2010.121

IEEE Transactions on Computers 60(7): 913-922

Article 2010 English

Authors

Sameh Galal
Mark Horowitz
Stanford University

Abstract

Energy-efficient computation is critical if we are going to continue to scale performance in power-limited systems. For floating-point applications that have large amounts of data parallelism, one should optimize the throughput/mm² given a power density constraint. We present a method for creating a trade-off curve that can be used to estimate the maximum floating-point performance given a set of area and power constraints. Looking at FP multiply-add units and ignoring register and memory overheads, we find that in a 90 nm CMOS technology at 1 W/mm², one can achieve a performance of 27 GFlops/mm² single precision, and 7.5 GFlops/mm² double precision. Adding register file overheads reduces the throughput by less than 50 percent if the compute intensity is high. Since the energy of the basic gates is no longer scaling rapidly, to maintain constant power density with scaling requires moving the overall FP architecture to a lower energy/performance point. A 1 W/mm² design at 90 nm is a "high-energy" design, so scaling it to a lower energy design in 45 nm still yields a 7× performance gain, while a more balanced 0.1 W/mm² design only speeds up by 3.5× when scaled to 45 nm. Performance scaling below 45 nm rapidly decreases, with a projected improvement of only ~3× for both power densities when scaling to a 22 nm technology.
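The 90 nm figures quoted in the abstract can be restated as an energy per operation, since power density divided by throughput density gives joules per FLOP. The following is a minimal sketch of that arithmetic only; the function name `energy_per_flop_pj` is hypothetical, and the calculation assumes the quoted throughput is sustained at exactly the quoted power density. It is not the authors' trade-off-curve method.

```python
def energy_per_flop_pj(power_density_w_per_mm2: float,
                       throughput_gflops_per_mm2: float) -> float:
    """Return energy per floating-point operation in picojoules.

    Power density (W/mm^2) divided by throughput density (FLOP/s/mm^2)
    gives J/FLOP; the area terms cancel.
    """
    flops_per_s_per_mm2 = throughput_gflops_per_mm2 * 1e9
    joules_per_flop = power_density_w_per_mm2 / flops_per_s_per_mm2
    return joules_per_flop * 1e12  # J -> pJ


# Figures from the abstract: 90 nm CMOS at 1 W/mm^2,
# FP multiply-add units only (register and memory overheads ignored).
sp_pj = energy_per_flop_pj(1.0, 27.0)   # single precision
dp_pj = energy_per_flop_pj(1.0, 7.5)    # double precision

print(f"single precision: {sp_pj:.1f} pJ/FLOP")  # 37.0 pJ/FLOP
print(f"double precision: {dp_pj:.1f} pJ/FLOP")  # 133.3 pJ/FLOP
```

Under these assumptions a double-precision operation costs roughly 3.6 times the energy of a single-precision one, which is the same ratio as the two throughput figures.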
