Efficient Overlay Architecture Based on DSP Blocks

Poor design productivity and long compilation times are major issues preventing the mainstream adoption of FPGAs in general-purpose computing. Several overlay architectures have emerged to tackle these challenges, but at the cost of area and performance overheads. This paper examines a coarse-grained overlay architecture designed using the flexible DSP48E1 primitive on Xilinx FPGAs. Building the processing element (PE) around this primitive allows pipelined execution at significantly higher throughput without adding significant area overhead to the PE. We map several benchmarks using our custom mapping tool and show that the proposed overlay architecture delivers a throughput of up to 21.6 GOPS and provides an 11% to 52% improvement in throughput compared to Vivado HLS implementations.

Publication Info

DOI: 10.1109/fccm.2015.15
Year: 2015
Language: English

Article Details

Pages: 25-28
Link Of The Paper: https://doi.org/10.1109/fccm.2015.15

Timeline

Created: June 19, 2026

Related publications

Article, 2016

Adapting the DySER Architecture with DSP Blocks as an Overlay for the Xilinx Zynq

Abhishek Kumar Jain, Xiangwei Li, Suhaib A. Fahmy, Douglas Leslie Maskell

ACM SIGARCH Computer Architecture News

Preprint, 2016

Efficient Overlay Architecture Based on DSP Blocks

An Area-Efficient FPGA Overlay using DSP Block based Time-multiplexed Functional Units

Throughput Oriented FPGA Overlays Using DSP Blocks

DeCO: A DSP Block Based FPGA Accelerator Overlay with Low Overhead Interconnect

Coarse Grained FPGA Overlay for Rapid Just-In-Time Accelerator Compilation