Design productivity and long compilation times are major issues preventing the mainstream adoption of FPGAs in general purpose computing. Several overlay architectures have emerged to tackle these challenges, but at the cost of increased area and performance overheads. This paper examines a coarse grained overlay architecture designed using the flexible DSP48E1 primitive on Xilinx FPGAs. This allows pipelined execution at significantly higher throughput without adding significant area overheads to the PE. We map several benchmarks, using our custom mapping tool, and show that the proposed overlay architecture delivers a throughput of up to 21.6 GOPS and provides an 11 -- 52% improvement in throughput compared to Vivado HLS implementations.
Discussion(0)
No comments yet. Be the first to comment.