Compiling Halide Programs to Push-Memory Accelerators

Image processing and machine learning applications benefit tremendously from hardware acceleration, but existing compilers target either FPGAs, which sacrifice power and performance for flexible hardware, or ASICs, which rapidly become obsolete as applications change. Programmable domain-specific accelerators have emerged as a promising middle ground between these two extremes, but such architectures have traditionally been difficult compiler targets. The main obstacle is that these accelerators often use a different memory abstraction than CPUs and GPUs: push memories, which send a data stream from one computation kernel to other kernels, possibly reordered. To address the compilation challenges caused by push memories, we propose changing the representation of memory in the middle end and back end of the compiler so that storage, address generation, and control logic are combined in a single structure: a unified buffer. We show that this compiler abstraction can be implemented efficiently on a programmable accelerator, and we design a memory mapping algorithm that combines polyhedral analysis and software vectorization techniques to target our accelerator. Our evaluation shows that the compiler supports programmability while maintaining high performance: it compiles a wide range of image processing and machine learning applications to our accelerator with 4.7x better runtime and 4.3x better energy efficiency than an FPGA.
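
As a rough illustration of the push-memory idea in the abstract, the toy Python model below bundles storage with its own write and read address generators, so that a stream pushed in by a producer comes out reordered for a consumer. It is only a sketch under stated assumptions, not the paper's unified buffer: the names `affine_addresses` and `ToyPushBuffer` are hypothetical, affine loop-nest addressing is assumed for simplicity, and the control logic is reduced to "write everything, then read everything".

```python
from itertools import product


def affine_addresses(extents, strides, offset=0):
    """Yield offset + sum(i_k * stride_k) over a loop nest, outermost loop first.

    Hypothetical helper: models an address generator as an affine function
    of the loop indices.
    """
    for idx in product(*(range(e) for e in extents)):
        yield offset + sum(i * s for i, s in zip(idx, strides))


class ToyPushBuffer:
    """Storage bundled with its own write/read address generators (toy model)."""

    def __init__(self, size, write_gen, read_gen):
        self.data = [None] * size
        self.write_gen = write_gen
        self.read_gen = read_gen

    def run(self, stream):
        # Simplified control: accept the whole input stream, then push all
        # outputs. Real hardware would overlap the two phases.
        for value, addr in zip(stream, self.write_gen):
            self.data[addr] = value
        return [self.data[addr] for addr in self.read_gen]


# The producer streams a 2x3 tile in row-major order;
# the consumer receives it in column-major order.
rows, cols = 2, 3
buf = ToyPushBuffer(
    size=rows * cols,
    write_gen=affine_addresses((rows, cols), (cols, 1)),
    read_gen=affine_addresses((cols, rows), (1, cols)),
)
reordered = buf.run([0, 1, 2, 3, 4, 5])
print(reordered)  # [0, 3, 1, 4, 2, 5]
```

Neither the producer nor the consumer computes an address here; the buffer itself decides where each incoming value is stored and in what order values leave, which is the contrast with the load/store memory abstraction of CPUs and GPUs that the abstract draws.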

Publication Info

DOI: 10.48550/arxiv.2105.12858
Year: 2021
Published: —
Language: English

Preprint Details

Journal Name: arXiv (Cornell University)
Link Of The Paper: https://doi.org/10.48550/arxiv.2105.12858

Timeline

Created: June 19, 2026

Related publications

Preprint, 2021

Compiling Halide Programs to Push-Memory Accelerators

Qiaoyi Liu, Dillon Huff, Jeff Setter, Maxwell Strange, Kathleen Feng, Kavya Sreedhar, Ziheng Wang, Keyi Zhang, Mark Horowitz, Priyanka Raina, Fredrik Kjølstad

arXiv (Cornell University)

Article, 2022

Unified Buffer: Compiling Image Processing and Machine Learning Applications to Push-Memory Accelerators

Qiaoyi Liu, Jeff Setter, Dillon Huff, Maxwell Strange, Kathleen Feng, Mark Horowitz, Priyanka Raina, Fredrik Kjølstad

ACM Transactions on Architecture and Code Optimization

Preprint, 2018

Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators

Xuan Yang, Mingyu Gao, Qiaoyi Liu, Jeff Setter, Jing Pu, Ankita Nayak, Steven Bell, Kaidi Cao, Heonjae Ha, Priyanka Raina, Christos Kozyrakis, Mark Horowitz

arXiv (Cornell University)

Article, 2023

APEX: A Framework for Automated Processing Element Design Space Exploration using Frequent Subgraph Analysis

Jackson Melchert, Kathleen Feng, Caleb Donovick, Ross Daly, Ritvik Sharma, Clark Barrett, Mark Horowitz, Pat Hanrahan, Priyanka Raina