Sparse matrix-vector multiply on the HICAMP architecture

Sparse matrix-vector multiply (SpMV) is a critical task in the inner loop of modern iterative linear system solvers and exhibits very little data reuse. This low reuse means that its performance is bounded by main-memory bandwidth. Moreover, the random patterns of indirection make it difficult to reach this bound. We present sparse matrix storage formats based on deduplicated memory. These formats reduce memory traffic during SpMV and thus show significantly improved performance bounds: 90x better in the best case. Additionally, we introduce a matrix format that inherently exploits any amount of matrix symmetry and is at the same time fully compatible with non-symmetric matrix code. Because of this, our method can operate concurrently on a symmetric matrix without complicated work partitioning schemes and without any thread synchronization or locking. This approach takes advantage of growing processor caches, but incurs an instruction count overhead. This overhead can be overcome with specialized hardware, as shown by the recently proposed Hierarchical Immutable Content-Addressable Memory Processor (HICAMP) architecture.
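
For context, the low data reuse and indirect access pattern described above can be seen in a conventional SpMV kernel. The sketch below assumes the common compressed sparse row (CSR) layout, which the abstract does not name; it is not the deduplicated format proposed in the paper. The function name `csr_spmv`, the array names `row_ptr`, `col_idx`, and `values`, and the 3x3 matrix are illustrative only.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (assumed layout)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # values[k] and col_idx[k] are each read exactly once (no reuse);
            # x[col_idx[k]] is an indirect, data-dependent access.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical 3x3 symmetric example:
# [[4, 0, 1],
#  [0, 3, 0],
#  [1, 0, 2]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 1.0, 2.0]
x = [1.0, 2.0, 3.0]
y = csr_spmv(row_ptr, col_idx, values, x)
```

Each stored nonzero contributes one multiply-add but requires loading a value, a column index, and an entry of `x`, which is why such a kernel tends to be limited by memory bandwidth rather than arithmetic. In this example the symmetric entry 1 is stored twice, the kind of redundancy a symmetry-aware or deduplicated format would aim to avoid.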


Publication Info

DOI: 10.1145/2304576.2304603
Year: 2012
Language: English

Article Details

Pages: 195-204
Link Of The Paper: https://doi.org/10.1145/2304576.2304603

Timeline

Created: June 19, 2026
