NeuroVectorizer: End-to-End Vectorization with Deep Reinforcement Learning
Preprint 2019 en
Authors
Ameer Haj-Ali
Nesreen K. Ahmed
Ted Willke
Abstract
One of the key challenges arising when compilers vectorize loops for today's SIMD-compatible architectures is to decide if vectorization or interleaving is beneficial. Then, the compiler has to determine how many instructions to pack together and how many loop iterations to interleave. Compilers are designed today to use fixed-cost models that are based on heuristics to make vectorization decisions on loops. However, these models are unable to capture the data dependency, the computation graph, or the organization of instructions. Alternatively, software engineers often hand-write the vectorization factors of every loop. This, however, places a huge burden on them, since it requires prior experience and significantly increases the development time. In this work, we explore a novel approach for handling loop vectorization and propose an end-to-end solution using deep reinforcement learning (RL). We conjecture that deep RL can capture different instructions, dependencies, and data structures to enable learning a sophisticated model that can better predict the actual performance cost and determine the optimal vectorization factors. We develop an end-to-end framework, from code to vectorization, that integrates deep RL in the LLVM compiler. Our proposed framework takes benchmark codes as input and extracts the loop codes. These loop codes are then fed to a loop embedding generator that learns an embedding for these loops. Finally, the learned embeddings are used as input to a deep RL agent, which determines the vectorization factors for all the loops. We further extend our framework to support multiple supervised learning methods. We evaluate our approaches against the currently used LLVM vectorizer and loop polyhedral optimization techniques. Our experiments show 1.29X-4.73X performance speedup compared to baseline and only 3% worse than the brute-force search on a wide range of benchmarks.
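To make the decision space concrete, here is a minimal Python sketch of the brute-force search that the abstract uses as a reference point: it tries every pair of vectorization factor and interleave count and keeps the cheapest one. The candidate values, the `toy_cost` function, and all helper names are illustrative assumptions and do not come from the paper; in a real setup the cost would be a measured run time, and the learned agent would predict the pair instead of enumerating it. The emitted hint uses Clang's `#pragma clang loop` syntax.

```python
from itertools import product

# Assumed search space (hypothetical): powers of two, as is common for SIMD widths.
VECTOR_WIDTHS = [1, 2, 4, 8, 16]
INTERLEAVE_COUNTS = [1, 2, 4, 8]


def make_pragma(vf: int, interleave: int) -> str:
    """Render a Clang loop hint for the chosen factors."""
    return (
        f"#pragma clang loop vectorize_width({vf}) "
        f"interleave_count({interleave})"
    )


def toy_cost(vf: int, interleave: int) -> float:
    """Hypothetical stand-in for a measured run time (not from the paper).

    Work shrinks as more elements are processed per iteration, while a
    linear penalty models growing register pressure.
    """
    parallelism = vf * interleave
    return 100.0 / parallelism + 0.5 * parallelism


def brute_force_search(cost=toy_cost):
    """Evaluate every (vectorization factor, interleave count) pair
    and return the pair with the lowest cost."""
    candidates = product(VECTOR_WIDTHS, INTERLEAVE_COUNTS)
    return min(candidates, key=lambda pair: cost(*pair))


best_vf, best_if = brute_force_search()
print(make_pragma(best_vf, best_if))
```

Exhaustive search scales with the number of candidate pairs per loop and needs one compile-and-run per candidate, which is the cost a learned policy is meant to avoid.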