Loops in LLVM
Table of Contents
Loops are quite important in basicly every high level programming languages, it provides the concept to execute a piece of codes repeatedly to desired number of times. Incidentally, loops are often the most computationally intensive parts of the program. For example, at the heart of deep learning applications are matrix multiplications and most basic 2d matrix multiplication can be expressed as a 3 nested for-loops.
1// A[m][k]
2// B[k][n]
3// C[m][n]
4void matmul_2d(float *A, float *B, float *C,
5 int m, int n, int k) {
6 for(int i = 0; i < m; i++) {
7 for(int j = 0; j < n; j++) {
8 C[i][j] = 0;
9 for(int h = 0; h < k; h++) {
10 C[i][j] += A[i][h] * B[h][j];
11 }
12 }
13 }
14}
However, programmer usually do not write loop codes as optimal as it can be, so loops require all kinds of optimizations by the compiler or even hand tuned by experts in order to achieve better performance. These optimizations often results in less redundant execution (LICM), better pipline utilization of the processor (loop unrolling), better utilization of SIMD instructions (vectorization), better utilization of processor cores (multithreading), better data locality (tiling/loop reordering), and much more.
This blog will record my encounters with how loops are represented, analzyed and optimized in LLVM.
Loop Terminology #
Natural Loops #
In compilers, a natural loop is a set of nodes from the control-flow graph such that:
- exist a
header
dominates all nodes in the loop - no other edges can enter the loop nodes except
header
- nodes are strongly connected, contains a back edge to the
header
In this example, CFG exist a back edge l -> h
whose destination dominates its source (h dom l
) and a single entry point h
, a header dominates all nodes in the loop.
The natural loop of the back edge is h + {nodes that can reach l without going through h}
.
Thus the natural loop is the set [h, b, l]
.
Natural loops are the most common loops in programs, and compiler can assume certain inital conditions to hold at beigning of each iteration through the loop [Compilers Principles Techniques and Tools:665].
LLVM Loop Terminology #
In LLVM, natural loop is just called loop and more generalized definition is called a cycle. Some terminology for LLVM Loop:
- entering block: a non-loop node that has an edge into the loop (the header)
- preheader: if only one entering block and its only edge is to the header
- header: single entry point of the loop
- latch: a loop node that has an edge to the header
- backedge: edge from a latch to the header
- exiting edge: edge from inside the loop to a node outside of the loop
- exiting block: source of exiting edge
- exit block: target of exiting edge
More on:
Loop Representation in LLVM #
Simple Loop CFG #
1void test(int n) {
2 for (int i = 0; i < n; i += 1) {
3 // Loop body
4 }
5}
This is a baisc loop in c. Cas use these command to compile to llvm IR and visualize CFG of the loop.
1clang -emit-llvm -S -O0 test.cpp -o test.ll
2opt -passes=mem2reg,dot-cfg test.ll -S -o test1.ll
3dot -Tpng ._Z4testi.dot > loop_basic.png
1define dso_local void @_Z4testi(i32 noundef %0) #0 {
2; preheader
3 br label %2
4
5; header
62: ; preds = %6, %1
7 %.0 = phi i32 [ 0, %1 ], [ %7, %6 ]
8 %3 = icmp slt i32 %.0, %0
9 br i1 %3, label %5, label %4
10
11; exit
124: ; preds = %2
13 br label %8
14
15; body
165: ; preds = %2
17 br label %6
18
19; latch
206: ; preds = %5
21 %7 = add nsw i32 %.0, 1
22 br label %2, !llvm.loop !3
23
24; return
258: ; preds = %4
26 ret void
27}
data:image/s3,"s3://crabby-images/fa4f2/fa4f27f665ba5f5426d66fc9d9c7b9df2be59eca" alt=""
Loop Simplify Form #
LLVM use -loop-simplify
pass to transform loop to a canonical form with:
- one preheader
- one backedge (one latch)
- header dominates all exit blocks (no edge to exit blocks from outside the loop)
Rotated Loops #
LLVM use -loop-rotate
pass to transform loop to a do-while canonical form. But if the loop is not guaranteed to excute at least once, a guard
will be create to test inital condition.
LoopRotate ensures that the loop is in Loop Simplify Form after rotation. The reason for rotation is this form will become easier for transform passes (e.g. LICM) to hoisting instructions into the preheader.
Also generated code can have one less uncoditional jump: Why are loops always compiled into “do…while” style (tail jump)?
1opt -passes=mem2reg,loop-rotate,dot-cfg test.ll -S -o test1.ll
data:image/s3,"s3://crabby-images/125eb/125eb8011b79133f5cefdb8f1fdfb8d36d89cdca" alt=""
Loop Closed SSA #
LLVM use -lcssa
pass to transforms loops by placing phi nodes at the end of the loops for all values that are live across the loop boundary. This will make other loop optimizations simpler.
1// Before
2for (...)
3 if (c)
4 X1 = ...
5 else
6 X2 = ...
7 X3 = phi(X1, X2)
8... = X3 + 4
9
10// After
11for (...)
12 if (c)
13 X1 = ...
14 else
15 X2 = ...
16 X3 = phi(X1, X2)
17X4 = phi(X3)
18... = X4 + 4
Loop Analysis in LLVM #
LoopInfo #
LoopInfo is analysis pass in LLVM to get information about loops. Provides APIs for simple analysis of the loop:
- getExitingBlocks/getExitingBlock: get all blocks inside loop have edge to outside
- getExitBlocks/getExitBlock: get all blocks out loop have edge from inside loop
- getExitEdges: get all edge (inside, outside)
- getLoopLatches/getLoopLatch: get all loop latch blocks inside loop
- getLoopPreheader: get the preheader of the loop
- getLoopPredecessor: get the single predecessor of the loop, if predecessor only have one outgoing edge, it is also the preheader
- getInnerLoopsInPreorder: get all inner loops inside the loop nest in preorder
- getLoopsInPreorder: get all inner loops inside the loop nest including the loop in preorder
getBlock
will return the block if only exist one block, otherwise return null
LoopInfoImpl #
How LoopInfo impemented in LoopInfoImpl.h
:
1// LoopInfoBase::analyze
2for each [node] in post order dominator tree:
3 backedges = []
4 for each [predecessor] in node:
5 if (node dominates predecessor)
6 and (entry can reach predecessor):
7 add edge [predecessor, node] to backedges
8 if (backedges not empty):
9 create new loop [L]
10 // discoverAndMapSubloop
11 for each [block] in backward CFG list:
12 if (block not discovered):
13 map block to current loop L
14 add predecessors of block to CFG list
15 else:
16 get block outermost loop [L_o]
17 if (L_o is not L):
18 set L as parent of L_o
19 add predecessors of L_o header to CFG list
20for each [block] in DFS post order CFG:
21 // PopulateLoopsDFS
22 get inner most loop [L] that block lives
23 if (L exist) and (L header is block):
24 if (L is outer most loop):
25 mark L as top level loop
26 else:
27 add L to sub loops of parent loop
28 add block to L and outer loops of L when exist