Loops in LLVM

Table of Contents

Loops are quite important in basicly every high level programming languages, it provides the concept to execute a piece of codes repeatedly to desired number of times. Incidentally, loops are often the most computationally intensive parts of the program. For example, at the heart of deep learning applications are matrix multiplications and most basic 2d matrix multiplication can be expressed as a 3 nested for-loops.

 1// A[m][k]
 2// B[k][n]
 3// C[m][n]
 4void matmul_2d(float *A, float *B, float *C, 
 5                   int m, int n, int k) {
 6    for(int i = 0; i < m; i++) {
 7        for(int j = 0; j < n; j++) {
 8            C[i][j] = 0;
 9            for(int h = 0; h < k; h++) {
10                C[i][j] += A[i][h] * B[h][j];
11            }
12        }
13    }
14}

However, programmer usually do not write loop codes as optimal as it can be, so loops require all kinds of optimizations by the compiler or even hand tuned by experts in order to achieve better performance. These optimizations often results in less redundant execution (LICM), better pipline utilization of the processor (loop unrolling), better utilization of SIMD instructions (vectorization), better utilization of processor cores (multithreading), better data locality (tiling/loop reordering), and much more.

This blog will record my encounters with how loops are represented, analzyed and optimized in LLVM.

Loop Terminology #

Natural Loops #

In compilers, a natural loop is a set of nodes from the control-flow graph such that:

exist a header dominates all nodes in the loop
no other edges can enter the loop nodes except header
nodes are strongly connected, contains a back edge to the header

graph TD; A[Entry]-->H[h: Header]; H-->B{b: Body}; B-->L[l: Latch]; L-->H; B-->E[e: Exit]; subgraph Natural Loop H B L end

In this example, CFG exist a back edge l -> h whose destination dominates its source (h dom l) and a single entry point h, a header dominates all nodes in the loop.
The natural loop of the back edge is h + {nodes that can reach l without going through h}.
Thus the natural loop is the set [h, b, l].

Natural loops are the most common loops in programs, and compiler can assume certain inital conditions to hold at beigning of each iteration through the loop [Compilers Principles Techniques and Tools:665].

LLVM Loop Terminology #

In LLVM, natural loop is just called loop and more generalized definition is called a cycle. Some terminology for LLVM Loop:

entering block: a non-loop node that has an edge into the loop (the header)
preheader: if only one entering block and its only edge is to the header
header: single entry point of the loop
latch: a loop node that has an edge to the header
backedge: edge from a latch to the header
exiting edge: edge from inside the loop to a node outside of the loop
exiting block: source of exiting edge
exit block: target of exiting edge

Loop Representation in LLVM #

Simple Loop CFG #

1void test(int n) {
2  for (int i = 0; i < n; i += 1) {
3    // Loop body
4  }
5}

This is a baisc loop in c. Cas use these command to compile to llvm IR and visualize CFG of the loop.

1clang -emit-llvm -S -O0 test.cpp -o test.ll
2opt -passes=mem2reg,dot-cfg test.ll -S -o test1.ll
3dot -Tpng ._Z4testi.dot > loop_basic.png

 1define dso_local void @_Z4testi(i32 noundef %0) #0 {
 2; preheader
 3  br label %2
 4
 5; header
 62:                                                ; preds = %6, %1
 7  %.0 = phi i32 [ 0, %1 ], [ %7, %6 ]
 8  %3 = icmp slt i32 %.0, %0
 9  br i1 %3, label %5, label %4
10
11; exit
124:                                                ; preds = %2
13  br label %8
14
15; body
165:                                                ; preds = %2
17  br label %6
18
19; latch
206:                                                ; preds = %5
21  %7 = add nsw i32 %.0, 1
22  br label %2, !llvm.loop !3
23
24; return
258:                                                ; preds = %4
26  ret void
27}

Loop Simplify Form #

LLVM use -loop-simplify pass to transform loop to a canonical form with:

one preheader
one backedge (one latch)
header dominates all exit blocks (no edge to exit blocks from outside the loop)

Rotated Loops #

LLVM use -loop-rotate pass to transform loop to a do-while canonical form. But if the loop is not guaranteed to excute at least once, a guard will be create to test inital condition.
LoopRotate ensures that the loop is in Loop Simplify Form after rotation. The reason for rotation is this form will become easier for transform passes (e.g. LICM) to hoisting instructions into the preheader.
Also generated code can have one less uncoditional jump: Why are loops always compiled into “do…while” style (tail jump)?

1opt -passes=mem2reg,loop-rotate,dot-cfg test.ll -S -o test1.ll

Loop Closed SSA #

LLVM use -lcssa pass to transforms loops by placing phi nodes at the end of the loops for all values that are live across the loop boundary. This will make other loop optimizations simpler.

 1// Before
 2for (...)
 3    if (c)
 4        X1 = ...
 5    else
 6        X2 = ...
 7    X3 = phi(X1, X2)
 8... = X3 + 4
 9
10// After
11for (...)
12    if (c)
13        X1 = ...
14    else
15        X2 = ...
16    X3 = phi(X1, X2)
17X4 = phi(X3)
18... = X4 + 4

From Loop-Closed SSA Form Pass.

Loop Analysis in LLVM #

LoopInfo #

LoopInfo is analysis pass in LLVM to get information about loops. Provides APIs for simple analysis of the loop:

getExitingBlocks/getExitingBlock: get all blocks inside loop have edge to outside
getExitBlocks/getExitBlock: get all blocks out loop have edge from inside loop
getExitEdges: get all edge (inside, outside)
getLoopLatches/getLoopLatch: get all loop latch blocks inside loop
getLoopPreheader: get the preheader of the loop
getLoopPredecessor: get the single predecessor of the loop, if predecessor only have one outgoing edge, it is also the preheader
getInnerLoopsInPreorder: get all inner loops inside the loop nest in preorder
getLoopsInPreorder: get all inner loops inside the loop nest including the loop in preorder

getBlock will return the block if only exist one block, otherwise return null

LoopInfoImpl #

How LoopInfo impemented in LoopInfoImpl.h:

 1// LoopInfoBase::analyze
 2for each [node] in post order dominator tree:
 3    backedges = []
 4    for each [predecessor] in node:
 5            if (node dominates predecessor) 
 6              and (entry can reach predecessor):
 7                add edge [predecessor, node] to backedges
 8    if (backedges not empty):
 9        create new loop [L]
10        // discoverAndMapSubloop
11        for each [block] in backward CFG list:
12            if (block not discovered):
13                map block to current loop L
14                add predecessors of block to CFG list
15            else:
16                get block outermost loop [L_o]
17                if (L_o is not L):
18                    set L as parent of L_o
19                    add predecessors of L_o header to CFG list
20for each [block] in DFS post order CFG:
21    // PopulateLoopsDFS
22    get inner most loop [L] that block lives
23    if (L exist) and (L header is block):
24        if (L is outer most loop):
25            mark L as top level loop 
26        else:
27            add L to sub loops of parent loop
28    add block to L and outer loops of L when exist