]> Witch of Git - web/blog/blob - posts/2018/llvm-hearts-peano-addition.md
Add drafts
[web/blog] / posts / 2018 / llvm-hearts-peano-addition.md
1 ---
2 title: "LLVM 💖s Peano Addition"
3 date: 2018-09-07
4 # tags: llvm; compilers; optimization
5 summary: "I am surprised to discover that LLVM can optimize the standard peano definition of addition, so I set out to investigate."
6 ---
7
8 This semester I'm taking an advanced compilers class.
9 We're going to be learning by making changes to LLVM, so for the first assignment I was reading recommended [introduction to LLVM][].
10 In order to give an example of some LLVM IR, it provides two small C functions implementing addition in different ways, and equivalent IR.
11
12 [introduction to LLVM]: http://www.aosabook.org/en/llvm.html
13
14 ```c
15 unsigned add1(unsigned a, unsigned b) {
16 return a+b;
17 }
18
19 // Perhaps not the most efficient way to add two numbers.
20 unsigned add2(unsigned a, unsigned b) {
21 if (a == 0) return b;
22 return add2(a-1, b+1);
23 }
24 ```
25
26 Being something of a mathematician myself, I felt I had to [defend the honor][] of "Peano-likers" from this defamation.
27 I made that joke tweet and moved on, but after [someone suggested LLVM optimize it][suggestion], I started to think about writing some of those optimization passes as hopefully easy pattern-matching definitions.
28
29 [defend the honor]: https://twitter.com/porglezomp/status/1037408309140221952
30 [suggestion]: https://twitter.com/hyperfekt/status/1037654687024119808
31
32 The next day, after compiling LLVM and getting a custom Hello World optimizer pass running, I decided to create some tests, and discovered (much to my surprise) that LLVM already handled Peano-style addition and multiplication perfectly competently!
33
34 I had just read John Regehr's [blog post on how LLVM optimizes a function][regehr], so I had an idea for how to investigate this.
35 If you haven't read that yet, you should go read that first in order to see in some more detail LLVM's optimization passes like the ones I'm going to describe below.
36
37 [regehr]: https://blog.regehr.org/archives/1603
38
39 ## How to View the Optimizations
40
41 That blog post proceeds by running the LLVM `opt` tool and examining the changes between passes.
42 You can easily get the LLVM IR corresponding to some C code using `clang`, just run:
43
44 ```shell
45 $ clang peano.c -emit-llvm -S -o peano.ll
46 ```
47
48 and you'll have a beautiful LLVM IR dump in the textual format.
49 In order to view the optimizations on that code, you can run:
50
51 ```shell
52 $ opt -O3 -print-before-all -print-after-all peano.ll
53 ```
54
55 This gives you a huge wall of IR dumps after each optimization pass.
56 If you want to do a similar investigation yourself, I wrote [a Python script that shows each pass's diff][diff script] and waits for you to continue it.
57 Make sure you have [icdiff][] (a very nice color diff tool) installed in order to use it, or else modify the diff invocation in the script.
58
59 [diff script]: https://gist.github.com/porglezomp/f2dc233f971cf3f30d45e0b501ae5ead
60 [icdiff]: https://github.com/jeffkaufman/icdiff
61
62 ## The Optimizations
63
64 As you can see from [John Regehr's blog post][regehr], LLVM's passes sometimes undo and redo lots of work without changing very much when working on a function this simple.
65 Furthermore, the code emitted by the Clang frontend is a little bit of a mess that needs quite a bit of cleanup before it's decent code, in order to avoid needing to reimplement analyses that LLVM can do perfectly well itself.
66
67 In order to make this discussion clearer, I'll use the hand-written IR from the introductory article rather than the IR emitted by clang, and only run through the necessary passes to get the job done, not the whole `-O3` pipeline.
68 At each step of the optimization, I'll provide the IR, and some roughly corresponding C code.
69
70 ### The Program
71
72 We'll be investigating this recursive definition of addition:
73
74 ```llvm
75 define i32 @add(i32 %a, i32 %b) {
76 entry:
77 %tmp1 = icmp eq i32 %a, 0
78 br i1 %tmp1, label %done, label %recurse
79
80 recurse:
81 %tmp2 = sub i32 %a, 1
82 %tmp3 = add i32 %b, 1
83 %tmp4 = call i32 @add(i32 %tmp2, i32 %tmp3)
84 ret i32 %tmp4
85
86 done:
87 ret i32 %b
88 }
89 ```
90
91 Which corresponds to this C program:
92
93 ```c
94 typedef unsigned nat;
95
96 nat add(nat a, nat b) {
97 if (a == 0) return b;
98 return add(a-1, b+1);
99 }
100 ```
101
102
103 ### Tail Call Optimization
104
105 The first important optimization here is tail call optimization.
106 Above we see that we call `@add` into `%tmp4` and then immediately return it without doing anything else in between, which makes this a tail call.
107 Therefore, in order to avoid the cost of calling functions, the extra stack frames needed, and the expose more opportunities for optimizations, tail call optimization turns our tail recursion into a loop.
108
109 ```llvm
110 define i32 @add(i32 %a, i32 %b) {
111 entry:
112 br label %tailrecurse
113
114 tailrecurse:
115 %a.tr = phi i32 [ %a, %entry ], [ %tmp2, %recurse ]
116 %b.tr = phi i32 [ %b, %entry ], [ %tmp3, %recurse ]
117 %tmp1 = icmp eq i32 %a.tr, 0
118 br i1 %tmp1, label %done, label %recurse
119
120 recurse:
121 %tmp2 = sub i32 %a.tr, 1
122 %tmp3 = add i32 %b.tr, 1
123 br label %tailrecurse
124
125 done:
126 ret i32 %b.tr
127 }
128 ```
129
130 This code approximately corresponds to:
131
132 ```c
133 nat add(nat a, nat b) {
134 while (a != 0) {
135 a -= 1;
136 b += 1;
137 }
138 return b;
139 }
140 ```
141
142 By removing the recursive call, further optimizations become visible.
143 In particular...
144
145 ### Induction Variable Simplification
146
147 Loop optimizations are a primary focus of compiler optimizations, because many programs spend most of their time in a few loops, making those loops faster is the most fruitful optimization.
148 "Induction Variable Simplification" is a specific optimization that works on identified "loop induction variables", variables that change by a constant amount each loop iteration, or that are derived from other induction variables.
149
150 Here, `a` and `b` are identified as loop induction variables.
151 Event more critically, `a` is the induction variable that controls the loop condition, so `a` is counting down towards `0`.
152 Therefore, LLVM can determine that the loop will run exactly `a` times, called the "trip count."
153
154 In cases where one of the induction variables is used after the loop and the trip count is statically known, LLVM performs an optimization where it computes the final value of the induction variable outside the loop, which splits the live range of the induction variable, and potentially makes it eligible for dead code elimination (which happens in this case).
155
156 ```llvm
157 define i32 @add(i32 %a, i32 %b) {
158 entry:
159 br label %tailrecurse
160
161 ; Loop:
162 tailrecurse:
163 %a.tr = phi i32 [ %a, %entry ], [ %tmp2, %recurse ]
164 %tmp1 = icmp eq i32 %a.tr, 0
165 br i1 %tmp1, label %done, label %recurse
166
167 recurse:
168 %tmp2 = sub i32 %a.tr, 1
169 br label %tailrecurse
170
171 ; Exit blocks
172 done:
173 %0 = add i32 %b, %a
174 ret i32 %0
175 }
176 ```
177
178 This IR looks basically like this C:
179
180 ```c
181 nat add(nat a, nat b) {
182 nat a0 = a;
183 while (a0 != 0) {
184 a0 -= 1;
185 }
186 return b + a;
187 }
188 ```
189
190 If you're interested in more details of these loop optimizations, my knowledge here comes from [some very nice lecture notes][] linked from Regehr's blog post, go read that if you want to know more about how you actually detect these cases.
191
192 [some very nice lecture notes]: https://www.cs.cmu.edu/~fp/courses/15411-f13/lectures/17-loopopt.pdf
193
194 ### Delete Dead Loops
195
196 This pass is very straightforward.
197 The loop doesn't do anything anymore, and we know it will terminate, so we can just get rid of it.
198
199 ```llvm
200 define i32 @add(i32 %a, i32 %b) {
201 entry:
202 %0 = add i32 %b, %a
203 ret i32 %0
204 }
205 ```
206
207 And therefore, our code has been optimized down to:
208 ```c
209 nat add(nat a, nat b) {
210 return b + a;
211 }
212 ```
213
214 Our recursive definition of addition turns out to actually be addition, and LLVM has proved it for us!
215
216 ## Takeaways
217
218 Very general optimizations can combine together to have some very surprising specific results, and optimizing compilers are very clever.
219
220 These same optimizations work to optimize Peano multiplication, since the loop induction variables like to work with linear functions, but they don't succeed with saturating subtraction, recursive comparisons, or min/max.
221 It'll be interesting to see if I can come up with a loop optimization pass that can deal with those more complicated trip counts / induction variables in general at all, or if I'll only succeed at pattern matching these very specific functions.