]> Witch of Git - web/blog/blob - posts/2018/llvm-hearts-peano-addition.md
Make the atom feed always include an updated element
[web/blog] / posts / 2018 / llvm-hearts-peano-addition.md
1 ---
2 title: "LLVM 💖s Peano Addition"
3 date: 2018-09-07 America/Detroit
4 tags:
5 - llvm
6 - compilers
7 - optimization
8 summary: "I am surprised to discover that LLVM can optimize the standard peano definition of addition, so I set out to investigate."
9 ---
10
11 This semester I'm taking an advanced compilers class.
12 We're going to be learning by making changes to LLVM, so for the first assignment I was reading recommended [introduction to LLVM][].
13 In order to give an example of some LLVM IR, it provides two small C functions implementing addition in different ways, and equivalent IR.
14
15 [introduction to LLVM]: http://www.aosabook.org/en/llvm.html
16
17 ```c
18 unsigned add1(unsigned a, unsigned b) {
19 return a+b;
20 }
21
22 // Perhaps not the most efficient way to add two numbers.
23 unsigned add2(unsigned a, unsigned b) {
24 if (a == 0) return b;
25 return add2(a-1, b+1);
26 }
27 ```
28
29 Being something of a mathematician myself, I felt I had to [defend the honor][] of "Peano-likers" from this defamation.
30 I made that joke tweet and moved on, but after [someone suggested LLVM optimize it][suggestion], I started to think about writing some of those optimization passes as hopefully easy pattern-matching definitions.
31
32 [defend the honor]: https://twitter.com/porglezomp/status/1037408309140221952
33 [suggestion]: https://twitter.com/hyperfekt/status/1037654687024119808
34
35 The next day, after compiling LLVM and getting a custom Hello World optimizer pass running, I decided to create some tests, and discovered (much to my surprise) that LLVM already handled Peano-style addition and multiplication perfectly competently!
36
37 I had just read John Regehr's [blog post on how LLVM optimizes a function][regehr], so I had an idea for how to investigate this.
38 If you haven't read that yet, you should go read that first in order to see in some more detail LLVM's optimization passes like the ones I'm going to describe below.
39
40 [regehr]: https://blog.regehr.org/archives/1603
41
42 ## How to View the Optimizations
43
44 That blog post proceeds by running the LLVM `opt` tool and examining the changes between passes.
45 You can easily get the LLVM IR corresponding to some C code using `clang`, just run:
46
47 ```shell-session
48 $ clang peano.c -emit-llvm -S -o peano.ll
49 ```
50
51 and you'll have a beautiful LLVM IR dump in the textual format.
52 In order to view the optimizations on that code, you can run:
53
54 ```shell-session
55 $ opt -O3 -print-before-all -print-after-all peano.ll
56 ```
57
58 This gives you a huge wall of IR dumps after each optimization pass.
59 If you want to do a similar investigation yourself, I wrote [a Python script that shows each pass's diff][diff script] and waits for you to continue it.
60 Make sure you have [icdiff][] (a very nice color diff tool) installed in order to use it, or else modify the diff invocation in the script.
61
62 [diff script]: https://gist.github.com/porglezomp/f2dc233f971cf3f30d45e0b501ae5ead
63 [icdiff]: https://github.com/jeffkaufman/icdiff
64
65 ## The Optimizations
66
67 As you can see from [John Regehr's blog post][regehr], LLVM's passes sometimes undo and redo lots of work without changing very much when working on a function this simple.
68 Furthermore, the code emitted by the Clang frontend is a little bit of a mess that needs quite a bit of cleanup before it's decent code, in order to avoid needing to reimplement analyses that LLVM can do perfectly well itself.
69
70 In order to make this discussion clearer, I'll use the hand-written IR from the introductory article rather than the IR emitted by clang, and only run through the necessary passes to get the job done, not the whole `-O3` pipeline.
71 At each step of the optimization, I'll provide the IR, and some roughly corresponding C code.
72
73 ### The Program
74
75 We'll be investigating this recursive definition of addition:
76
77 ```llvm
78 define i32 @add(i32 %a, i32 %b) {
79 entry:
80 %tmp1 = icmp eq i32 %a, 0
81 br i1 %tmp1, label %done, label %recurse
82
83 recurse:
84 %tmp2 = sub i32 %a, 1
85 %tmp3 = add i32 %b, 1
86 %tmp4 = call i32 @add(i32 %tmp2, i32 %tmp3)
87 ret i32 %tmp4
88
89 done:
90 ret i32 %b
91 }
92 ```
93
94 Which corresponds to this C program:
95
96 ```c
97 typedef unsigned nat;
98
99 nat add(nat a, nat b) {
100 if (a == 0) return b;
101 return add(a-1, b+1);
102 }
103 ```
104
105
106 ### Tail Call Optimization
107
108 The first important optimization here is tail call optimization.
109 Above we see that we call `@add` into `%tmp4` and then immediately return it without doing anything else in between, which makes this a tail call.
110 Therefore, in order to avoid the cost of calling functions, the extra stack frames needed, and to expose more opportunities for optimizations, tail call optimization turns our tail recursion into a loop.
111
112 ```llvm
113 define i32 @add(i32 %a, i32 %b) {
114 entry:
115 br label %tailrecurse
116
117 tailrecurse:
118 %a.tr = phi i32 [ %a, %entry ], [ %tmp2, %recurse ]
119 %b.tr = phi i32 [ %b, %entry ], [ %tmp3, %recurse ]
120 %tmp1 = icmp eq i32 %a.tr, 0
121 br i1 %tmp1, label %done, label %recurse
122
123 recurse:
124 %tmp2 = sub i32 %a.tr, 1
125 %tmp3 = add i32 %b.tr, 1
126 br label %tailrecurse
127
128 done:
129 ret i32 %b.tr
130 }
131 ```
132
133 This code approximately corresponds to:
134
135 ```c
136 nat add(nat a, nat b) {
137 while (a != 0) {
138 a -= 1;
139 b += 1;
140 }
141 return b;
142 }
143 ```
144
145 By removing the recursive call, further optimizations become visible.
146 In particular...
147
148 ### Induction Variable Simplification
149
150 Loop optimizations are a primary focus of compiler optimizations, because many programs spend most of their time in a few loops, making those loops faster is the most fruitful optimization.
151 "Induction Variable Simplification" is a specific optimization that works on identified "loop induction variables", variables that change by a constant amount each loop iteration, or that are derived from other induction variables.
152
153 Here, `a` and `b` are identified as loop induction variables.
154 Event more critically, `a` is the induction variable that controls the loop condition, so `a` is counting down towards `0`.
155 Therefore, LLVM can determine that the loop will run exactly `a` times, called the "trip count."
156
157 In cases where one of the induction variables is used after the loop and the trip count is statically known, LLVM performs an optimization where it computes the final value of the induction variable outside the loop, which splits the live range of the induction variable, and potentially makes it eligible for dead code elimination (which happens in this case).
158
159 ```llvm
160 define i32 @add(i32 %a, i32 %b) {
161 entry:
162 br label %tailrecurse
163
164 ; Loop:
165 tailrecurse:
166 %a.tr = phi i32 [ %a, %entry ], [ %tmp2, %recurse ]
167 %tmp1 = icmp eq i32 %a.tr, 0
168 br i1 %tmp1, label %done, label %recurse
169
170 recurse:
171 %tmp2 = sub i32 %a.tr, 1
172 br label %tailrecurse
173
174 ; Exit blocks
175 done:
176 %0 = add i32 %b, %a
177 ret i32 %0
178 }
179 ```
180
181 This IR looks basically like this C:
182
183 ```c
184 nat add(nat a, nat b) {
185 nat a0 = a;
186 while (a0 != 0) {
187 a0 -= 1;
188 }
189 return b + a;
190 }
191 ```
192
193 If you're interested in more details of these loop optimizations, my knowledge here comes from [some very nice lecture notes][] linked from Regehr's blog post, go read that if you want to know more about how you actually detect these cases.
194
195 [some very nice lecture notes]: https://www.cs.cmu.edu/~fp/courses/15411-f13/lectures/17-loopopt.pdf
196
197 ### Delete Dead Loops
198
199 This pass is very straightforward.
200 The loop doesn't do anything anymore, and we know it will terminate, so we can just get rid of it.
201
202 ```llvm
203 define i32 @add(i32 %a, i32 %b) {
204 entry:
205 %0 = add i32 %b, %a
206 ret i32 %0
207 }
208 ```
209
210 And therefore, our code has been optimized down to:
211 ```c
212 nat add(nat a, nat b) {
213 return b + a;
214 }
215 ```
216
217 Our recursive definition of addition turns out to actually be addition, and LLVM has proved it for us!
218
219 ## Takeaways
220
221 Very general optimizations can combine together to have some very surprising specific results, and optimizing compilers are very clever.
222
223 These same optimizations work to optimize Peano multiplication, since the loop induction variables like to work with linear functions, but they don't succeed with saturating subtraction, recursive comparisons, or min/max.
224 It'll be interesting to see if I can come up with a loop optimization pass that can deal with those more complicated trip counts / induction variables in general at all, or if I'll only succeed at pattern matching these very specific functions.