From 5fa228920705a4bff50a1ad71d48332236b11b32 Mon Sep 17 00:00:00 2001 From: Cassie Jones Date: Sat, 2 Nov 2019 01:25:34 -0400 Subject: [PATCH] Add some posts from my old blog --- posts/2016/i-am-a-rust-contributor.md | 23 ++ posts/2017/package-manager-race-condition.md | 90 ++++++++ posts/2018/debugging-the-deep-end.md | 172 +++++++++++++++ posts/2018/heat-signature.md | 68 ++++++ posts/2018/llvm-hearts-peano-addition.md | 221 +++++++++++++++++++ posts/2018/min-benchmark.md | 100 +++++++++ 6 files changed, 674 insertions(+) create mode 100644 posts/2016/i-am-a-rust-contributor.md create mode 100644 posts/2017/package-manager-race-condition.md create mode 100644 posts/2018/debugging-the-deep-end.md create mode 100644 posts/2018/heat-signature.md create mode 100644 posts/2018/llvm-hearts-peano-addition.md create mode 100644 posts/2018/min-benchmark.md diff --git a/posts/2016/i-am-a-rust-contributor.md b/posts/2016/i-am-a-rust-contributor.md new file mode 100644 index 0000000..1b69689 --- /dev/null +++ b/posts/2016/i-am-a-rust-contributor.md @@ -0,0 +1,23 @@ +--- +title: "I'm a Rust Contributor" +date: 2016-09-20 +summary: "I got a PR merged into Rust, wow, that's pretty cool!" +--- + +A month or two ago I was on the #rust IRC when someone discovered that `pow()` didn't act quite right for unsigned numbers. +This was a bug that was isolated to a single function, so it seemed like something that I could handle. +The issue got posted, I claimed it and debugged it, and actually managed to fix it! +It took a little while, but very early this morning PR #34942 [*Fix overflow checking in unsigned pow()*][issue34942] was merged. +Now I'm a contributor to Rust! + +[issue34942]: https://github.com/rust-lang/rust/pull/34942#event-795131414 + +Try to find a small thing that you can fix in something that you use. +Somewhere in there is the right issue that you can fix. +It's a great experience. +(I really look forward to the release notes for 1.12...) + +**Update:** +It turns out my fix made it into the 1.13 release, and my name is in the [contributors section in the release notes][contributors]. + +[contributors]: https://blog.rust-lang.org/2016/11/10/Rust-1.13.html#contributors-to-1130 diff --git a/posts/2017/package-manager-race-condition.md b/posts/2017/package-manager-race-condition.md new file mode 100644 index 0000000..c148223 --- /dev/null +++ b/posts/2017/package-manager-race-condition.md @@ -0,0 +1,90 @@ +--- +title: "You Got Your Race Condition Inside My Package Manager!" +date: 2017-08-14 +author: Cassie Jones +summary: "We solve a fun, surreal debugging mystery involving pip and version compatibility." +--- + +# A Case of Broken Builds + +The continuous integration servers at my current job are unfortunately stateful. +Every week or so, we run a bunch of configuration processes to reinstall packages to keep the environment clean. +One of these reinstalls `pip` and the Python libraries used by build tools. +This morning, I got a message from one of the build engineers telling me that the Python libraries weren't installing correctly anymore. +(Even though I'm an intern, I'm apparently one of the office Python experts now.) +So, I opened up the build log, and began looking around. + +What was failing was pretty clear: + +``` +Collecting ruamel.yaml + Using cached ruamel.yaml-0.15.28.tar.gz +Installing collected packages: ruamel.yaml + ... + error: Microsoft Visual C++ 9.0 is required +``` + +but why was this suddenly happening now, without us making any changes to our configuration? +Also, why are we installing ruamel.yaml? +We're not using that! + +Long story short, ruamel.yaml was a transitive dependency of dateparser, an excellent library for parsing natural language dates. +It wasn't clear to me why it would be suddenly failing though, so I decided to go investigating further. +Looking at the release notes of dateparser, I saw that they had recently pinned ruamel.yaml to `<0.14`, which we clearly weren't getting. +Previously, the version was un-pinned, so I decided to go look at the release notes for ruamel.yaml, and sure enough, there were releases over the weekend—those must've been what broke it. + +We upgraded our dependency on dateparser to 0.6, and tried again... and it still failed while trying to build the newest version of ruamel.yaml. +One period of looking at GitHub blame views, commit histories, and unpacking PyPI tarballs later, I determined that version 0.6 of dateparser released on PyPI doesn't actually have the pin the version of ruamel.yaml, despite what the changelog claims. +(I opened [dateparser issue #342][] for this.) + +Since the version wasn't pinned, we just asked pip to first install an older version of ruamel.yaml, to hopefully get priority when dateparser tried to install it. +So, we put `ruamel.yaml==0.13.14` in our package list, and then tried again. +Finally, everything worked perfectly. + +Case closed. + +--- + +# This Fix is a Mystery + +But wait, what's this? +Looking closer at the successful build logs, we can see that both `ruamel.yaml-0.13.14` and `ruamel.yaml-0.15.29` are installing without complaint. +What's stopped the error? +Well, if you'll look at the version number up at the top, we were installing `ruamel.yaml-0.15.28` before—just one hour previously, while I was on my lunch break, an update to ruamel.yaml had been released. +Looking back at previous versions on PyPI, I finally figured out what had gone wrong. +If you look at the downloads on the PyPI page for [ruamel.yaml version 0.15.28][0.15.28], you'll see that there are no Windows wheels. +(Wheels are the format that Python uses to distribute compiled C extensions and pre-packed libraries.) +However, if you go to the page for [version 0.15.29][0.15.29], then you'll see that Windows wheels are finally present. +So, I guess until dateparser fixes their version pinning, we'll just have to hope that ruamel.yaml stays packaged correctly. + +Case closed. + +--- + +# We Get Very Unlucky + +Oops, nope it's not. +Later in the afternoon, I got another message that some of the builds had failed. +Looking at the first build that started failing, again we see that... + +``` +Collecting ruamel.yaml + Using cached ruamel.yaml-0.15.30.tar.gz +Installing collected packages: ruamel.yaml + ... + error: Microsoft Visual C++ 9.0 is required +``` + +okay, this project releases *fast*, this is the fourth release in 2 days. +In any case, the last few builds succeeded with `0.15.30`, so what happened? +Well, I don't know for sure, but I have a pretty good guess. +I suspect that the release process for ruamel.yaml isn't atomic, and that they upload their source releases first, and the wheels come a bit later. +We were unlucky enough to start a build during that first upload, where only the source package was available, and no Windows wheels. +But, the few builds that got held up and started 4 minutes after the others took long enough that the wheels were available, and so they installed without any fuss. + +This was an exceptionally unlucky situation. +But, I've got a very good story now—and also a much greater appreciation for various package manager `.lock` files. + +[dateparser issue #342]: https://github.com/scrapinghub/dateparser/issues/342 +[0.15.28]: https://pypi.python.org/pypi/ruamel.yaml/0.15.28 +[0.15.29]: https://pypi.python.org/pypi/ruamel.yaml/0.15.29 diff --git a/posts/2018/debugging-the-deep-end.md b/posts/2018/debugging-the-deep-end.md new file mode 100644 index 0000000..13e4cfa --- /dev/null +++ b/posts/2018/debugging-the-deep-end.md @@ -0,0 +1,172 @@ +--- +title: "Debugging in the Deep End" +date: 2018-04-17 +# tags: code +author: Cassie Jones +summary: "I discuss the approach I used to fix a bug in VCV Rack despite having never looked at the codebase before." +--- + +# The Problem + +Last week I was working with [Aaron][] on a series of [VCV Rack plugin modules][VCVMicroTools], and we were trying to add our own custom graphics for them. +VCV Rack uses SVG for its plugins, so Aaron had built a front face for one of our modules, but it wasn't properly aligned. +I imported it into Affinity Designer and tried to fix it up, but when I exported my new version and loaded it, suddenly all of our modules had vanished. +Since our module wasn't *supposed* to vanish, and I hadn't done anything *obviously* wrong, I decided that this must be a bug in VCV Rack. +Over the next few hours, I diagnosed and managed to fix this bug, and by the magic of open source and some luck, the PRs got merged the next day. +In particular, I managed to make this fix without having ever looked at any of this code before, and I'd like to share the process I followed to manage to do this. + +[Aaron]: http://twitter.com/a2aarontothe2 +[VCVMicroTools]: https://github.com/a2aaron/VCVMicroTools + +[VCV Issue]: https://github.com/VCVRack/Rack/issues/917 +[nanosvg PR]: https://github.com/memononen/nanosvg/pull/116 + +# Debugging the SVG + +The first phase when fixing a bug is to reproduce the bug. +Here, because the rendering worked fine with Aaron's SVG until I re-exported it, I suspected that some feature being used in Affinity's SVG export wasn't supported by the VCV Rack SVG renderer. +To figure out which, I used the first technique: minimize your failing case. + +First, I tried changing export settings, removing groups to flatten the SVG, doing everything I could to remove different features. +As I went, I inspected the working and not working SVG side-by-side to see what the differences were. +I didn't make much progress this way, so I started from the other direction, building up instead of tearing down. +I saved a simple blank grey square, just a single element. +When that didn't work, I figured it must have something to do with one of the attributes on the `` container element. +For reference, a minimal SVG exported from Affinity might look something like: + +```svg + + + + + +``` + +So I looked through all the settings on that, I noticed that it was setting the width and height to `100%`, whereas the working one was setting it to explicit pixel numbers. +I copied the width and height out of the working one into the not-working one… and that fixed it. +That suggested a possible problem: If you used percentage dimensions on the `` element, it wouldn't correctly calculate the size of the object, and would simply make it 0 by 0. +This was a good enough guess for me, so I set about trying to figure out how to fix that. + +# Spelunking the Code + +This brings us to the second phase of solving the bug: find a piece of code that's related to the bug, so you have a place to start. +I suspected that I could find where the SVGs were loaded in VCV Rack and fix it to handle those percentages correctly. +I didn't know exactly how I would handle them yet, I had to see what it was doing first. +To find this, I took a simple approach: search the source code for the word "SVG" and see what I could find! +I used [ripgrep][], a very good search tool, but you can use whatever tool you have available as long as it can search all the code at once. +If your editor can jump to definitions in a project, searching for related words and then jumping from definition to definition can help you find the part of the code you're interested in very quickly; having good code navigation tools helps *a lot*. + +[ripgrep]: https://github.com/BurntSushi/ripgrep + +Using this, I found SVG widgets, followed their class hierarchy up to rendering components, and then eventually I found my way to a class calling functions from "nanosvg." +Curious, I looked it up, and saw that it was a small SVG parser library, and that it produces a bunch of shape paths. +In order to not have to resize all those paths (I assumed), I decided to try fixing the bug from inside nanosvg instead of inside VCV Rack. +Knowing that it was a problem with dimensions, I searched the nanosvg code for the string `"width"`. +The second result was a very promising looking function: + +```c hl_lines="6 7 8 9" +static void nsvg__parseSVG(NSVGparser* p, const char** attr) +{ + int i; + for (i = 0; attr[i]; i += 2) { + if (!nsvg__parseAttr(p, attr[i], attr[i + 1])) { + if (strcmp(attr[i], "width") == 0) { + p->image->width = nsvg__parseCoordinate(p, attr[i + 1], 0.0f, 0.0f); + } else if (strcmp(attr[i], "height") == 0) { + p->image->height = nsvg__parseCoordinate(p, attr[i + 1], 0.0f, 0.0f); +// … +``` + +# Writing a Fix + +I'd located a likely location for the bug, so now I changed mode from code spelunking to trying to understand what the code did. +Since this function looked so relevant, I first tried to figure out what `nsvg__parseSVG` was doing. +A good tool for this was finding where it was used: it was getting called in one place, from `nsvg__startElement`, and seemed to be being called when an `` tag was found, to compute the context from the attributes… perfect. +The parameter `const char** attr` suggested a list of attribute strings, and the usage `attr[i]` and `attr[i + 1]` suggested the SVG key/value pairs. +Therefore, it seemed like + +```c +if (strcmp(attr[i], "width") == 0) + p->image->width = nsvg__parseCoordinate(p, attr[i + 1], 0.0f, 1.0f); +``` + +would parse the width coordinate value. +In order to figure this out, we want to go look at `nsvg__parseCoordinate`. + +```c +static float nsvg__parseCoordinate(NSVGparser* p, const char* str, + float orig, float length) +{ + NSVGcoordinate coord = nsvg__parseCoordinateRaw(str); + return nsvg__convertToPixels(p, coord, orig, length); +} +``` + +Following those definitions, `nsvg__parseCoordinateRaw` follows a few steps to get to unit parsing, but it seems largely straightforward parsing of the data, no fancy processing. +The fact that we've got an issue in % suggests that `nsvg__convertToPixels` is doing something interesting. +And indeed, looking at the code for that function, it made clear what the `length` argument did: + +```c hl_lines="7" +static float nsvg__convertToPixels(NSVGparser* p, NSVGcoordinate c, + float orig, float length) +{ + NSVGattrib* attr = nsvg__getAttr(p); + switch (c.units) { + // … + case NSVG_UNITS_PERCENT: return orig + c.value / 100.0f * length; + default: return c.value; + } + return c.value; +} +``` + +It was used as the base value that the percentage should be relative to. +Then, it becomes clear: `nsvg__parseCoordinate(p, attr[i + 1], 0.0f, 1.0f);` makes `100%` into `1px` +So, now we know what exactly has gone wrong, how do we solve it? +Since I didn't know what the percentages should be relative to, I started researching, looking at Mozilla references for how the percent should behave. + +I didn't find an answer, but while I was researching, I ran into lots of examples that didn't specify dimensions at all. +This made me suspicious: nanosvg handles most SVGs correctly, so it must have some code to handle this case. +When you're fixing a bug, often the edge case that you're running into is similar to another edge case that's already handled, and you just need to make it cover your case as well. +Since this must be related to the dimensions, and the dimension handling sets the `width` field while parsing the `` element, I went out searching for `->width` and `.width` in the code. +I immediately found `nsvg__scaleToViewbox` which contains a promising looking block of code: + +```c hl_lines="1" +if (p->viewWidth == 0) { + if (p->image->width > 0) { + p->viewWidth = p->image->width; + } else { + p->viewMinx = bounds[0]; + p->viewWidth = bounds[2] - bounds[0]; + } +} +``` + +This looks like what we want! +It will recalculate the width and height if they're set to 0, so we just need to make sure that our 100% sets it to 0 instead of 1. +And to fix that, we can simply change: + +```diff + if (strcmp(attr[i], "width") == 0) { +- p->image->width = nsvg__parseCoordinate(p, attr[i + 1], 0.0f, 1.0f); ++ p->image->width = nsvg__parseCoordinate(p, attr[i + 1], 0.0f, 0.0f); + } else if (strcmp(attr[i], "height") == 0) { +- p->image->height = nsvg__parseCoordinate(p, attr[i + 1], 0.0f, 1.0f); ++ p->image->height = nsvg__parseCoordinate(p, attr[i + 1], 0.0f, 0.0f); + } else if (strcmp(attr[i], "viewBox") == 0) { +``` + +And that's the whole fix! + +# Conclusions + +You can use these techniques the next time you have to jump into a large codebase that's unfamiliar. +Finding a simple case that fails, making a hypothesis about why it fails, and then searching for terms related to that gives you a big head-start navigating the code. +Being able to jump to definitions helps you build a mental map of a thin slice of the code. +Even though Rack is about 11K lines of code, and nanosvg is almost 3K, in the process of fixing this bug I only *glanced at* a few hundred lines of code, and only tried to understand a few dozen of them. +The next time you want to try to examine a new codebase, keep these tricks in mind. diff --git a/posts/2018/heat-signature.md b/posts/2018/heat-signature.md new file mode 100644 index 0000000..1abf5b5 --- /dev/null +++ b/posts/2018/heat-signature.md @@ -0,0 +1,68 @@ +--- +title: "Getting Spaced in Heat Signature" +date: 2018-10-05 +# tags: video games; story +author: Cassie Jones +summary: "I tell the dramatic story of one of my early heat signature games." +--- + +[Heat Signature][] is a game where you can go inside the spaceships. +More specifically, it's a game designed by [Tom Francis][] about being a space freedom fighter and mercenary, hijacking spaceships, doing cool missions, and trying to complete your own personal goal before you retire or die. +If you want to get a good feel for how the game works, he has an excellent series where he's playing the daily challenge missions every day for a week, [which you can watch here][dailies]. + +Heat Signature does something I really like out of video games, where it gives you lots of tools to get yourself into trouble, and then just as many ways to try to get yourself back out. +You can find this structure in a variety of games; one of the first times I heard it clearly articulated was in describing [FTL: Faster Than Light][FTL], which is a game where you both figuratively and literally have to put out a lot of fires. +This mode of play lends itself to very memorable stories. +Whether you succeed or fail, you'll probably have something to tell a friend about. +The mechanics hand you a satisfying narrative arc and built-in twists make the story rich. +In my first hour and a half playing it, I got one of those stories that I felt was worth writing up. + +[Heat Signature]: https://spaceships.cool +[Tom Francis]: https://twitter.com/Pentadact/ +[dailies]: https://www.youtube.com/playlist?list=PLUtKzyIe0aB1YePV-rl-4pGjYTL2WRCRo +[FTL]: https://subsetgames.com/ftl.html + +----- + +I had done very well on the normal missions for my first character. +I had pulled off two almost flawless hard missions in a row, so I decided I should try the daily challenge. +My mission is to assassinate the 3 guards that had killed my boyfriend, without extra casualties. +(A daily challenge always has an extra condition that determines your score.) +To help me on this mission, I have an armor-piercing short-blade, a concussive gun, and a swapper teleporter. + +I fly my pod to the first mission ship and begin scoping it out. +Unfortunately, on this mission there's a jammer making the rounds---they wander around the ship laying down huge jammer fields that prevent gadgets from working. +During the first few seconds that I watch the ship, the jammer lays down their first field. + +Now I'm in a hurry. +I don't want them to get a chance to cover too much more of the ship, since that makes me much less effective. +In my haste, I act a bit rashly. +I figure I can rush in and knock the first guard out when they're not looking, and then knock out the jammer quickly after. +As I step through my pod's airlock into the ship, I realize I don't have a non-lethal melee weapon, and my gun's not silenced. +If I shoot, it's gonna be *really* loud. +So, as the guard begins to take notice of me, I do what any sensible person would do: I throw my gun. + +It sails out of my hand toward the guard. +But this guard is wearing armor. +The gun clangs harmlessly off it, and they continue to stare me down unfazed, their alert meter slowly creeping higher towards the point where they'll shoot me. +You know what's worse though? +As the gun ricochets of their armored chest, it goes off. +Every guard in the sector has heard me. +They know there's a fight going on and they all start rushing towards the noise, sounding the alarm on the way. +I'm slightly panicked, but I still feel like I can get out of this. +Now that I've thrown my gun, I don't have any weapon to take down the guard in front of me. +Even if I was willing to kill them, I'm too far away to reach with my short-blade---but I can probably teleport away! + +I pull out my visitor (a teleporter that temporarily brings you to a spot, then a few seconds later pulls you back) and try to pick the best destination. +I want to stay out of sight, and hopefully get pulled back behind the guard after they go looking for me. +I carefully make my choice. +And click. +And the teleporter fizzles, because I'm still standing in field that the jammer laid down at the beginning of the mission +A second later, the guard shoots me, and throws me unceremoniously out the airlock. +As I try to grab my unconscious body with my pod, the alarm timer countdown runs out, and my targets escape. + +----- + +And that's the story of how I failed my first Heat Signature daily mission in only 33 seconds. +This took several minutes of real time, because the game encourages you to pause frequently planning and reflexes, but the entire mission from start to finish only took 33 seconds of game time. +If this sounds like your sort of game, consider [buying it for yourself][Heat Signature], and tell me what story you get yourself into. diff --git a/posts/2018/llvm-hearts-peano-addition.md b/posts/2018/llvm-hearts-peano-addition.md new file mode 100644 index 0000000..1fc66da --- /dev/null +++ b/posts/2018/llvm-hearts-peano-addition.md @@ -0,0 +1,221 @@ +--- +title: "LLVM 💖s Peano Addition" +date: 2018-09-07 +# tags: llvm; compilers; optimization +summary: "I am surprised to discover that LLVM can optimize the standard peano definition of addition, so I set out to investigate." +--- + +This semester I'm taking an advanced compilers class. +We're going to be learning by making changes to LLVM, so for the first assignment I was reading recommended [introduction to LLVM][]. +In order to give an example of some LLVM IR, it provides two small C functions implementing addition in different ways, and equivalent IR. + +[introduction to LLVM]: http://www.aosabook.org/en/llvm.html + +```c +unsigned add1(unsigned a, unsigned b) { + return a+b; +} + +// Perhaps not the most efficient way to add two numbers. +unsigned add2(unsigned a, unsigned b) { + if (a == 0) return b; + return add2(a-1, b+1); +} +``` + +Being something of a mathematician myself, I felt I had to [defend the honor][] of "Peano-likers" from this defamation. +I made that joke tweet and moved on, but after [someone suggested LLVM optimize it][suggestion], I started to think about writing some of those optimization passes as hopefully easy pattern-matching definitions. + +[defend the honor]: https://twitter.com/porglezomp/status/1037408309140221952 +[suggestion]: https://twitter.com/hyperfekt/status/1037654687024119808 + +The next day, after compiling LLVM and getting a custom Hello World optimizer pass running, I decided to create some tests, and discovered (much to my surprise) that LLVM already handled Peano-style addition and multiplication perfectly competently! + +I had just read John Regehr's [blog post on how LLVM optimizes a function][regehr], so I had an idea for how to investigate this. +If you haven't read that yet, you should go read that first in order to see in some more detail LLVM's optimization passes like the ones I'm going to describe below. + +[regehr]: https://blog.regehr.org/archives/1603 + +## How to View the Optimizations + +That blog post proceeds by running the LLVM `opt` tool and examining the changes between passes. +You can easily get the LLVM IR corresponding to some C code using `clang`, just run: + +```shell +$ clang peano.c -emit-llvm -S -o peano.ll +``` + +and you'll have a beautiful LLVM IR dump in the textual format. +In order to view the optimizations on that code, you can run: + +```shell +$ opt -O3 -print-before-all -print-after-all peano.ll +``` + +This gives you a huge wall of IR dumps after each optimization pass. +If you want to do a similar investigation yourself, I wrote [a Python script that shows each pass's diff][diff script] and waits for you to continue it. +Make sure you have [icdiff][] (a very nice color diff tool) installed in order to use it, or else modify the diff invocation in the script. + +[diff script]: https://gist.github.com/porglezomp/f2dc233f971cf3f30d45e0b501ae5ead +[icdiff]: https://github.com/jeffkaufman/icdiff + +## The Optimizations + +As you can see from [John Regehr's blog post][regehr], LLVM's passes sometimes undo and redo lots of work without changing very much when working on a function this simple. +Furthermore, the code emitted by the Clang frontend is a little bit of a mess that needs quite a bit of cleanup before it's decent code, in order to avoid needing to reimplement analyses that LLVM can do perfectly well itself. + +In order to make this discussion clearer, I'll use the hand-written IR from the introductory article rather than the IR emitted by clang, and only run through the necessary passes to get the job done, not the whole `-O3` pipeline. +At each step of the optimization, I'll provide the IR, and some roughly corresponding C code. + +### The Program + +We'll be investigating this recursive definition of addition: + +```llvm +define i32 @add(i32 %a, i32 %b) { +entry: + %tmp1 = icmp eq i32 %a, 0 + br i1 %tmp1, label %done, label %recurse + +recurse: + %tmp2 = sub i32 %a, 1 + %tmp3 = add i32 %b, 1 + %tmp4 = call i32 @add(i32 %tmp2, i32 %tmp3) + ret i32 %tmp4 + +done: + ret i32 %b +} +``` + +Which corresponds to this C program: + +```c +typedef unsigned nat; + +nat add(nat a, nat b) { + if (a == 0) return b; + return add(a-1, b+1); +} +``` + + +### Tail Call Optimization + +The first important optimization here is tail call optimization. +Above we see that we call `@add` into `%tmp4` and then immediately return it without doing anything else in between, which makes this a tail call. +Therefore, in order to avoid the cost of calling functions, the extra stack frames needed, and the expose more opportunities for optimizations, tail call optimization turns our tail recursion into a loop. + +```llvm +define i32 @add(i32 %a, i32 %b) { +entry: + br label %tailrecurse + +tailrecurse: + %a.tr = phi i32 [ %a, %entry ], [ %tmp2, %recurse ] + %b.tr = phi i32 [ %b, %entry ], [ %tmp3, %recurse ] + %tmp1 = icmp eq i32 %a.tr, 0 + br i1 %tmp1, label %done, label %recurse + +recurse: + %tmp2 = sub i32 %a.tr, 1 + %tmp3 = add i32 %b.tr, 1 + br label %tailrecurse + +done: + ret i32 %b.tr +} +``` + +This code approximately corresponds to: + +```c +nat add(nat a, nat b) { + while (a != 0) { + a -= 1; + b += 1; + } + return b; +} +``` + +By removing the recursive call, further optimizations become visible. +In particular... + +### Induction Variable Simplification + +Loop optimizations are a primary focus of compiler optimizations, because many programs spend most of their time in a few loops, making those loops faster is the most fruitful optimization. +"Induction Variable Simplification" is a specific optimization that works on identified "loop induction variables", variables that change by a constant amount each loop iteration, or that are derived from other induction variables. + +Here, `a` and `b` are identified as loop induction variables. +Event more critically, `a` is the induction variable that controls the loop condition, so `a` is counting down towards `0`. +Therefore, LLVM can determine that the loop will run exactly `a` times, called the "trip count." + +In cases where one of the induction variables is used after the loop and the trip count is statically known, LLVM performs an optimization where it computes the final value of the induction variable outside the loop, which splits the live range of the induction variable, and potentially makes it eligible for dead code elimination (which happens in this case). + +```llvm +define i32 @add(i32 %a, i32 %b) { +entry: + br label %tailrecurse + +; Loop: +tailrecurse: + %a.tr = phi i32 [ %a, %entry ], [ %tmp2, %recurse ] + %tmp1 = icmp eq i32 %a.tr, 0 + br i1 %tmp1, label %done, label %recurse + +recurse: + %tmp2 = sub i32 %a.tr, 1 + br label %tailrecurse + +; Exit blocks +done: + %0 = add i32 %b, %a + ret i32 %0 +} +``` + +This IR looks basically like this C: + +```c +nat add(nat a, nat b) { + nat a0 = a; + while (a0 != 0) { + a0 -= 1; + } + return b + a; +} +``` + +If you're interested in more details of these loop optimizations, my knowledge here comes from [some very nice lecture notes][] linked from Regehr's blog post, go read that if you want to know more about how you actually detect these cases. + +[some very nice lecture notes]: https://www.cs.cmu.edu/~fp/courses/15411-f13/lectures/17-loopopt.pdf + +### Delete Dead Loops + +This pass is very straightforward. +The loop doesn't do anything anymore, and we know it will terminate, so we can just get rid of it. + +```llvm +define i32 @add(i32 %a, i32 %b) { +entry: + %0 = add i32 %b, %a + ret i32 %0 +} +``` + +And therefore, our code has been optimized down to: +```c +nat add(nat a, nat b) { + return b + a; +} +``` + +Our recursive definition of addition turns out to actually be addition, and LLVM has proved it for us! + +## Takeaways + +Very general optimizations can combine together to have some very surprising specific results, and optimizing compilers are very clever. + +These same optimizations work to optimize Peano multiplication, since the loop induction variables like to work with linear functions, but they don't succeed with saturating subtraction, recursive comparisons, or min/max. +It'll be interesting to see if I can come up with a loop optimization pass that can deal with those more complicated trip counts / induction variables in general at all, or if I'll only succeed at pattern matching these very specific functions. diff --git a/posts/2018/min-benchmark.md b/posts/2018/min-benchmark.md new file mode 100644 index 0000000..f4e9580 --- /dev/null +++ b/posts/2018/min-benchmark.md @@ -0,0 +1,100 @@ +--- +title: "Computing Min" +date: 2018-08-27 +# tags: rust; benchmarking +slug: computing-min +author: Cassie Jones +summary: "I benchmark a few different implementations of min, because that's fun." +--- + +Inspired by the ["gamedev wishlist for rust"][], I got curious if computing the minimum of a bunch of numbers with `min(min(min(a, b), c), d)` was effective. +My thinking was that this would produce unnecessary dependency chains in the processor, stopping out-of-order executions of independent `min`s. +Also, this was a good excuse to try out [Criterion][], so I set out to measure the impact. +One extra node + +["gamedev wishlist for rust"]: https://users.rust-lang.org/t/my-gamedever-wishlist-for-rust/2859 +[Criterion]: https://github.com/japaric/criterion.rs + +## Implementation + +In my actual benchmark I produced two copies of each of these methods specified here. +One for `std::cmp::min`, and one for `f32` (since it's not `Ord`). +For simplicity, I'll just use the generic one here, they both look pretty much the same. + +### Loopy + +First, I was curious if a usual `.iter().min()` would perform well. +The theory here is that *ideally*, for a known list length, if the compiler thought it was worthwhile, this would compile to the same code as a straight line of `min`. +So, our first case is this: + +```rust +[a, b, c, d, e].iter().min() +``` + +### Linear Reduction Macro + +The second method is a macro that will turn `min!(a, b, c, d, e)` into `min(a, min(b, min(c, min(d, e))))`. +This is a direct recursive macro that that just accumulates the `min` calls. +If you're familiar with Rust macros, nothing *too* scary is going on here. + +```rust +#[macro_export] +macro_rules! min { + ($x:expr $(,)*) => { $x }; + ($x:expr $(, $y:expr)* $(,)*) => { ::std::cmp::min($x, min!($($y),*)) }; +} +``` + +### Tree Reduction Macro + +This macro is quite hairy. +The goal is to turn something like `min_tree!(a, b, c, d, e)` into `min(min(a, min(c, e)), min(b, d))` in order to allow the processor to simultaneously execute the leaf `min` calls. +Let me walk us through the parts: + +First, we have the `()` case. +The `Ord` typeclass doesn't offer us a top element, so we just give an error if there are no arguments. +(The float version returns `f32::INFINITY` in this case.) + +Next, we have the base cases. +These look very similar to the cases from the `min!` macro, except that the n-element case calls the `@split` case. +The `@split` cases are dedicated to taking a list of expressions, and partitioning it into two different lists of expressions. +The idea being that if you can split it into two lists, then you can do `min_tree!` to each of those two lists. +The first `@split` case pulls two items off the arguments if they're available, and puts one in each accumulator list. +The second case is if there's only one argument left, and the final case is for when there are no arguments left. +Once the argument list has been split into two parts, we do `min(min_tree!(a...), min_tree!(b...))`, recursively constructing the tree. + +```rust +#[macro_export] +macro_rules! min_tree { + () => { compile_error!("Cannot compute the minimum of 0 elements") }; + ($x:expr) => { $x }; + ($x:expr, $y:expr) => { ::std::cmp::min($x, $y) }; + ($($x:expr),* $(,)*) => { min_tree!(@split []; []; $($x),*) }; + (@split [$($a:expr),*]; [$($b:expr),*]; $x:expr, $y:expr $(, $z:expr)*) => { + min_tree!(@split [$x $(, $a)*]; [$y $(, $b)*]; $($z),*) + }; + (@split [$($a:expr),*]; [$($b:expr),*]; $x:expr) => { + min_tree!(@split [$x $(, $a)*]; [$($b),*];) + }; + (@split [$($a:expr),*]; [$($b:expr),*];) => { + ::std::cmp::min(min_tree!($($a),*), min_tree!($($b),*)) + }; +} +``` + +## Results + +First of all, I was right, tree reduction is faster, at least for the 10-element `min` I was benchmarking. +This is imagining in the context of graphics applications, so we expect relatively small cases, often of a known size (like finding the minimum among 8 neighbors, for instance). +What was slightly more surprising to me was that for floats, the loop was faster than the linear reduction. +Looking at [godbolt output][] for a hardcoded case shows that they all get vectorized (and the loop gets unrolled), just with slightly different load scheduling. + +[godbolt output]: https://godbolt.org/z/e32CnV + +Criterion produces really cool graphs. +Here's the results from the two cases: + +![violin-i32](/images/2018-08-27-violin-i32.svg) +![violin-f32](/images/2018-08-27-violin-f32.svg) + +I suspect if you want to compute the minimum of a *very* large list, you'll benefit from doing tree reductions on independent chunks in a loop. -- 2.47.0