Add support for some arithmetic built-ins

This adds support for addition, subtraction, and comparisons. All of these built-ins are written as definitions in assembly. Currently the true and false functions are included unconditionally in compilation, and their numbers are hard-coded.
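This is a rough model of what a comparison hands back. It is a Rust sketch, not the assembly itself: the comparison returns one of two boolean functions rather than a machine flag. It assumes the Church encoding for booleans mentioned in the globals entry of this log. `Value`, `apply`, `church_true`, `church_false`, `less_than`, and `to_bool` are hypothetical names.

```rust
use std::rc::Rc;

/// Hypothetical runtime value: an integer or a one-argument function.
#[derive(Clone)]
enum Value {
    Int(i64),
    Func(Rc<dyn Fn(Value) -> Value>),
}

/// Apply a function value to an argument.
fn apply(f: &Value, arg: Value) -> Value {
    match f {
        Value::Func(func) => func(arg),
        Value::Int(_) => panic!("applied a non-function value"),
    }
}

/// Church `true`: takes two arguments (curried) and returns the first.
fn church_true() -> Value {
    Value::Func(Rc::new(|t: Value| -> Value {
        Value::Func(Rc::new(move |_f: Value| -> Value { t.clone() }))
    }))
}

/// Church `false`: takes two arguments (curried) and returns the second.
fn church_false() -> Value {
    Value::Func(Rc::new(|_t: Value| -> Value {
        Value::Func(Rc::new(|f: Value| -> Value { f }))
    }))
}

/// A comparison built-in returns one of the two boolean functions.
fn less_than(a: i64, b: i64) -> Value {
    if a < b {
        church_true()
    } else {
        church_false()
    }
}

/// Decode a Church boolean by letting it choose between 1 and 0.
fn to_bool(b: &Value) -> bool {
    match apply(&apply(b, Value::Int(1)), Value::Int(0)) {
        Value::Int(1) => true,
        Value::Int(0) => false,
        _ => panic!("not a boolean"),
    }
}
```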
Add CFI directives

CFI is the "Call Frame Information": it describes where the call frame is. Since we're not emitting a frame pointer, we need some way to tell debuggers where our call frames are. These CFI directives provide instructions for finding the beginning of the call frame; in our case, we just tell the debugger at what offset from the stack pointer to find it. The assembler puts all of this metadata in a separate "eh" frame info section that tools can inspect; it doesn't modify the actual code. Adding these makes the "up" and "down" commands in the debugger more reliable when looking at my emitted functions. Previously they would sometimes fail to figure out where they were, making debugging annoying.
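Here is a minimal sketch of where the directives go, written as a Rust emitter. It assumes Intel syntax and a frame reserved with a single `sub rsp`. `emit_function` is a hypothetical name. `.cfi_startproc`, `.cfi_def_cfa_offset`, and `.cfi_endproc` are standard GNU assembler directives.

```rust
/// Hypothetical emitter for a function with a fixed-size frame and no frame
/// pointer, annotated with CFI directives (Intel syntax assumed).
fn emit_function(name: &str, frame_size: u32, body: &[&str]) -> String {
    let mut out = String::new();
    out.push_str(&format!("{}:\n", name));
    // Opens the frame description; on x86-64 the initial rule is CFA = rsp + 8,
    // i.e. just above the return address pushed by `call`.
    out.push_str("    .cfi_startproc\n");
    out.push_str(&format!("    sub rsp, {}\n", frame_size));
    // After reserving the frame, the CFA sits frame_size + 8 bytes above rsp.
    out.push_str(&format!("    .cfi_def_cfa_offset {}\n", frame_size + 8));
    for line in body {
        out.push_str(&format!("    {}\n", line));
    }
    out.push_str(&format!("    add rsp, {}\n", frame_size));
    // Back to the entry state: only the return address remains on the stack.
    out.push_str("    .cfi_def_cfa_offset 8\n");
    out.push_str("    ret\n");
    out.push_str("    .cfi_endproc\n");
    out
}
```

On x86-64 the return address sits just below the CFA. That means these offsets are all a debugger needs to walk from one of these frames to its caller.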
Avoid redundant loads in capture generation

When generating the SSA form for captures, the previous code emitted a load for each captured variable directly. This worked in some cases, but because it bypassed the load forwarding in trans_expr, it sometimes generated loads for variables that were already available as SSA values and could have been referenced directly. Because of optimizations, those loads could end up reading stack slots that were never written to. Using trans_expr on each captured variable, instead of generating loads directly, lets us reuse that forwarding logic, and it ends up making the code simpler :D
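Below is a simplified sketch of the fix. It assumes a single-block translator that remembers which variables already have SSA values. `Trans`, `Expr`, `Inst`, and `gen_captures` are hypothetical stand-ins; only the name trans_expr comes from this log. When a captured variable already has an SSA value, the translator reuses it instead of emitting a fresh load.

```rust
use std::collections::HashMap;

/// Hypothetical SSA instructions, just enough for this example.
#[derive(Debug, Clone, PartialEq)]
enum Inst {
    Param,            // the function's argument
    LoadUpvar(usize), // load a slot of the current closure record
}

/// Hypothetical expression type; only variables matter here.
enum Expr {
    Var(String),
}

struct Trans {
    insts: Vec<Inst>,
    /// Variables that already have an SSA value in this block.
    known: HashMap<String, usize>,
    /// Closure-record slot for each captured variable.
    upvars: HashMap<String, usize>,
}

impl Trans {
    /// Translate an expression, forwarding values that were already loaded.
    fn trans_expr(&mut self, e: &Expr) -> usize {
        match e {
            Expr::Var(name) => {
                if let Some(&v) = self.known.get(name) {
                    return v; // reuse the existing SSA value
                }
                let slot = self.upvars[name];
                self.insts.push(Inst::LoadUpvar(slot));
                let v = self.insts.len() - 1;
                self.known.insert(name.clone(), v);
                v
            }
        }
    }

    /// Capture generation goes through trans_expr instead of emitting loads.
    fn gen_captures(&mut self, captured: &[&str]) -> Vec<usize> {
        captured
            .iter()
            .map(|name| self.trans_expr(&Expr::Var(name.to_string())))
            .collect()
    }
}
```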
Load numbers directly instead of via stack

Assembly output for programs that used number literals was getting hard to read because every number was indirected through the stack. Loading numbers directly in the generated code instead makes the output much better. This works via the same approach as the previous load forwarding.
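This sketch shows how literals can be folded into their uses. It assumes Intel syntax and that each non-literal SSA value owns an 8-byte slot at `[rsp + 8*index]`. `Inst`, `Operand`, `operand_for`, and `emit` are hypothetical.

```rust
use std::fmt;

/// Hypothetical SSA instructions.
#[derive(Debug, Clone, Copy)]
enum Inst {
    Num(i64),
    Add(usize, usize),
}

/// Where an operand comes from in the emitted code.
#[derive(Debug, PartialEq)]
enum Operand {
    Imm(i64),   // encoded directly in the instruction
    Stack(u32), // byte offset of the value's stack slot from rsp
}

impl fmt::Display for Operand {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Operand::Imm(n) => write!(f, "{}", n),
            Operand::Stack(off) => write!(f, "qword ptr [rsp + {}]", off),
        }
    }
}

/// Look through the SSA: literals become immediates, anything else is read
/// from the stack slot assigned to that instruction.
fn operand_for(insts: &[Inst], v: usize) -> Operand {
    match insts[v] {
        Inst::Num(n) => Operand::Imm(n),
        Inst::Add(..) => Operand::Stack(8 * v as u32),
    }
}

/// Emit code for instruction `v`, spilling its result to its own slot.
/// Note: `add` only encodes sign-extended 32-bit immediates on x86-64.
fn emit(insts: &[Inst], v: usize) -> Option<String> {
    match insts[v] {
        // Literals need no code of their own; their uses read them as immediates.
        Inst::Num(_) => None,
        Inst::Add(a, b) => Some(format!(
            "mov rax, {}\nadd rax, {}\nmov qword ptr [rsp + {}], rax",
            operand_for(insts, a),
            operand_for(insts, b),
            8 * v
        )),
    }
}
```

One caveat on x86-64: only `mov` into a 64-bit register takes a full 64-bit immediate. Literals outside the 32-bit range would therefore need to be moved into a register before being used by `add` or similar instructions.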
Implement variable capture

We just copy the variables into the closure environment after it gets allocated. This involves a decent chunk of stack traffic because we have to increment their reference counts. In the process, I figured out that the "ivy_app_mut" optimization for later functions isn't sound in the presence of nested functions, because one of the earlier applications can return a shared function, which must not be mutated. (There are ways to make it sound if you know for sure which functions are being called.)
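This is a minimal Rust model of the capture step. It uses `Rc` as a stand-in for the runtime's reference counts, which is an assumption: the generated code does this with loads, stores, and explicit increments instead. `Closure`, `make_closure`, and `sum_env` are hypothetical.

```rust
use std::rc::Rc;

/// Hypothetical closure object: a code pointer plus a captured environment.
/// Rc stands in for the runtime's own reference counting.
struct Closure {
    code: fn(&[Rc<i64>]) -> i64,
    env: Vec<Rc<i64>>,
}

/// Allocate the environment, then copy each captured variable into it.
/// Each Rc::clone increments a reference count, like the increment the
/// generated code performs for every captured variable.
fn make_closure(code: fn(&[Rc<i64>]) -> i64, captured: &[&Rc<i64>]) -> Closure {
    let mut env = Vec::with_capacity(captured.len());
    for v in captured {
        env.push(Rc::clone(*v));
    }
    Closure { code, env }
}

/// Example closure body: sums its captured variables.
fn sum_env(env: &[Rc<i64>]) -> i64 {
    env.iter().map(|v| **v).sum()
}
```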
Reduce stack traffic by finding loads through SSA

Instead of executing each Load instruction and storing its result on the stack, we perform loads by looking through the SSA to see which instruction generated each value. If it was a load, we run that load directly; otherwise we load from the stack slot that the value corresponds to.
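Here is a sketch of the operand lookup. It assumes a Load instruction records its source address as text and that every other value has an 8-byte stack slot at `[rsp + 8*index]`. `Inst` and `load_value` are hypothetical, and so is using rdi as the closure pointer in the example. Re-running a load is only equivalent to the original if the memory it reads hasn't changed since the Load.

```rust
/// Hypothetical SSA instructions: a Load carries its source address as text.
#[derive(Debug, Clone)]
enum Inst {
    Load(String),      // e.g. a slot in the closure record
    App(usize, usize), // function application; result lives in a stack slot
}

/// Produce the assembly that puts SSA value `v` into register `dest`.
/// If `v` came from a Load, repeat that load instead of going through the
/// stack; this assumes the loaded memory is not modified in between.
fn load_value(insts: &[Inst], v: usize, dest: &str) -> String {
    match &insts[v] {
        Inst::Load(addr) => format!("mov {}, qword ptr {}", dest, addr),
        Inst::App(..) => format!("mov {}, qword ptr [rsp + {}]", dest, 8 * v),
    }
}
```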
Modify runtime symbol decoration

C name decoration varies across platforms. On macOS, names from C are prefixed with an underscore. On Linux they're not, and on 64-bit Windows they're not either.

The macOS behavior was observed by inspecting the symbols.
Linux compatibility issue discovered here (and above in the thread): https://twitter.com/16kbps/status/1233955883148861440
Windows symbol decoration documented here: https://docs.microsoft.com/en-us/cpp/build/reference/decorated-names#FormatC

> Note that in a 64-bit environment, functions are not decorated.

Currently this is handled with conditional compilation, but that prevents cross-compilation. In the future it would be good to have the formatting method select how to print symbols, but that would involve switching away from fmt::Display, since we can't pass the ABI information into that. Alternatively, we could pack the ABI information into the symbol enum.
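This sketches the "pack the ABI information into the symbol" alternative. Pairing the name with a target lets a plain fmt::Display implementation choose the decoration at run time, so cross-compiling only means passing a different target. `TargetOs` and `CSymbol` are hypothetical names. The decoration rules are the ones listed above.

```rust
use std::fmt;

/// Targets this sketch distinguishes.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TargetOs {
    MacOs,
    Linux,
    Windows64,
}

impl TargetOs {
    /// The host, picked at build time; only useful as a default target.
    /// Every OS other than macOS and Windows is treated like Linux here.
    fn host() -> TargetOs {
        if cfg!(target_os = "macos") {
            TargetOs::MacOs
        } else if cfg!(windows) {
            TargetOs::Windows64
        } else {
            TargetOs::Linux
        }
    }
}

/// A C symbol paired with its target, so Display can decorate it correctly.
struct CSymbol<'a> {
    name: &'a str,
    os: TargetOs,
}

impl fmt::Display for CSymbol<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.os {
            // macOS prefixes C names with an underscore.
            TargetOs::MacOs => write!(f, "_{}", self.name),
            // Linux and 64-bit Windows use the name undecorated.
            TargetOs::Linux | TargetOs::Windows64 => write!(f, "{}", self.name),
        }
    }
}
```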
Add a compiler CLI

The compiler CLI lets you output assembly, the pretty-printed code, or a compiled binary. It shells out to clang for assembling. You have to have clang installed, and the runtime library has to be in the correct place. I need to figure out a way to embed the runtime library or something...
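Here is a sketch of the clang step using std::process::Command. The file names and the runtime library name are hypothetical. The flags are just clang's ordinary input-files-plus-`-o` form; the real CLI may pass more.

```rust
use std::path::Path;
use std::process::Command;

/// Build (without running) a clang invocation that assembles the emitted
/// file and links it against the runtime library.
fn clang_command(asm: &Path, runtime_lib: &Path, output: &Path) -> Command {
    let mut cmd = Command::new("clang");
    cmd.arg(asm).arg(runtime_lib).arg("-o").arg(output);
    cmd
}
```

Calling `.status()` on the returned command would actually run clang. `get_program` and `get_args` (Rust 1.57+) let you check the invocation without clang installed.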
Add an extensible framework for adding built-ins

Built-ins can now be written in assembly at the bottom, and they'll be included and registered in the entry point on demand: only functions that the program references will be included in the output. This will need to be extended to handle functions which reference other functions (like including "true" and "false" if any of the comparisons are included), but that's a matter of front-end changes.
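This is a minimal sketch of on-demand inclusion. It assumes built-ins live in a static table of names and assembly text. The table contents and the `ivy_builtin_*` labels are hypothetical placeholders, as is `builtins_for`. Only referenced entries are emitted, in table order.

```rust
use std::collections::BTreeSet;

/// Hypothetical built-in table: source name and the assembly defining it.
const BUILTINS: &[(&str, &str)] = &[
    ("debug", "ivy_builtin_debug:\n    # assembly for debug\n"),
    ("add", "ivy_builtin_add:\n    # assembly for add\n"),
    ("sub", "ivy_builtin_sub:\n    # assembly for sub\n"),
];

/// Select only the built-ins the program references. Returns the names to
/// register in the entry point and the assembly to append to the output.
fn builtins_for(referenced: &BTreeSet<&str>) -> (Vec<&'static str>, String) {
    let mut names = Vec::new();
    let mut asm = String::new();
    for &(name, body) in BUILTINS {
        if referenced.contains(name) {
            names.push(name);
            asm.push_str(body);
        }
    }
    (names, asm)
}
```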
Implement basic code generation

This commit makes it possible to compile and assemble sixty-four.vy, and compute the correct result!!!!!!! To avoid having to do any detailed analysis of variable usage, this currently accesses everything via the stack. This means allocating a stack slot for every instruction, then saving and loading everything from the stack frame. The translation of individual functions is pretty direct: since everything goes via the stack, we can translate each instruction on its own, in order, without looking at other instructions. This currently doesn't implement copying data into the closures, because the current test case doesn't do that at all. This also hard-codes the one built-in (debug) that the test program refers to.
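Here is a sketch of the slot bookkeeping this implies. It assumes 8-byte values, slots addressed from rsp, and the System V rule that rsp be 16-byte aligned at every call. `slot_offset`, `frame_size`, and `store_result` are hypothetical.

```rust
/// Every SSA instruction gets its own 8-byte stack slot, numbered in order.
fn slot_offset(inst: usize) -> usize {
    8 * inst
}

/// Frame size for `n` instructions. On entry rsp is 8 bytes past a 16-byte
/// boundary (the return address), so the frame is padded to keep rsp
/// 16-byte aligned at any call made from the body.
fn frame_size(n: usize) -> usize {
    let bytes = 8 * n;
    if (bytes + 8) % 16 == 0 {
        bytes
    } else {
        bytes + 8
    }
}

/// Each instruction is translated on its own: compute into rax, then spill.
fn store_result(inst: usize) -> String {
    format!("mov qword ptr [rsp + {}], rax", slot_offset(inst))
}
```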
Add skeleton of x64 compilation code

We compile the globals and functions in the program into a list of global labels and definitions. This adds a bunch of instruction types and related types for formatting. Then all of this is assembled together into the program. The program realigns the stack (probably overkill?), calls into the entry point symbol, and then exits. I added ivy_exit so that I wouldn't have to refer to any platform-specific symbols in the emitted assembly; all of that can be handled by the runtime. Unfortunately there will still be platform-specific behavior because of different calling conventions.
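This sketches the emitted wrapper, assuming Intel syntax and System V calling conventions. `emit_main` and the `ivy_entry` symbol are hypothetical. ivy_exit is the runtime function named above and is assumed not to return. On 64-bit Windows the caller would also need to reserve 32 bytes of shadow space before each call, which is one of the calling-convention differences mentioned.

```rust
/// Emit the program's real entry point (Intel syntax assumed). Symbol names
/// are parameters because their decoration is platform-specific.
fn emit_main(main_sym: &str, entry_sym: &str, exit_sym: &str) -> String {
    let mut out = String::new();
    out.push_str(&format!("    .globl {}\n", main_sym));
    out.push_str(&format!("{}:\n", main_sym));
    // Force 16-byte alignment regardless of what the caller left us.
    out.push_str("    and rsp, -16\n");
    out.push_str(&format!("    call {}\n", entry_sym));
    // Assumed never to return, so main never has to undo the realignment.
    out.push_str(&format!("    call {}\n", exit_sym));
    out
}
```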
Add the target-independent trans module

This handles translation of the AST to an SSA form. Control flow constructs inside a function aren't supported yet, so each function is a single basic block of linear SSA. This does some basic load forwarding and de-duplication optimizations, since they're straightforward. The most important effect here is translating the globally resolved names into offsets into the closure record, and listing the variables that need to be copied into constructed closures.
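One straightforward way to de-duplicate within a single basic block is value numbering: keep a map from each side-effect-free instruction to the SSA value that already computes it. This is a sketch under assumptions, not necessarily how the trans module does it. It uses hypothetical `Inst` and `Block` types and assumes closure records don't change while a function runs. Applications are never merged, because they can have side effects.

```rust
use std::collections::HashMap;

/// Hypothetical SSA instructions for a single basic block.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Inst {
    Num(i64),
    LoadUpvar(usize),  // read a slot of the current closure record
    App(usize, usize), // call: may have effects, so never merged
}

#[derive(Default)]
struct Block {
    insts: Vec<Inst>,
    seen: HashMap<Inst, usize>,
}

impl Block {
    /// Append an instruction, reusing an identical earlier one when it is
    /// side-effect free. Assumes closure records don't change mid-function.
    fn push(&mut self, inst: Inst) -> usize {
        let pure = !matches!(inst, Inst::App(..));
        if pure {
            if let Some(&v) = self.seen.get(&inst) {
                return v;
            }
        }
        let v = self.insts.len();
        self.insts.push(inst.clone());
        if pure {
            self.seen.insert(inst, v);
        }
        v
    }
}
```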
Add support for globals which don't get captured

Lots of top-level definitions don't need to be copied into every closure, because they're constant for the whole program. Global variables are never inserted into the upvars entry, so they won't be copied into closures. This also adds "builtin" names that are inserted into the initial context for arithmetic and comparisons. Booleans have to be built-in because the comparisons return them, but they're still intended to be implemented via Church encoding.
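This sketches the resolution rule. It assumes one parameter per function and a flat list of enclosing locals. Globals and builtins resolve to `Var::Global` and never reach `upvars`, so they are never copied into closures. Enclosing locals are checked first so that they can shadow globals. `Var`, `FnScope`, and the names `add` and `true` in the example are hypothetical.

```rust
use std::collections::HashMap;

/// Where a resolved name lives at runtime.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Var {
    Param,         // the function's own argument
    Upvar(usize),  // index among the closure's captured variables
    Global(usize), // program-wide definition or builtin; never captured
}

/// Hypothetical per-function scope used during resolution.
struct FnScope {
    param: String,
    enclosing: Vec<String>,          // locals of enclosing functions
    globals: HashMap<String, usize>, // top-level definitions and builtins
    upvars: Vec<String>,             // what this closure must copy in
}

impl FnScope {
    fn resolve(&mut self, name: &str) -> Var {
        if name == self.param {
            return Var::Param;
        }
        // Enclosing locals shadow globals, so check them first.
        if self.enclosing.iter().any(|n| n == name) {
            let existing = self.upvars.iter().position(|u| u == name);
            let idx = match existing {
                Some(i) => i,
                None => {
                    self.upvars.push(name.to_string());
                    self.upvars.len() - 1
                }
            };
            return Var::Upvar(idx);
        }
        match self.globals.get(name) {
            Some(&g) => Var::Global(g),
            None => panic!("unbound variable: {}", name),
        }
    }
}
```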