From 64e02126d2a1c03f056e54bc6e598dc8c1f1d797 Mon Sep 17 00:00:00 2001 From: Cassie Jones Date: Wed, 19 Feb 2020 00:24:56 +0100 Subject: [PATCH] Add syntactic unification post --- posts/2020/syntactic-unification.md | 261 ++++++++++++++++++++++++++++ 1 file changed, 261 insertions(+) create mode 100644 posts/2020/syntactic-unification.md diff --git a/posts/2020/syntactic-unification.md b/posts/2020/syntactic-unification.md new file mode 100644 index 0000000..2df5c69 --- /dev/null +++ b/posts/2020/syntactic-unification.md @@ -0,0 +1,261 @@ +--- +title: "Syntactic Unification Is Easy!" +date: 2020-02-19 +tags: +- unification +- logic programming +- algorithms +summary: "Last year I tried implementing syntactic unification and got completely stuck on the algorithms. Last week I read the MicroKanren paper and discovered that unification could be very simple. Let me show you how!" +--- + +Syntactic unification is the process of trying to find an assignment of variables that makes two terms the same. +For example, `(X, Y) = (1, 2)` would give the substitution `{X: 1, Y: 2}`, but `(X, X) = (1, 2)` would fail to unify. +Unification is used to implement two things I'm very interested in: type systems, and logic programming languages. + +Earlier last year I tried implementing unification in order to implement a toy prolog, and I got completely stuck on the algorithms. +I eventually puzzled throught it by reading lots of papers and articles, but I don't recall understanding it fully. +You can take a look at [the equations that the Wikipedia article uses][wiki] to understand what I was dealing with---it's certainly intimidating! + +Cut to last week, I happened to read [the MicroKanren paper] (which is very readable) and discovered that unification could be very simple! +Let me show you how! + +[wiki]: https://en.wikipedia.org/wiki/Unification_(computer_science)#A_unification_algorithm +[the microkanren paper]: http://webyrd.net/scheme-2013/papers/HemannMuKanren2013.pdf + +### The Terms To Unify + +We have to define what kinds of things we're going to unify. +There are a few basic things we *need* to have. + +- Variables, because otherwise unification is just equality comparison. +- Terms with multiple subterms, because otherwise unification is just. +- Other atomic "base" terms are good to have, because they let you write more interesting terms that are easily understandable. + +I'm going to write my examples in Python, so the types I'm going to work with are: + +- Variables: A `Var` class defined for this purpose. +- Composite terms: Tuples and lists of terms. +- Atomic terms: Any Python type that can be compared for equality. + +The scheme example they demonstrate in the MicroKanren article uses variables, pairs, and arbitrary objects. +The Rust implementation I made initially just has variables, pairs, and integers. +Exactly what you want to work with is completely up to you: hopefully at the end of this, you'll be able to figure out how to do it for some custom type. + +My `Var` type is very simple, with only one custom method to make the printing more compact: + +```python +from dataclasses import dataclass + +@dataclass(frozen=True) +class Var: + name: str + + def __repr__(self): + return self.name +``` + +And everything else is base Python types, so we're done with this part. + +### Unifying Terms + +Let's work through unifying things. +We want to unify two terms, and return a substitution that makes them equal, which I'll call the "environment." +If they can't possibly be equal, we'll return `None`. +Let's start out simple: two things can be unified if they're already equal. + +```python +def unify(a, b): + if a == b: + return {} + # ... other cases ... + return None +``` + +The variable cases can also be solved very easily. +Trying to unify a variable with another term, we can just substitute that for the variable. +When unifying `x = 1`, we get back `{ x: 1 }`. + +```python + if isinstance(a, Var): + return {a: b} + if isinstance(b, Var): + return {b: a} +``` + +Finally, let's handle tuples. +For this, we want to unify each individual component of the tuple. +We want the union of each of these unifications, so we'll update the dictionary each time. +We have to check if the sub-unification returned None and pass that along. + +```python + if isinstance(a, tuple) and isinstance(b, tuple): + if len(a) != len(b): + return None + env = {} + for (ax, bx) in zip(a, b): + res = unify(ax, bx) + if res is None: + return None + env.update(res) + return env +``` + +Now you might see where this could go wrong, and we'll get to that in a moment with an example. +Repeating that function all in one piece: + +```python +def unify(a, b): + if a == b: + return {} + if isinstance(a, Var): + return {a: b} + if isinstance(b, Var): + return {b: a} + if isinstance(a, tuple) and isinstance(b, tuple): + if len(a) != len(b): + return None + env = {} + for (ax, bx) in zip(a, b): + res = unify(ax, bx) + if res is None: + return None + env.update() + return env + return None +``` + +Now we can unify stuff: + +```python +>>> x, y = Var("x"), Var("y") +>>> unify(1, 1) +{} +>>> unify(x, 1) +{ x: 1 } +>>> unify((x, 2), (1, 2)) +{ x: 1 } +>>> unify((x, y), (1, 2)) +{ x: 1, y: 2 } +>>> unify((x, x), (1, 2)) +{ x: 2 } +``` + +Oops! +That last one is wrong! +So what can we do to fix that? + +Well, we need to take into account context. +Let's change our `unify` function to accept an environment to work inside. +I'm gonna make this a pure function, so we don't update the environment, we just make a new version of it. +Some things are changed a bit when doing this. +We include `env` in all of our outputs, instead of starting from nothing. +We'll also include + +```python +def unify(a, b, env={}): + if a == b: + return env + if isinstance(a, Var): + return {**env, a: b} + if isinstance(b, Var): + return {**env, b: a} + if isinstance(a, tuple) and isinstance(b, tuple): + if len(a) != len(b): + return None + for (ax, bx) in zip(a, b): + env = unify(a, b, env) + if env is None: + return None + return env + return None +``` + +This still doesn't do the right thing though, because we're not actually looking in the environment. +We need to look stuff up, so let's write a function to do that. +We'll just always call it, and if the term isn't a key for the environment, we'll just return it unchanged. + +```python +def walk(env, term): + if term in env: + term = term[env] + return term +``` + +Now we can add these two lines at the beginning. + +```python + a = walk(env, a) + b = walk(env, b) +``` + +With these modifications, we correctly handle the case we were stuck on earlier. + +```python +>>> unify((x, x), (1, 2)) +>>> # look ma no solution! +``` + +It's still not perfect though. +It doesn't properly reject this case where the conflict is two steps away: + +```python +>>> unify((x, y, x), (y, 8, 9)) +{ x: y, y: 9 } +``` + +Here the problem is that we have two facts around, that `x = y`, and that `y = 8`. +Even so, it doesn't see that that means `x = 8` when it gets to the assertion of `x = 9`. +Luckily this is another simple fix. +We just keep walking through the environment as long as we have a binding. +Just change the `if` to a `while`! + +```python +def walk(env, term): + while term in env: + term = term[env] + return term +``` + +And now this unifier should be able to handle any hard case you throw at it! + +### The Whole Unifier + +Here's the whole program we wrote in one piece, in just 30 lines. + +```python +from dataclasses import dataclass + +@dataclass(frozen=True) +class Var: + name: str + def __repr__(self): + return self.name + +def walk(env, term): + while term in env: + term = env[term] + return term + +def unify(a, b, env={}): + a = walk(env, a) + b = walk(env, b) + if a == b: + return env + if isinstance(a, Var): + return { **env, a: b } + if isinstance(b, Var): + return { **env, b: a } + if isinstance(a, tuple) and isinstance(b, tuple): + if len(a) != len(b): + return None + for (a, b) in zip(a, b): + env = unify(a, b, env) + if env is None: + return None + return env + return None +``` + +Consider writing your own in your language of choice, or adding support for new datatypes. +It would be nice to be able to unify namedtuples and dictionaries and things, or do unification in the web browser. +Try it out for yourself, because next time I'll be talking about how you can go from here and use unification to build logic programs. -- 2.47.0