From 8a8b5168cfa2b2b3ec3973ca996d758dfe8a763c Mon Sep 17 00:00:00 2001 From: Cassie Jones Date: Wed, 19 Feb 2020 16:12:31 +0100 Subject: [PATCH] Add a section on the occurs-check --- posts/2020/syntactic-unification.md | 78 +++++++++++++++++++++++++++-- 1 file changed, 74 insertions(+), 4 deletions(-) diff --git a/posts/2020/syntactic-unification.md b/posts/2020/syntactic-unification.md index 2df5c69..4fc9e14 100644 --- a/posts/2020/syntactic-unification.md +++ b/posts/2020/syntactic-unification.md @@ -12,9 +12,9 @@ Syntactic unification is the process of trying to find an assignment of variable For example, `(X, Y) = (1, 2)` would give the substitution `{X: 1, Y: 2}`, but `(X, X) = (1, 2)` would fail to unify. Unification is used to implement two things I'm very interested in: type systems, and logic programming languages. -Earlier last year I tried implementing unification in order to implement a toy prolog, and I got completely stuck on the algorithms. +Earlier last year I tried implementing unification in order to make a toy Prolog, and I got completely stuck on the algorithms. I eventually puzzled throught it by reading lots of papers and articles, but I don't recall understanding it fully. -You can take a look at [the equations that the Wikipedia article uses][wiki] to understand what I was dealing with---it's certainly intimidating! +You can take a look at [the equations that the Wikipedia article uses][wiki] to see what I was dealing with---it's certainly intimidating! Cut to last week, I happened to read [the MicroKanren paper] (which is very readable) and discovered that unification could be very simple! Let me show you how! @@ -38,7 +38,7 @@ I'm going to write my examples in Python, so the types I'm going to work with ar - Atomic terms: Any Python type that can be compared for equality. The scheme example they demonstrate in the MicroKanren article uses variables, pairs, and arbitrary objects. -The Rust implementation I made initially just has variables, pairs, and integers. +The Rust version I wrote just has variables, pairs, and integers. Exactly what you want to work with is completely up to you: hopefully at the end of this, you'll be able to figure out how to do it for some custom type. My `Var` type is very simple, with only one custom method to make the printing more compact: @@ -216,7 +216,7 @@ def walk(env, term): return term ``` -And now this unifier should be able to handle any hard case you throw at it! +And now this unifier should be able to handle any [(finite)](#postscript-unifying-infinity) hard cases you throw at it! ### The Whole Unifier @@ -259,3 +259,73 @@ def unify(a, b, env={}): Consider writing your own in your language of choice, or adding support for new datatypes. It would be nice to be able to unify namedtuples and dictionaries and things, or do unification in the web browser. Try it out for yourself, because next time I'll be talking about how you can go from here and use unification to build logic programs. + +### Postscript: Unifying Infinity + +What was that earlier? +About only the finite cases? +Well, it turns out this unifier is good enough for implementing well-behaved logic programs, but it leaves out one complexity: the "Occurs Check." +This checks that a variable does not "occur" inside a term you unify it with, to avoid infinite solutions and non-termination. +Many old Prolog implementations left out the occurs check for efficiency reasons, and we can leave it out for simplicity reasons if we want to trust our programmers to only write finite terms. + +But let's go look at the problem specifically. +We can write a unification like: + +```python +>>> unify(a, (1, a)) +{ a: (1, a) } +``` + +If you think through this, you can see that `a = (1, (1, (1, (1, ...))))`, going on infinitely deep. +There are languages that work with logic on infinite terms like this, but applied naively it can end up giving us incorrect results or non-termination. +For instance, with proper support for infinite terms, the following should unify: + +```python +>>> env = unify(a, (1, a)) +>>> env = unify(b, (1, b), env) +>>> env = unify(a, b) +``` + +But instead, it gets stuck on the last step, descending forever. +There are two ways to solve this. +One is to embrace infinite terms and include fixpoints in your logic, giving you terms like `a = fix x. (1, x)` and `b = fix x. (1, x)`. +I don't know how to do this, though. + +The other way is to detect unifications that would lead to infinite terms, and reject them. +We take our code that adds variables to the substitution, and add the occurs check there: + +```python + if isinstance(a, Var): + if occurs(a, b, env): return None + return { **env, a: b } +``` + +If we're using the occurs-check here, then we know a few things: + +- `a` is a variable that's already been `walk()`-ed. +- `a` and `b` aren't equal to each other, because earlier in the unify checked that. + +This means that we can check if the literal variable `a` occurs literally in walked terms and subterms of `b`. +If there's a variable in `b` that's a synonym of `a`, then it will walk to `a` because it was the result of walking. +Then, the implementation looks like this: + +```python +def occurs(a, b, env={}): + b = walk(env, b) + if a == b: + return True + if isinstance(b, tuple): + return any(occurs(a, x) for x in b) + return False +``` + +Now if we try... + +```python +>>> unify(a, (1, a)) +>>> +``` + +We correctly don't unify this. +We've got sound unification over finite terms. +This is useful to clean the situtation up, but I'm still a bit sad to see infinite terms go... -- 2.47.0