]> Witch of Git - web/blog/blob - posts/2020/syntactic-unification.md
Add syntactic unification post
[web/blog] / posts / 2020 / syntactic-unification.md
1 ---
2 title: "Syntactic Unification Is Easy!"
3 date: 2020-02-19
4 tags:
5 - unification
6 - logic programming
7 - algorithms
8 summary: "Last year I tried implementing syntactic unification and got completely stuck on the algorithms. Last week I read the MicroKanren paper and discovered that unification could be very simple. Let me show you how!"
9 ---
10
11 Syntactic unification is the process of trying to find an assignment of variables that makes two terms the same.
12 For example, `(X, Y) = (1, 2)` would give the substitution `{X: 1, Y: 2}`, but `(X, X) = (1, 2)` would fail to unify.
13 Unification is used to implement two things I'm very interested in: type systems, and logic programming languages.
14
15 Earlier last year I tried implementing unification in order to implement a toy prolog, and I got completely stuck on the algorithms.
16 I eventually puzzled throught it by reading lots of papers and articles, but I don't recall understanding it fully.
17 You can take a look at [the equations that the Wikipedia article uses][wiki] to understand what I was dealing with---it's certainly intimidating!
18
19 Cut to last week, I happened to read [the MicroKanren paper] (which is very readable) and discovered that unification could be very simple!
20 Let me show you how!
21
22 [wiki]: https://en.wikipedia.org/wiki/Unification_(computer_science)#A_unification_algorithm
23 [the microkanren paper]: http://webyrd.net/scheme-2013/papers/HemannMuKanren2013.pdf
24
25 ### The Terms To Unify
26
27 We have to define what kinds of things we're going to unify.
28 There are a few basic things we *need* to have.
29
30 - Variables, because otherwise unification is just equality comparison.
31 - Terms with multiple subterms, because otherwise unification is just.
32 - Other atomic "base" terms are good to have, because they let you write more interesting terms that are easily understandable.
33
34 I'm going to write my examples in Python, so the types I'm going to work with are:
35
36 - Variables: A `Var` class defined for this purpose.
37 - Composite terms: Tuples and lists of terms.
38 - Atomic terms: Any Python type that can be compared for equality.
39
40 The scheme example they demonstrate in the MicroKanren article uses variables, pairs, and arbitrary objects.
41 The Rust implementation I made initially just has variables, pairs, and integers.
42 Exactly what you want to work with is completely up to you: hopefully at the end of this, you'll be able to figure out how to do it for some custom type.
43
44 My `Var` type is very simple, with only one custom method to make the printing more compact:
45
46 ```python
47 from dataclasses import dataclass
48
49 @dataclass(frozen=True)
50 class Var:
51 name: str
52
53 def __repr__(self):
54 return self.name
55 ```
56
57 And everything else is base Python types, so we're done with this part.
58
59 ### Unifying Terms
60
61 Let's work through unifying things.
62 We want to unify two terms, and return a substitution that makes them equal, which I'll call the "environment."
63 If they can't possibly be equal, we'll return `None`.
64 Let's start out simple: two things can be unified if they're already equal.
65
66 ```python
67 def unify(a, b):
68 if a == b:
69 return {}
70 # ... other cases ...
71 return None
72 ```
73
74 The variable cases can also be solved very easily.
75 Trying to unify a variable with another term, we can just substitute that for the variable.
76 When unifying `x = 1`, we get back `{ x: 1 }`.
77
78 ```python
79 if isinstance(a, Var):
80 return {a: b}
81 if isinstance(b, Var):
82 return {b: a}
83 ```
84
85 Finally, let's handle tuples.
86 For this, we want to unify each individual component of the tuple.
87 We want the union of each of these unifications, so we'll update the dictionary each time.
88 We have to check if the sub-unification returned None and pass that along.
89
90 ```python
91 if isinstance(a, tuple) and isinstance(b, tuple):
92 if len(a) != len(b):
93 return None
94 env = {}
95 for (ax, bx) in zip(a, b):
96 res = unify(ax, bx)
97 if res is None:
98 return None
99 env.update(res)
100 return env
101 ```
102
103 Now you might see where this could go wrong, and we'll get to that in a moment with an example.
104 Repeating that function all in one piece:
105
106 ```python
107 def unify(a, b):
108 if a == b:
109 return {}
110 if isinstance(a, Var):
111 return {a: b}
112 if isinstance(b, Var):
113 return {b: a}
114 if isinstance(a, tuple) and isinstance(b, tuple):
115 if len(a) != len(b):
116 return None
117 env = {}
118 for (ax, bx) in zip(a, b):
119 res = unify(ax, bx)
120 if res is None:
121 return None
122 env.update()
123 return env
124 return None
125 ```
126
127 Now we can unify stuff:
128
129 ```python
130 >>> x, y = Var("x"), Var("y")
131 >>> unify(1, 1)
132 {}
133 >>> unify(x, 1)
134 { x: 1 }
135 >>> unify((x, 2), (1, 2))
136 { x: 1 }
137 >>> unify((x, y), (1, 2))
138 { x: 1, y: 2 }
139 >>> unify((x, x), (1, 2))
140 { x: 2 }
141 ```
142
143 Oops!
144 That last one is wrong!
145 So what can we do to fix that?
146
147 Well, we need to take into account context.
148 Let's change our `unify` function to accept an environment to work inside.
149 I'm gonna make this a pure function, so we don't update the environment, we just make a new version of it.
150 Some things are changed a bit when doing this.
151 We include `env` in all of our outputs, instead of starting from nothing.
152 We'll also include
153
154 ```python
155 def unify(a, b, env={}):
156 if a == b:
157 return env
158 if isinstance(a, Var):
159 return {**env, a: b}
160 if isinstance(b, Var):
161 return {**env, b: a}
162 if isinstance(a, tuple) and isinstance(b, tuple):
163 if len(a) != len(b):
164 return None
165 for (ax, bx) in zip(a, b):
166 env = unify(a, b, env)
167 if env is None:
168 return None
169 return env
170 return None
171 ```
172
173 This still doesn't do the right thing though, because we're not actually looking in the environment.
174 We need to look stuff up, so let's write a function to do that.
175 We'll just always call it, and if the term isn't a key for the environment, we'll just return it unchanged.
176
177 ```python
178 def walk(env, term):
179 if term in env:
180 term = term[env]
181 return term
182 ```
183
184 Now we can add these two lines at the beginning.
185
186 ```python
187 a = walk(env, a)
188 b = walk(env, b)
189 ```
190
191 With these modifications, we correctly handle the case we were stuck on earlier.
192
193 ```python
194 >>> unify((x, x), (1, 2))
195 >>> # look ma no solution!
196 ```
197
198 It's still not perfect though.
199 It doesn't properly reject this case where the conflict is two steps away:
200
201 ```python
202 >>> unify((x, y, x), (y, 8, 9))
203 { x: y, y: 9 }
204 ```
205
206 Here the problem is that we have two facts around, that `x = y`, and that `y = 8`.
207 Even so, it doesn't see that that means `x = 8` when it gets to the assertion of `x = 9`.
208 Luckily this is another simple fix.
209 We just keep walking through the environment as long as we have a binding.
210 Just change the `if` to a `while`!
211
212 ```python
213 def walk(env, term):
214 while term in env:
215 term = term[env]
216 return term
217 ```
218
219 And now this unifier should be able to handle any hard case you throw at it!
220
221 ### The Whole Unifier
222
223 Here's the whole program we wrote in one piece, in just 30 lines.
224
225 ```python
226 from dataclasses import dataclass
227
228 @dataclass(frozen=True)
229 class Var:
230 name: str
231 def __repr__(self):
232 return self.name
233
234 def walk(env, term):
235 while term in env:
236 term = env[term]
237 return term
238
239 def unify(a, b, env={}):
240 a = walk(env, a)
241 b = walk(env, b)
242 if a == b:
243 return env
244 if isinstance(a, Var):
245 return { **env, a: b }
246 if isinstance(b, Var):
247 return { **env, b: a }
248 if isinstance(a, tuple) and isinstance(b, tuple):
249 if len(a) != len(b):
250 return None
251 for (a, b) in zip(a, b):
252 env = unify(a, b, env)
253 if env is None:
254 return None
255 return env
256 return None
257 ```
258
259 Consider writing your own in your language of choice, or adding support for new datatypes.
260 It would be nice to be able to unify namedtuples and dictionaries and things, or do unification in the web browser.
261 Try it out for yourself, because next time I'll be talking about how you can go from here and use unification to build logic programs.