Böhm's Theorem and the Böhm-Out Algorithm

Definitions

-- Church-encoded datatypes

{- UNIT -}
unit = \x. x

{- ERROR -}
error = \_. error

{- BOOL -}
true = \t. \f. t
false = \t. \f. f
then = unit -- dummy definitions
else = unit -- ^^^^^
-- make 'if' actually take 5 args, including 'then' and 'else':
if b then t else f = b t f
and a b = if a then b else false
or a b = if a then true else b
not a = if a then false else true

{- PAIR -}
pair a b = \f. f a b
fst p = p (\a. \b. a)
snd p = p (\a. \b. b)

{- MAYBE -}
none = \s. \n. n
some a = \s. \n. s a

{- NAT -}
zero = \s. \z. z
succ n = \s. \z. s (n s z)

pred n = fst (n (\p. pair (snd p) (succ (snd p))) (pair zero zero))

iszero n = n (\_. false) true

add m n = n succ m
sub m n = n pred m
mul m n = n (add m) 0
exp m n = n (mul m) 1

even n = n not true
odd n = n not false

equal m n =
  if (iszero m) then
    (iszero n)
  else
    (and (not (iszero n)) (equal (pred m) (pred n)))

0 = zero
1 = succ 0
2 = succ 1
3 = succ 2
4 = succ 3
5 = succ 4
6 = succ 5
7 = succ 6
8 = succ 7
9 = succ 8
10 = succ 9
100 = mul 10 10
1000 = mul 10 100
inf = succ inf

{- LIST -}
nil = \c. \n. n
cons h t = \c. \n. c h (t c n)
head xs = xs (\h t. h) error
tail xs = fst (xs (\h p. pair (snd p) (cons h (snd p))) (pair nil nil))
singleton h = cons h nil
append xs ys = xs cons ys
-- pushes a new element to the back of a list
snoc as a = append as (singleton a)
reverse xs = xs (\ x cont acc. cont (cons x acc)) (\xs'. xs') nil
map f xs = xs (\x xs. cons (f x) xs) nil
length xs = xs (\h. succ) zero
repeat n x = n (cons x) nil
cycle xs = append xs (cycle xs)
take n xs = n (\rec xs. cons (head xs) (rec (tail xs))) (\xs. nil) xs
drop n xs = n tail xs


Böhm's Theorem

Any two normal forms are either η-convertible or separable [1].

Two normal forms u and v are separable if there exists a discriminator function Δ such that (Δ u) evaluates to (λt f. t) and (Δ v) evaluates to (λt f. f).

Not only does a Δ exist, but there is a constructive algorithm called the Böhm-out technique that gives us a concrete Δ! We will loosely follow [2].

Path Construction

The first step is to find a difference between the two normal forms, if one exists. For u = λx₁...x_m. (g a₁...a_p) and v = λy₁...y_n. (h b₁...b_q), we first η-expand whichever term binds fewer variables until m = n, then α-rename y₁...y_n to x₁...x_n so that both terms bind the same variables. These transformations preserve the semantics of u and v, so a difference path on the transformed terms is a difference path on the originals too. Now, the terms can differ in any of 3 ways:

The head is different, g ≠ h
The number of args is different, p ≠ q
Or, the difference is nested somewhere in the ith arg, a_i ≠ b_i

The result of this search is a list of indices c_i denoting a path through the c₁th arg, then the c₂th arg of that term, and so on. The path terminates with a difference in either the head var or the number of args. We construct Δ by induction on this path.
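
The search described above can be sketched in Python. This is only a minimal sketch under stated assumptions, not the implementation from [2]: the `NF` class, the `find_path` function, and the reserved `#k` names used for α-renamed binders are all hypothetical names introduced for illustration. It assumes both inputs are already β-normal and that no user variable starts with `#`.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NF:
    """Hypothetical β-normal form: λ params. head args..."""
    params: Tuple[str, ...]
    head: str
    args: Tuple["NF", ...] = ()


def find_path(u, v, env_u=None, env_v=None, depth=0):
    """Return (indices, kind) for the first difference found, or None if
    u and v are η-convertible. kind is "head" or "args"; indices are 1-based."""
    env_u, env_v = dict(env_u or {}), dict(env_v or {})
    n = max(len(u.params), len(v.params))
    # α-rename: both sides use the shared names #depth, #depth+1, ...
    fresh = [f"#{depth + k}" for k in range(n)]
    for name in fresh:
        env_u[name] = env_v[name] = name
    env_u.update(zip(u.params, fresh))
    env_v.update(zip(v.params, fresh))
    # η-expand: the side with fewer binders gets the extra variables as args.
    args_u = list(u.args) + [NF((), name) for name in fresh[len(u.params):]]
    args_v = list(v.args) + [NF((), name) for name in fresh[len(v.params):]]
    # 1. The head is different.
    if env_u.get(u.head, u.head) != env_v.get(v.head, v.head):
        return [], "head"
    # 2. The number of args is different.
    if len(args_u) != len(args_v):
        return [], "args"
    # 3. The difference is nested in the i-th arg.
    for i, (a, b) in enumerate(zip(args_u, args_v), start=1):
        found = find_path(a, b, env_u, env_v, depth + n)
        if found is not None:
            return [i] + found[0], found[1]
    return None


# λx. x (λy. y)  versus  λx. x (λy. x): the difference is a head inside arg 1.
u = NF(("x",), "x", (NF(("y",), "y"),))
v = NF(("x",), "x", (NF(("y",), "x"),))
print(find_path(u, v))  # ([1], 'head')
```

Returning `None` for η-convertible inputs mirrors the "if one exists" in the text: `find_path(NF(("x",), "x"), NF(("a", "b"), "a", (NF((), "b"),)))` finds no difference because λx. x η-expands to λa b. a b.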

Discriminator Construction

By induction on the difference path for normal forms u and v. Note that as above, we assume for each case that u and v have already been η-expanded and α-renamed to bind the same variables.

Case: head difference.
Given terms u = λx₁...x_n. (x_g a₁...a_p) and v = λx₁...x_n. (x_h b₁...b_q), we know g ≠ h. If we apply λx₁...x_p t f. t as the arg for x_g, λx₁...x_q t f. f as the arg for x_h, and any other term (e.g. λx.x) for the rest of x₁...x_n, then applying these args to u yields (λt f. t) and to v yields (λt f. f).
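
As a concrete instance of the head-difference case, here is a minimal sketch that models lambda terms as curried Python functions. The example terms u and v and the helper `ignore_then` are hypothetical, chosen only for illustration; the head of u discards its p arguments and returns λt f. t, while the head of v discards its q arguments and returns λt f. f.

```python
def ignore_then(k, result):
    """Build λx₁...x_k. result as a curried Python function."""
    if k == 0:
        return result
    return lambda _x: ignore_then(k - 1, result)


TRUE = lambda t: lambda f: t    # λt f. t
FALSE = lambda t: lambda f: f   # λt f. f

# u = λx y. x y     (head x, p = 1 argument)
# v = λx y. y x x   (head y, q = 2 arguments)
u = lambda x: lambda y: x(y)
v = lambda x: lambda y: y(x)(x)

arg_x = ignore_then(1, TRUE)    # λx₁ t f. t, substituted for the head of u
arg_y = ignore_then(2, FALSE)   # λx₁ x₂ t f. f, substituted for the head of v


def delta(term):
    """Discriminator for this particular pair: apply the chosen arguments."""
    return term(arg_x)(arg_y)


# Decode the resulting Church boolean by applying it to two markers.
print(delta(u)("T")("F"))  # T
print(delta(v)("T")("F"))  # F
```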

Case: number of args difference.
Given terms u = λx₁...x_n. (x_h a₁...a_p) and v = λx₁...x_n. (x_h b₁...b_q), we know p ≠ q. Without loss of generality, assume p < q. By applying λy₁...y_q+1. y_q+1 as the arg for x_h and λx.x for the rest, we get
(λx₁...x_h...x_n. (x_h a₁...a_p)) (λx.x)...(λy₁...y_q+1. y_q+1)...(λx.x)

⇒_β (λy₁...y_q+1. y_q+1) a₁[x₁↦λx.x,...]...a_p[x₁↦λx.x,...]
⇒_β λy_p+1...y_q+1. y_q+1
⇒_α λz₁...z_q-p+1. z_q-p+1

and
(λx₁...x_h...x_n. (x_h b₁...b_q)) (λx.x)...(λy₁...y_q+1. y_q+1)...(λx.x)

⇒_β (λy₁...y_q+1. y_q+1) b₁[x₁↦λx.x,...]...b_q[x₁↦λx.x,...]
⇒_β λy_q+1. y_q+1
⇒_α λz₁. z₁
⇒_η λz₁...z_q-p+1. (z₁ z₂...z_q-p+1)

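The first result has head z_q-p+1 and the second has head z₁. Since p < q, we have q-p+1 ≥ 2, so these are different variables. This is a head difference, so we can conclude by using the first case above!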
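
The reduction above can be checked on a small pair of terms, again modelling lambda terms as curried Python functions. This is a sketch under assumptions: the terms u = λx. x x and v = λx. x x x (so p = 1, q = 2) and the helper `select_last` are hypothetical examples, not taken from [2].

```python
def select_last(k):
    """Build λy₁...y_k. y_k as a curried Python function (k >= 1)."""
    if k == 1:
        return lambda y: y
    return lambda _y: select_last(k - 1)


TRUE = lambda t: lambda f: t    # λt f. t
FALSE = lambda t: lambda f: f   # λt f. f

# u = λx. x x     (p = 1 argument)
# v = λx. x x x   (q = 2 arguments)
u = lambda x: x(x)
v = lambda x: x(x)(x)

# Step 1: substitute λy₁ y₂ y₃. y₃ (q + 1 = 3 binders) for the head x.
sel = select_last(3)
# Now u sel = λz₁ z₂. z₂ and v sel = λz₁. z₁, which η-expands to λz₁ z₂. z₁ z₂.

# Step 2: head-difference case. z₂ heads the first term with 0 args,
# z₁ heads the second term with 1 arg.
arg_z1 = lambda _a: FALSE       # λa t f. f
arg_z2 = TRUE                   # λt f. t


def delta(term):
    """Discriminator for this particular pair."""
    return term(sel)(arg_z1)(arg_z2)


print(delta(u)("T")("F"))  # T
print(delta(v)("T")("F"))  # F
```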

Case: difference in ith arg.
Given terms

u = λx₁...x_n.
    (x_h a₁...a_i-1 b a_i+1...a_p)

and

v = λx₁...x_n.
    (x_h a₁...a_i-1 c a_i+1...a_p)

we have a path P to a difference between b and c. If x_h does not occur along P except here, then this is easy! We can simply apply λy₁...y_p. y_i as the argument for the head variable x_h, and recursively apply this process with the rest of P and the terms λx₁...x_h-1 x_h+1...x_n. b and λx₁...x_h-1 x_h+1...x_n. c.
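
A small sketch of this easy case, with lambda terms modelled as curried Python functions. The terms u and v and the helper `select` are hypothetical examples chosen for illustration. The path goes through the 2nd argument of x, then the 1st argument of z, and ends in a head difference between a and b.

```python
def select(i, p):
    """Build λy₁...y_p. y_i (1-based i) as a curried Python function."""
    def go(k, picked):
        if k > p:
            return picked
        return lambda y: go(k + 1, y if k == i else picked)
    return go(1, None)


TRUE = lambda t: lambda f: t    # λt f. t
FALSE = lambda t: lambda f: f   # λt f. f

# u = λx z. x z (λa b. z a)
# v = λx z. x z (λa b. z b)
u = lambda x: lambda z: x(z)(lambda a: lambda b: z(a))
v = lambda x: lambda z: x(z)(lambda a: lambda b: z(b))


def delta(term):
    """Discriminator for this particular pair, built along the path."""
    return (term
            (select(2, 2))   # x picks its 2nd argument
            (select(1, 1))   # z picks its 1st (and only) argument
            (TRUE)           # head-difference case: a has 0 args, gets λt f. t
            (FALSE))         # head-difference case: b has 0 args, gets λt f. f


print(delta(u)("T")("F"))  # T
print(delta(v)("T")("F"))  # F
```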
Things are a little more complicated if x_h occurs at the head several times along the path. Right now, the difference is buried inside the ith arg of x_h, but somewhere inside that arg there could be another term with x_h as the head, where the difference lies inside an arg other than the ith. So if we have x_h select only its ith arg, we could lose the difference later on. To avoid this, first determine the maximum number of args M applied to a head occurrence of x_h along P. Then, η-expand all head occurrences of x_h in u and v along P until all have exactly M args. Finally, use (λx₁...x_M+1. (x_M+1 x₁...x_M)) as the argument for x_h, so that every head occurrence reduces as follows:
(x_h a₁...a_M)

⇒_β (λx₁...x_M+1. (x_M+1 x₁...x_M)) a₁...a_M
⇒_β λx_M+1. x_M+1 a₁...a_M

which introduces a new, distinct head variable for each head occurrence of x_h. We can proceed with the simpler case above, because each new head occurs only once along the path.
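
To see both why the naive selector fails and how the tupling argument repairs it, here is a minimal sketch with lambda terms modelled as curried Python functions. The terms u and v are hypothetical examples chosen for illustration. The outer x has 2 args and the path goes through its 2nd arg; the inner x has 1 arg and the path goes through its 1st arg, so M = 2. Because Python functions are curried here, the η-expansion of the inner occurrence (x K becomes λw. x K w) happens implicitly.

```python
I = lambda a: a                   # λa. a
K = lambda a: lambda b: a         # λa b. a
K_STAR = lambda a: lambda b: b    # λa b. b

# u = λx. x I (x K)   and   v = λx. x I (x K*)
u = lambda x: x(I)(x(K))
v = lambda x: x(I)(x(K_STAR))

sel1 = lambda a: lambda b: a      # λy₁ y₂. y₁
sel2 = lambda a: lambda b: b      # λy₁ y₂. y₂

# Naive attempt: let x select its 2nd arg everywhere. The inner (x K) then
# also drops K, so both terms collapse to the identity.
print(u(sel2)("probe"), v(sel2)("probe"))  # probe probe

# Tupling argument for M = 2: λx₁ x₂ x₃. x₃ x₁ x₂
tup = lambda a: lambda b: lambda s: s(a)(b)


def delta(term):
    """Discriminator for this particular pair."""
    return (term
            (tup)    # x := tupling term, giving each occurrence a fresh head
            (sel2)   # outer fresh head selects its 2nd arg
            (I)      # any term for the η-variable w
            (sel1))  # inner fresh head selects its 1st arg


print(delta(u)("T")("F"))  # T
print(delta(v)("T")("F"))  # F
```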