Any two normal forms are either η-convertible or separable [1].
By separable, it means that there exists a discriminator function Δ such that for normal forms
u and v, (Δ u) evaluates to (λt f. t)
and (Δ v) evaluates to (λt f. f).
Not only does a Δ exist, but there is a constructive algorithm
called the Böhm-out technique that gives us a concrete Δ!
We will loosely follow [2].
Path Construction
The first step is to find a difference between the two normal forms, if one exists.
For u = λx1...xm. (g a1...ap)
and v = λy1...yn. (h b1...bq),
we first η-expand the lesser until m = n,
then α-rename y1...yn to x1...xn
so that their lambda-bound variables are the same.
These transformations preserve the semantics of u and v,
so a difference path on the transformed terms is a difference path on the originals too.
Now, they can be different in any of 3 ways:
The head is different, g ≠ h
The number of args is different, p ≠ q
Or, the difference is nested somewhere in the ith arg, ai ≠ bi
The result of this search is a list of indices ci
denoting a path through the c1th arg,
then the c2th arg of that term, and so on.
The path terminates with a difference in either the head var or the number of args.
We construct Δ by induction on this path.
Discriminator Construction
By induction on the difference path for normal forms u and v.
Note that as above, we assume for each case that u and v have
already been η-expanded and α-renamed to bind the same variables.
Case: head difference.
Given terms
u = λx1...xn. (xg a1...ap) and
v = λx1...xn. (xh b1...bq),
we know g ≠ h.
If we apply λx1...xp t f. t as the arg for xg,
λx1...xq t f. t as the arg for xh,
and any other term (e.g.λx.x) for the rest of x1...xn,
then applying these args to u yields (λt f. t) and to v yields (λt f. f).
Case: number of args difference.
Given terms
u = λx1...xn. (xh a1...ap) and
v = λx1...xn. (xh b1...bq),
we know p ≠ q.
Without loss of generality, assume p < q.
By applying λy1...yq+1. yq+1
as the arg for xh and λx.x for the rest, we get
(λx1...xh...xn. (xh a1...ap)) (λx.x)...(λy1...yq+1. yq+1)...(λx.x)
which is a head difference since p < q, so we can conclude by using the first case above!
Case: difference in ith arg.
Given terms
u = λx1...xn.
(xh a1...ai-1 b ai+1...ap) and
v = λx1...xn.
(xh a1...ai-1 c ai+1...ap),
we have a path P to a difference between b and c.
If xh does not occur along P except here, then this is easy!
We can simply apply λy1...yp. yi as the arg for to the head var xh, and recursively apply this process with the rest of P and λx1...xh-1 xh+1...xn. b and λx1...xh-1 xh+1...xn. c.
Things are a little more complicated if xh occurs at the head several times on a path.
This is because right now, the difference is buried inside the ith arg of xh,
but somewhere inside that arg could be another term with xh as the head, but with the
difference somewhere inside a different arg than i.
So if have xh only select its ith arg,
then we could lose the difference later on.
To avoid this happening, first determine the maximum number of args M applied
to a head occurrence of xh along P.
Then, η-expand all head occurrences of xh in u and v along P
until all have exactly M args.
Then, use (λx1...xM+1. (xM+1 x1...xM)) as
the argument for xh, so that all
(xh a1...aM)
which introduces a new, distinct head variable for each head occurrence of xh.
We can procede with the simpler case above, because this new head only occurs once along the path.