Blogpost 💬 Equality, Dependent Types, and Typechecking (10/23/2023)
This post assumes some experience with programming in dependently typed languages. Also, I'll be using Idris syntax in this post.
At some point when programming with dependent types, a beginner might have some code very much like the following
someVec= -- Some code here to generate a vector
We have a vector of a fixed but unknown length
n. Out of curiosity, they might try something else:
sameVecDifferentType: Vec (0 +
Program typechecked successfully.
The beginner then thinks:
Cool! The typechecker knows that
0 +, and so it allows me to assign use a vector of length
nwherever a vector of length
0 +is expected!
Excited by this observation, they might then try something a little different:
nextThing: Vec (
n+ 0) String
Type Error: Expected vector of length `
n+ 0`, got vector of length `
n` (Cannot prove `
Well that's odd. Why doesn't this work?
What this beginner has run up against is one of the main design problems in dependently typed programming languages: Equality. More specifically the difference between what equalities the typechecker can solve on its own, and which it requires user assistance to figure out.
The first thing anybody asks is why:
Why does the typechecker need my assistance with figuring out that
is true, but can figure out
n+ 0 =
0 +on its own?
Good question! Let's ask ourselves another question to get us to the answer: Why can't the typechecker just figure out all equalities?
Unfortunately it's been mathematically proven that the problem is undecidable - in other words, the problem of solving all equalities in general is equivalent to the Halting problem, meaning it's impossible to solve. The best we can do is create many algorithms that each handle a portion of the problem domain. There's algorithms for handling equations of linear arithmetic, boolean equations, etc, but we can't ever invent an algorithm that gives a correct yes/no answer to any equation that comes up in a typical dependently typed language.
One might think that one could simply compute the terms as far as possible and then check to see if they have exactly the same syntax tree. For instance, we could have the problem
reallyBigAndComplicatedTerm = anotherComplicatedTerm, but the problem eventually reduces to
0 = 1 after computing the terms as far as possible, and this equality is then an obvious one that can be solved automatically.
A problem with this approach is that the terms might contain infinite loops. If we have
loopingTerm = 1, we would evaluate
loopingTerm forever, never getting to the point where we could compare
1 to anything. The typical solution to this is to add termination checking, which employs various heuristics to check that both terms don't contain such infinite loops. The problem of fully accurate termination checking though - termination checking that always gives a correct yes/no answer to whether a term terminates - is also undecidable! This is a valid design decision, but we're again out of luck if we're looking for an algorithm that lets us write any program while also accurately verifying any equation.
We can't create an algorithm that can handle all possible equations. This also answers our original question: The typechecker can't handle certain equations because at some point the language designer made the choice not to allow it to. This happens all the time when implementing dependently typed languages, with reasons including:
Let's see how addition is defined Note: I'm assuming familiarity with Peano numbers :
add: Int -> Int -> Int
addx y = case
xof 0 =>
yS z =>
Addition pattern matches on the first argument, and based only on that decides how to continue computing. If the first argument is something like a variable, the function doesn't know how to continue computing - it gets "stuck". So
0 + directly computes to
because the first argument is a concrete value, but
doesn't compute because pattern match gets "stuck" on
n + 0
In this hypothetical language we're using that can't automatically figure out
, the designer made the choice to implement exactly "evaluate the two terms as far as possible, and then check if their syntax trees are the exact same". Again, this choice could have been made for any of the many reasons listed above, or more.
n + 0 =
The key point is that equality is a design question! The algorithms that language designers implement all have their own tradeoffs, advantages, and difficulties, and why an equality is or isn't automatically solvable almost always has a complex answer. When designing a language, make equality a conscious part of your design rather than an afterthought. And in programming, when you encounter a frustrating case where the language can't do enough for you, explore the design decisions that led to that problem!