Thinking as the Crow Flies: Part 3 - Tokens, Syntax, and Expressionspost by Anthony Hart (anthony-hart) · 2017-12-18T23:38:52.106Z · LW · GW · None comments
In my last post, I discussed some basic logical connectives. Before moving on to more advanced topics, I want to discuss formal syntax. I think many people will find this post dry and boring, but I think it's neccessary to cover at some poin
Up till this point, I've been assuming an understanding which bears dwelling on. I've previously stated that a judgment is a series of mental tokens which one may declare. While I've technically adhered to that, it's obvious that I'm assuming more structure. Furthermore, I've made usage of variable binding and substitution, which have not been properly accounted for.
This short post is intended to specify the nature of syntax as used in this series. It will mostly be a recount of abstract syntax and binding according to the first chapter of Practical Foundations for Programming Languages, but with retrospect from the account of meaning explanations given earlier.
Abstract Syntax Trees
Any given form of logic specifies various sorts of phrases which may be combined to form judgments. We'll distinguish between two different types of syntax, an informal "surface" syntax which is human readable, and a formal abstract syntax which is more well-defined. We'll be concerning ourselves with this abstract syntax in this post, treating the surface syntax as a mere readable short-hand for the abstract syntax.
In defining abstract syntax, we concern ourselves primarily with the rules by which we may combine phrases with each other. Since we already have an understanding of meaning explanation under our belt, we may use this to express the rules for correct syntax formation. At first, our phrases will be in the form of trees, called abstract syntax trees, or ASTs.
ASTs are made up of two main kinds of things, variables and operators. The variables may be placed at the leaves of an AST, while the operators make up the nodes. For the sake of restricting how syntax may be combined, we give every operator a sort. This sort will dictate the role of an operator by restricting how it may be combined with other operators.
Each operator comes equipped with a list of sorts, , ... , which tell us what sorts of ASTs can be plugged into the operator. In addition, each operator has an intrinsic sort . When ASTs of the proper sorts are plugged into this operator, we obtain an AST of sort . We denote the full arity .
Variables will be the most significant part of our discussion of abstract syntax. Briefly, a variable is a single token which is intended to be used as a marker for a place where a more concrete expression might be placed. As a result, expressions containing variables make up schemes of statements which may be specialized, via substitution, into more specific statements. For example, in basic algebra, we may form polynomials such as , which can be specialized by substitution of, say, for to obtain .
Abstract syntax trees are classified by sorts that divide ASTs into syntactic categories. As an example, the various true, prop, , etc. operators will be placed into a different sort than , , , etc. We would not have tried stating, for example, true true. Beyond not meaning anything, we will make this a syntactically incorrect statement.
Variables in abstract syntax trees range over sorts in the sense that only ASTs of the specified sort of the variable can be plugged into that variable. But the core idea carries over from school mathematics, namely that a variable is an unknown, or a place-holder, whose meaning is given by substitution.
As an example, consider a language of arithmetic expressions built from numbers, addition, and multiplication. The abstract syntax of such a language consists of a single sort Exp (standing for "expression") generated by the following:
An operator of arity .
An operator of arity .
An operator num of arity .
Operators plus and times, both of arity .
As an example, the expression , which involves a variable, , would be represented by the AST of sort Exp, under the assumption that is also of sort Exp. For the sake of readability, all numbers will simply be written as numerals from now on. Because, say, num(), is an AST of sort Exp, we may plug it in for in the above AST to obtain, which is written in informal, surface syntax as . We may, of course, plug in any more complex ASTs of sort Exp for instead.
We may give a meaning explanation for ASTs as follows;
We may declare to be an AST of sort when we've declared to be a variable of sort .
We may declare to be an AST of sort when we've declared to be an operator of arity and have declared, for each , to be an AST of sort .
To fully explicate the system above, we have a meaning explanation like;
We may declare Exp to be a sort.
We may declare to be a sort.
We may declare an operator of arity
and this will fully specify our syntax.
Implicitly, for all the theories we've discussed so far, the exact abstract syntax is left implicit. But we may employ the following explanation;
We may declare Judgment to be a sort.
We may declare Proposition to be a sort.
We may declare hypothetical to be an operator of arity Judgment(Judgment, Judgment).
We may declare true an operator of arity Judgment(Proposition).
We may declare prop an operator of arity Judgment(Proposition).
We may declare an operator of arity Proposition(Proposition, Proposition)
We may declare an operator of arity Proposition()
From these rules we can declare that
is a judgment. Not one we can necessarily declare, but a well-formed judgment none the less. We typically write this as;
which is our informal, surface syntax.
It's worth pointing out at this point that these sorts of things have a standard notation in the form of a BNF grammar. The above would typically be written more succinctly as;
we won't be spelling out grammars very much in this series, but, when need be, this syntax will be used. Consider it a short-hand for the kind of meaning explanation given before.
The meaning explanation for ASTs provides justification for a principle of reasoning called structural induction. Suppose that we wish to prove that some predicate holds of all ASTs of a given sort. To show this it is enough to consider all the ways in which it can be generated, and show that the predicate holds in each case under the assumption that it holds for its constituent ASTs (if any). So, in the case of the sort Exp just described, we must show that
The predicate holds for any variable of sort Exp.
The predicate holds for any number, num().
Assuming that the predicate holds for and , prove that it holds for plus(; ) and times(; ).
Because these cases exhaust all formation possibilities, we are assured that the predicate holds for any AST of sort Exp. More generally, we'd need to prove that;
for any variable of sort .
Given an operator of arity , assuming we have , ... , we must prove that .
This allows us to prove for all ASTs of sort . We will use this technique to establish the well-behaviour of substitution over ASTs.
In general, when a grammar is defined we'll be reasoning about ASTs modulo some set of variables. Each AST has a set of variables appearing within it. We say that is fresh in our AST if does not appear in it, otherwise we say that is free in our AST.
Given a sort , a substitution , where is an AST of sort , is a variable of sort , and is an AST of sort , is defined as;
when and are different variables
By well-behaviour, I mean that, given a fixed AST, a substitution will always result in a unique new AST. To be more precise, we want to prove that
Given an AST with free variables and an AST with free variables , then there is a unique AST for which .
To prove this, we consider the inductive cases for ASTs in general;
For the base case, we must consider what happens to a lone variable. That is, if we have a variable which is free in , then we have in the case that is (in which case is our unique ), and otherwise (in which case is our unique ).
For the inductive case, we must consider an operator of the form . Our inductive hypothesis is that there's a unique for each such that . In this case, we have
by the definition of substitution, and
by our inductive hypothesis. This means our unique is .
That exhausts all formation possibilities. This establishes that substitution is well-behaved over ASTs, it will never get stuck.
Abstract Binding Trees
Abstract binding trees, or ABTs, enrich ASTs with the ability to manage variables. The primary difference between an AST and an ABT is the ability to bind variables, allowing subtrees of an ABT to possess free variables which are not themselves free in the tree itself. The subtree where a bound variable is considered free is the scope of that variable. This is useful for allowing an ABT to talk about indexed expressions and expression schemes. A primary consequence of this is the arbitrary nature of any particular choice of variable within a binding.
A common example of variable binding in ordinary mathematics is in integration. In the expression , we have an integral binding the variable for use within the expression . The free variables of are , while there are no free variables in , assuming is an operator. By the nature of this binding, the choice of could have been otherwise without changing the expression. So we could rewrite the integral as . We would consider the two ABTs underlying these expressions to be identical. This identification is known as -equivalence. Binding has a specific scope. For example, in the expression , the first is a variable which isn't bound by anything, while the second is the bound of the integral. As just stated, this will be considered identical to . Furthermore, we may consider the expression to be semantically different from as the order of bound variables is different.
Just like an AST, an ABT is made up of variables and operators, each having an associated arity. In the case of ABTs, it's also possible for an operator to bind variables of a particular sort within an argument. An operator of sort with an argument of sort which binds variables of sort , , ... will have its arity denoted . Assuming that an integral binds something of sort Exp, we may denote its sort as Exp(Exp . Exp), and we may write the ABT of as . For the sake of readability, we will write as .
Since we may rename variables within a binding without changing the ABT, we define a notion of variable replacement. Given a list of variables, , we may define a replacement of these variables as a list of pairs of variables, stating new and distinct names for the previous list. We denote such a replacement by . This will be a function which, given a variable in , will return a fresh variable . We denote the application of such a replacement to an expression with .
Note that all variables within a renaming must be fresh relative to each-other to avoid binding conflicts. For example, we can't rename the variables in to since that changes the ABT, so is not a valid renaming, but is.
We may fully explain the meaning of ABTs as;
We may declare to be an ABT of sort when we've declared to be a variable of sort .
We may declare to be an ABT of sort when we've declared to be an operator of arity and, for each and all replacements for we have declared to be an ABT of sort .
That renaming is significant. When we have an operator with sub-expressions that make use of bound variables, we don't want it to be a well-formed ABT under specific binding names, but rather, we want it to be well-formed under all binding names.
A version of structural induction called structural induction modulo fresh renaming holds for ABTs. The only difference is that we must establish our predicate for an ABT modulo renaming. To prove for all ABTs, we must prove that;
holds for any variable of sort
Given an operator of arity , assuming we have, for all and all renaming of a proof of , we must prove that .
The second condition ensures that the inductive hypothesis holds for all fresh choices of bound variable names, and not just the ones actually given in the ABT.
We can now extend our grammars with rules that allow bound variables. Thus far, the most significant usages of variable binding are as part of generalized judgments and lambda expressions. Example grammar rules would look like;
We may declare an operator of arity Judgment(Proposition . Judgment).
We may declare an operator of arity Judgment(Term . Judgment).
We may declare an operator of arity Term(Term . Term).
Where Term is a sort in the grammar of type theory. Before now there was likely some confusion over exactly what general judgments were meant to bind. As it turns out, there are several judgments for each thing we may want to bind, all having the similar meaning explanations. Without a discussion of syntax, this ambiguity may have never been realized. Going forward, we will use operators which don't have the kind of over-loading that general judgments possess.
It will be useful to give a precise account of free occurrence. We can define a judgment, denoted , indicating when a variable occurs freely within an ABT. That is, when a variable appears within an ABT, but isn't bound. We give the following meaning explanation;
We may declare .
We may declare when, for some , there is an such that, for each renaming of , we may declare .
Some care is necessary to characterize substitution over ABTs. The most obvious issue is substitution of a variable who's name is already bound. Consider the case where is bound at some point within , then does not occur free within , and hence is unchanged by substitution. For example, , since there are no free occurrences of in , we can't declare . A different, perhaps more common, issue is free variable capture during substitution. For example, provided that is distinct from , is not , which confuses two different variables named . Instead, it's simply undefined. It is, however, defined on a renaming of . Capture can always be avoided by first renaming the bound variables. If we rename the bound variable to to obtain , then is defined, and reduces to .
I've stated that ABTs which are -equivalent should be considered as identical. The relation of -equivalence means that and are identical up to the choice of bound variable names. We may give -equivalence the following meaning explanation;
We may declare .
We may declare when we have declared, for every and every renaming of and renaming of , that .
When we take this identification seriously, we can define substitution unproblematically;
when and are different variables
If we cannot, for any in any , declare , then where is simply if is in and is otherwise.
The conditions in that last rule prevent the problems discussed above. The condition on prevents naive variable capture, and can always be fulfilled by replacing all bound variables in our AST with fresh symbols. The conditions on the s prevent us from substituting variables which aren't free to be substituted.
Substitution is essentially performed on -equivalence classes of ABTs rather than on an ABT itself. We can choose a capture-avoiding representative for a given class of -equivalent ABTs and, any time we perform a substitution, we first canonicalize the binding names of the ABTs. This fixes all mentioned technical problems with substitution. There are several ways to do this in practice, for example by using de Bruijin indices.
de Bruijin indices
If you're anything like me, much of the discussion of ABTs may not sit right with you. Something about considering ABTs up to swapping variables seems to be a bit of a hack, something which could easily lead to incorrect reasoning without one even taking notice. I mentioned that what should really be done is a process of canonicalization where a canonical representative of an ABT is chosen, and that de Bruijin indices could do this job. The point of this section is to describe how, exactly, this works.
A de Bruijin index is simply a numeral signifying a bound variable. is the most recently bound, the second most recent, the third most recent, etc. If we wanted to, for sake of clarity, we could mark these numerals so that we don't confuse them with actual numbers. I'll use an over-line, , , , etc.
For a de Bruijin indexed ABT, we don't explicitly bind variables. Instead, for each place we look up the arity of the operator to know when variables are bound. For example, instead of writing we'd write notice that becomes outside of the subsequent bindings, and becomes inside of them. After further bindings, the most recent variable becomes the second, third, etc. most recently bound variable.
Our notion of free variable has to be amended slightly. In the expression , is free while is bound. We define the following, where judges that is free in ;
We may declare .
We may declare when we've declared to be of arity and, for some , we've declared .
Where is the length of the list . In the real world, would be implemented on a computer as a function that returns a boolean. As an example, we may declare since we know that has arity and we may declare . We may declare that since we know that has arity , and we may declare (in this case for ) .
In the course of that validation, the variable we were testing the freedom of changed its denotation as we entered into variable bindings. This is where much of the confusion over de Bruijin indices arises.
Dealing with substitution is a rather subtle business, but once all the details are clarified, the algorithm is rather simple. Consider this naive evaluation of a substitution;
Hmm... is that right? Let's consider what this is doing when the variables are given explicit names. So should be . Well, clearly we did something wrong, our final evaluation shouldn't be . We did not take into account subsiquent bindings. When entering a binding, the variable we're substituting for goes from being the nth bound variable to the (n+1)th bound variable. Let's try again;
Now, is that right? Our final evaluation was . Well, that's not right. The second part of our pair started out being bound as but ended up being bound as . Clearly, we also need to increment the variables in what we're substituting as well whenever we enter into a binding.
So, our final evaluation is now . Wait, that's still not right. The first part of our pair had a variable bound as , but this became a after incrementing. This is because that variable wasn't free for the ABT . So we see what we must do; only modify free variables during substitution.
So our final evaluation is , which is correct. The modification that we need to perform on our substituting expression is called quotation, and is a function defined like so;
when has arity .
Using this quotation function, we can define substitution as follows;
when has arity .
This works fine, but there's an extra hitch to working with de Bruijin indices, and that's dealing with the removal of variable bindings. Take the following reduction;
Notice that the binding is being removed. That means that any subsequent binding in will no longer be the nth binding, but will now be the (n-1)th binding. This means we need to actually decrement all the free variables in by quoting with a -1.
Similarly, we have the meaning;
We may declare when we may declare for any expression .
This must be modified when using de Bruijn indices. For example;
We may declare when we may declare for any expression .
All this may seem rather complicated. A common quote goes "de Bruijn syntax is a Cylon detector". Despite that, de Bruijin indices are probably the simplest method for handling the canonicalization of ABTs, and this is common in actual computer implementations such as in Agda. That being said, we will not make mention of de Bruijin indices after this section. I think it's important to know about them, at least to have peace of mind. Using them allows one to fix all possible ambiguities in variable bindings, and after a while they do become more intuitive, but they hardly assist readability.
Comments sorted by top scores.