#1 Turing Machines
Think of a Turing machine like a hypothetical computer that works based on a strip of paper and some rules:
- The Paper (Tape): Imagine a really long strip of paper divided into little squares. Each square can hold a symbol, like a letter or number.
- The Pen (Head): There's a pen that can read what's written on the paper, write new stuff, and move left or right along the paper.
- The Brain (Finite State Control): This is like the control center. It decides what the pen should do next based on what it reads from the paper.
So, how does it work?
You start by giving the Turing machine some initial stuff written on the paper.
Then, the machine follows its rules. It looks at what's written where the pen is, and based on that, it decides what to write next, where to move, and what to do next.
It keeps doing this until it reaches a point where it's done, like when it's written "stop" on the paper.
Basically, a Turing machine is a simple way of showing how computers can follow instructions to do different tasks. It's like a blueprint for how computers can think and process information.
#2 Conditional Execution
- Conditional or branching control flow allows a program to make decisions based on certain conditions. It's like telling the computer, "If something is true, do this; otherwise, do something else."
- Looping control flow executes a chunk of code more than once. It jumps back so that you can do something again. Since you don't usually want infinite loops, it typically has some conditional logic to know when to stop looping as well.
Branching is simpler, so we’ll start there. C-derived languages have two main conditional execution features, the if statement and the perspicaciously named “conditional” operator (?:). An if statement lets you conditionally execute statements and the conditional operator lets you conditionally execute expressions.
The conditional operator is also called the “ternary” operator because it’s the only operator in C that takes three operands.
For simplicity’s sake, FarmScript doesn’t have a conditional operator, so let’s get our if statement on. Our statement grammar gets a new production.
statement → exprStmt
| ifStmt
| printStmt
| block ;
ifStmt → "if" "(" expression ")" statement
( "else" statement )? ;
An if statement has an expression for the condition, then a statement to execute if the condition is truthy. Optionally, it may also have an else keyword and a statement to execute if the condition is falsey. The syntax tree node has fields for each of those three pieces.
Like other statements, the parser recognizes an if statement by the leading if keyword.
Stmt * Parser::statement()
{
// other statements...
if (match(IF))
{
return ifStatement();
}
// other statements...
When it finds one, it calls this new method to parse the rest:
Stmt * Parser::ifStatement()
{
consume(LEFT_PAREN, "Expected '(' after 'if'.");
Expr *expr = expression();
consume(RIGHT_PAREN, "Expected ')' after expression.");
Stmt *thenStmt = statement();
Stmt *elseStmt = nullptr;
if (match(ELSE))
{
elseStmt = statement();
}
return new If(expr, thenStmt, elseStmt);
}
The parentheses around the condition are only half useful. You need some kind of delimiter between the condition and the then statement, otherwise the parser can’t tell when it has reached the end of the condition expression. But the opening parenthesis after if doesn’t do anything useful. Dennis Ritchie put it there so he could use ) as the ending delimiter without having unbalanced parentheses.
Other languages like Lua and some BASICs use a keyword like then as the ending delimiter and don’t have anything before the condition. Go and Swift instead require the statement to be a braced block. That lets them use the { at the beginning of the statement to tell when the condition is done.
As usual, the parsing code hews closely to the grammar. It detects an else clause by looking for the preceding else keyword. If there isn’t one, the elseBranch
field in the syntax tree is null.
if (first) if (second) whenTrue(); else whenFalse();
Here’s the riddle: Which if statement does that else clause belong to? This isn’t just a theoretical question about how we notate our grammar. It actually affects how the code executes:
- If we attach the else to the first if statement, then
whenFalse()
is called if first is `falsey1, regardless of what value second has. - If we attach it to the second if statement, then
whenFalse()
is only called if first is truthy and second isfalsey
.
Since else clauses are optional, and there is no explicit delimiter marking the end of the if statement, the grammar is ambiguous when you nest ifs in this way. This classic pitfall of syntax is called the dangling else problem.
Here, formatting highlights the two ways the else could be parsed. But note that since whitespace characters are ignored by the parser, this is only a guide to the human reader.
It is possible to define a context-free grammar that avoids the ambiguity directly, but it requires splitting most of the statement rules into pairs, one that allows an if with an else and one that doesn’t. It’s annoying.
Instead, most languages and parsers avoid the problem in an ad hoc way. No matter what hack they use to get themselves out of the trouble, they always choose the same interpretation—the else is bound to the nearest if that precedes it.
Our parser conveniently does that already. Since ifStatement()
eagerly looks for an else before returning, the innermost call to a nested series will claim the else clause for itself before returning to the outer if statements.
Syntax in hand, we are ready to interpret.
void Interpreter::execute(Stmt *stmt)
{
if (stmt)
{
switch (stmt->type)
{
// other cases...
case StmtType_If: visitIf((If *)stmt); break;
default: FarmScript::error(0, "Invalid statement type.");
}
}
}
void Interpreter::visitIf(If *stmt)
{
if (isTruthy(evaluate(stmt->condition)))
{
execute(stmt->thenBranch);
}
else
{
execute(stmt->elseBranch);
}
}
It evaluates the condition. If truthy, it executes the then branch. Otherwise, if there is an else branch, it executes that.
#2.1 Logical Operators
Since we don’t have the conditional operator, you might think we’re done with branching, but no. Even without the ternary operator, there are two other operators that are technically control flow constructs—the logical operators and and or.
These aren’t like other binary operators because they short-circuit. If, after evaluating the left operand, we know what the result of the logical expression must be, we don’t evaluate the right operand. For example:
false and sideEffect();
For an and expression to evaluate to something truthy, both operands must be truthy. We can see as soon as we evaluate the left false operand that that isn’t going to be the case, so there’s no need to evaluate sideEffect()
and it gets skipped.
This is why we didn’t implement the logical operators with the other binary operators. Now we’re ready. The two new operators are low in the precedence table. Similar to || and && in C, they each have their own precedence with or lower than and. We slot them right between assignment and equality.
In most programming languages, &&
typically has a higher precedence than ||
. This means that &&
is evaluated before ||
when both are used in the same expression.
For example, in the expression A && B || C
, &&
will be evaluated first, and then ||
.
Here's how the precedence works:
- && (logical AND) has higher precedence than || (logical OR).
- Parentheses ( ) have the highest precedence, and expressions within parentheses are evaluated first.
It's always a good practice to use parentheses to clarify the order of operations if there's any potential confusion. For example, (A || B) && C ensures that the || operation is evaluated first, and then the && operation.
expression → assignment ;
assignment → IDENTIFIER "=" assignment
| logic_or ;
logic_or → logic_and ( "or" logic_and )* ;
logic_and → equality ( "and" equality )* ;
Instead of falling back to equality, assignment now cascades to logic_or
. The two new rules, logic_or
and logic_and
, are similar to other binary operators. Then logic_and
calls out to equality for its operands, and we chain back to the rest of the expression rules.
Expr Parser::assignment() {
Expr expr = or();
if (match(EQUAL)) {
// Handle assignment
The code to parse a series of or
expressions mirrors other binary operators:
private Expr or() {
Expr expr = and();
while (match(OR)) {
Token operator = previous();
Expr right = and();
expr = new Expr.Logical(expr, operator, right);
}
return expr;
}
Its operands are the next higher level of precedence, the new and
expression:
Expr Parser::logicalAnd() {
Expr expr = equality();
while (match(AND)) {
Token op = previous();
Expr right = equality();
expr = Expr(new Logical(expr, op, right));
}
return expr;
}
That calls equality()
for its operands, and with that, the expression parser is all tied back together again. We are ready to interpret.
Object Interpreter::visitLogical(Logical *expr)
{
Object left = evaluate(expr->left);
if (expr->oper->type == AND)
{
if (isTruthy(left))
{
return evaluate(expr->right);
}
}
else
{
if (!isTruthy(left))
{
return evaluate(expr->right);
}
}
return left;
}
If you compare this to the earlier chapter’s visitBinaryExpr()
method, you can see the difference. Here, we evaluate the left operand first. We look at its value to see if we can short-circuit. If not, and only then, do we evaluate the right operand.
The other interesting piece here is deciding what actual value to return. Since Lox is dynamically typed, we allow operands of any type and use truthiness to determine what each operand represents. We apply similar reasoning to the result. Instead of promising to literally return true or false, a logic operator merely guarantees it will return a value with appropriate truthiness.
Fortunately, we have values with proper truthiness right at hand—the results of the operands themselves. So we use those. For example:
print "hi" or 2; // "hi".
print nil or "yes"; // "yes".
On the first line, "hi" is truthy, so the or short-circuits and returns that. On the second line, nil is falsey
, so it evaluates and returns the second operand, "yes".
#2.2 While Loops
FarmScript features two looping control flow statements, while and for. The while loop is the simpler one, so we’ll start there. Its grammar is the same as in C.
statement → exprStmt
| ifStmt
| printStmt
| whileStmt
| block ;
whileStmt → "while" "(" expression ")" statement ;
We add another clause to the statement rule that points to the new rule for while. It takes a while keyword, followed by a parenthesized condition expression, then a statement for the body. That new grammar rule gets a syntax tree node.
The node stores the condition and body. Here you can see why it’s nice to have separate base classes for expressions and statements. The field declarations make it clear that the condition is an expression and the body is a statement.
Over in the parser, we follow the same process we used for if statements. First, we add another case in statement() to detect and match the leading keyword.
Stmt * Parser::statement()
{
// other if statements...
else if (match(IF))
{
return ifStatement();
}
else if (match(WHILE))
{
return whileStatement();
}
That delegates the real work to this method:
Stmt Parser::whileStatement() {
consume(LEFT_PAREN, "Expect '(' after 'while'.");
Expr condition = expression();
consume(RIGHT_PAREN, "Expect ')' after condition.");
Stmt body = statement();
return Stmt(new While(condition, body));
}
here’s how we execute the new syntax:
void Interpreter::visitWhile(While *stmt)
{
while (isTruthy(evaluate(stmt->condition)))
{
execute(stmt->body);
}
}
#2.3 For Loops
We are down to the last flow construct, for
loop.
for (var i = 0; i < 10; i = i + 1) print i;
In grammarese, that's:
statement → exprStmt
| forStmt
| ifStmt
| printStmt
| whileStmt
| block ;
forStmt → "for" "(" ( varDecl | exprStmt | ";" )
expression? ";"
expression? ")" statement ;
Most modern languages have a higher-level looping statement for iterating over arbitrary user-defined sequences. C# has foreach, Java has “enhanced for”, even C++ has range-based for statements now. Those offer cleaner syntax than C’s for statement by implicitly calling into an iteration protocol that the object being looped over supports.
I love those. For FarmScript, though, we’re limited by building up the interpreter a chapter at a time. We don’t have objects and methods yet, so we have no way of defining an iteration protocol that the for loop could use. So we’ll stick with the old school C for loop. Think of it as “vintage”. The fixie of control flow statements.
Inside the parentheses, you have three clauses separated by semicolons:
- The first clause is the initializer. It is executed exactly once, before anything else. It’s usually an expression, but for convenience, we also allow a variable declaration. In that case, the variable is scoped to the rest of the for loop—the other two clauses and the body.
- Next is the condition. As in a while loop, this expression controls when to exit the loop. It’s evaluated once at the beginning of each iteration, including the first. If the result is truthy, it executes the loop body. Otherwise, it bails.
- The last clause is the increment. It’s an arbitrary expression that does some work at the end of each loop iteration. The result of the expression is discarded, so it must have a side effect to be useful. In practice, it usually increments a variable.
Any of these clauses can be omitted. Following the closing parenthesis is a statement for the body, which is typically a block.
#2.3.1 Desugaring
That’s a lot of machinery, but note that none of it does anything you couldn’t do with the statements we already have. If for loops didn’t support initializer clauses, you could just put the initializer expression before the for statement. Without an increment clause, you could simply put the increment expression at the end of the body yourself.
In other words, FarmScript doesn’t need for loops, they just make some common code patterns more pleasant to write. These kinds of features are called syntactic sugar. For example, the previous for loop could be rewritten like so:
{
var i = 0;
while (i < 10) {
print i;
i = i + 1;
}
}
This script has the exact same semantics as the previous one, though it’s not as easy on the eyes. Syntactic sugar features like FarmScript's for loop make a language more pleasant and productive to work in. But, especially in sophisticated language implementations, every language feature that requires back-end support and optimization is expensive.
We can have our cake and eat it too by desugaring. That funny word describes a process where the front end takes code using syntax sugar and translates it to a more primitive form that the back end already knows how to execute.
We’re going to desugar for loops to the while loops and other statements the interpreter already handles. In our simple interpreter, desugaring really doesn’t save us much work, but it does give me an excuse to introduce you to the technique. So, unlike the previous statements, we won’t add a new syntax tree node. Instead, we go straight to parsing. First, add an import we’ll need soon.
Like every statement, we start parsing a for loop by matching its keyword.
Stmt * Parser::statement()
{
// other if statements...
else if (match(WHILE))
{
return whileStatement();
}
else if (match(FOR))
{
return forStatement();
}
Here is where it gets interesting. The desugaring is going to happen here, so we’ll build this method a piece at a time, starting with the opening parenthesis before the clauses.
Stmt Parser::forStatement() {
consume(LEFT_PAREN, "Expect '(' after 'for'.");
// Initialization Clause
Stmt *initializer;
if (match(SEMICOLON))
{
initializer = nullptr;
}
else if (match(VAR))
{
initializer = varDeclaration();
}
else
{
initializer = expressionStatement();
}
If the token following the ( is a semicolon then the initializer has been omitted. Otherwise, we check for a var keyword to see if it’s a variable declaration. If neither of those matched, it must be an expression. We parse that and wrap it in an expression statement so that the initializer is always of type Stmt.
Next Up is the condition:
// Parse condition clause
Expr condition = null;
if (!check(SEMICOLON)) {
condition = expression();
}
consume(SEMICOLON, "Expect ';' after loop condition.");
Again, we look for a semicolon to see if the clause has been omitted. The last clause is the increment.
Increment Clause:
// Parse increment clause
Expr increment = null;
if (!check(RIGHT_PAREN)) {
increment = expression();
}
consume(RIGHT_PAREN, "Expect ')' after for clauses.");
It is similar to the condition clause except this one is terminated by the closing parenthesis. All that remains is the body.
// body
Stmt body = statement();
// if there is increment
if (increment)
{
ListStmt *stmts = new ListStmt();
stmts->list.push_back(body);
stmts->list.push_back(new Expression(increment));
body = new Block(stmts);
}
if (!condition)
{
condition = new Literal(new Object(true));
}
body = new While(condition, body);
if (initializer)
{
ListStmt *stmts = new ListStmt();
stmts->list.push_back(initializer);
stmts->list.push_back(body);
body = new Block(stmts);
}
return body;
The increment, if there is one, executes after the body in each iteration of the loop. We do that by replacing the body with a little block that contains the original body followed by an expression statement that evaluates the increment.
Next, we take the condition and the body and build the loop using a primitive while loop. If the condition is omitted, we jam in true to make an infinite loop.
Finally, if there is an initializer, it runs once before the entire loop. We do that by, again, replacing the whole statement with a block that runs the initializer and then executes the loop.
That’s it. Our interpreter now supports C-style for loops and we didn’t have to touch the Interpreter class at all. Since we desugared to nodes the interpreter already knows how to visit, there is no more work to do.
Finally, FarmScript is powerful enough to entertain us, at least for a few minutes. Here’s a tiny program to print the first 21 elements in the Fibonacci sequence:
var a = 0;
var temp;
for (var b = 1; a < 10000; b = temp + b) {
print a;
temp = a;
a = b;
}