Statements and State

To enable variable bindings, our interpreter requires internal memory. When a variable is defined early in the program and referenced later, the interpreter must retain the value of that variable until it's needed. Therefore, in this chapter, we will enhance our interpreter with memory capabilities, allowing it not only to process but also to retain information.

State and statements are closely related concepts in programming languages. Unlike expressions, which evaluate to a value, statements perform actions or operations that produce side effects. These side effects can include producing visible output for the user or modifying the state of the interpreter.

In the context of an interpreter, state refers to the current configuration or condition of the program being executed. This includes variable bindings, function definitions, control flow information, and other data structures that affect the behavior of the program.

Statements are essential for defining and manipulating this state. For example, variable declaration statements introduce new variables into the program's memory space, while assignment statements modify the values of existing variables. Other types of statements, such as control flow statements (if-else, loops) and function declarations, also interact with the program's state.

The ability of statements to modify state makes them particularly useful for defining and manipulating named entities within a program. For instance, variable declaration statements establish bindings between variable names and values, allowing these variables to be accessed and modified throughout the program's execution.

In this chapter, we’ll do all of that. We’ll define statements that produce output (print) and create state (var). We’ll add expressions to access and assign to variables. Finally, we’ll add blocks and local scope. That’s a lot to stuff into one chapter, but we’ll chew through it all one bite at a time.

#1 Statements


To extend Lox's grammar with statements, we introduce two simple kinds:

  1. Expression Statement:
    1. An expression statement allows placing an expression where a statement is expected.
    2. It serves to evaluate expressions that have side effects.
    3. Expression statements are commonly used in languages like C and Java. Whenever you encounter a function or method call followed by a semicolon (;), it's an expression statement.
  2. Print Statement:
    1. A print statement evaluates an expression and displays the result to the user.
    2. It's unusual to embed printing directly into the language instead of making it a library function. However, in the context of building the interpreter incrementally, it allows us to experiment with the language features before completing the entire implementation.
    3. Making print a library function would require having all the machinery for defining and calling functions, delaying our ability to observe side effects.

These additions to the grammar enable us to incorporate statements into FarmScript, providing more flexibility and functionality in the language. With expression statements, we can perform actions based on expressions with side effects, while print statements facilitate outputting values for user interaction.

In this chapter, we introduce new grammar rules to parse entire Lox scripts. Since Lox is an imperative, dynamically typed language, the top level of a script consists of a list of statements. The new rules are as follows:

program        → statement* EOF ;

statement      → exprStmt
               | printStmt ;

exprStmt       → expression ";" ;
printStmt      → "print" expression ";" ;

Let's break down these rules:

  1. program:
    1. The program rule represents a complete Lox script or a REPL entry.
    2. It consists of zero or more statements followed by the special "end of file" token (EOF).
    3. This rule ensures that the parser consumes the entire input and doesn't silently ignore any unconsumed tokens at the end of a script.
  2. statement:
    1. The statement rule defines the different types of statements allowed in a Lox script.
    2. Currently, it only includes two cases for the two kinds of statements described earlier: expression statements (exprStmt) and print statements (printStmt).
    3. We will add more types of statements later in this chapter and in subsequent ones as we expand the language's features.
  3. exprStmt:
    1. The exprStmt rule specifies the syntax for an expression statement, where an expression is followed by a semicolon (;).
    2. This rule allows us to evaluate expressions with side effects as standalone statements.
  4. printStmt:
    1. The printStmt rule defines the syntax for a print statement, which consists of the keyword "print" followed by an expression and then a semicolon (;).
    2. Print statements are used to display the result of an expression to the user.

These grammar rules enable us to parse Lox scripts and build syntax trees that represent their structure. Syntax trees provide a structured representation of the script's statements, making it easier to analyze and manipulate the program during interpretation or compilation.

#1.1 Statement syntax trees

In the grammar of languages like Lox, expressions and statements have their own distinct contexts. For example, the operands of operators like + are expressions, while the body of control flow structures like while loops are composed of statements.

#include <memory>

// Forward declaration of Expr class
class Expr;

// Base class for statements
class Stmt {
public:
    virtual ~Stmt() = default;
};

// Expression statement class
class ExpressionStmt : public Stmt {
public:
    ExpressionStmt(std::unique_ptr<Expr> expression) : expression(std::move(expression)) {}
    std::unique_ptr<Expr> expression;
};

// Print statement class
class PrintStmt : public Stmt {
public:
    PrintStmt(std::unique_ptr<Expr> expression) : expression(std::move(expression)) {}
    std::unique_ptr<Expr> expression;
};

In this C++ code:

  • We define a Stmt base class for statements.
  • We create subclasses ExpressionStmt and PrintStmt for expression and print statements, respectively.
  • Each subclass holds a pointer to an Expr object as its payload.

Explanation:

  • Stmt: This serves as the base class for all types of statements in the language.
  • ExpressionStmt: Represents an expression statement, where an expression is evaluated but its result is not used in any significant way. It contains a pointer to the expression being evaluated.
  • PrintStmt: Represents a print statement, where an expression is evaluated and its result is printed to the output. It also contains a pointer to the expression being printed.

This design follows the principle of separating expressions and statements into distinct class hierarchies.

#1.2 Parsing statements

The parser's parse() method that parses and returns a single expression was a temporary hack to get the last chapter up and running. Now that our grammar has the correct starting rule, program, we can parse() into the real deal.

#include <vector>
#include <memory>

// Forward declaration of Stmt class
class Stmt;

// Parser class
class Parser {
public:
    std::vector<std::unique_ptr<Stmt>> parse() {
        std::vector<std::unique_ptr<Stmt>> statements;
        while (!isAtEnd()) {
            statements.push_back(statement());
        }
        return statements;
    }

private:
    // Other parser methods and members...
};

Explanation:

  • std::vector<std::unique_ptr<Stmt>> parse(): This method parses the input source code and returns a vector of unique pointers to Stmt objects. Each pointer represents a parsed statement in the program.
  • std::vector<std::unique_ptr<Stmt>> statements;: This vector holds the parsed statements. It is initially empty and populated with statements during parsing.
  • while (!isAtEnd()) { statements.push_back(statement()); }: This loop continues until the end of input is reached. During each iteration, it calls the statement() method to parse a single statement and adds it to the statements vector.
  • return statements;: After parsing all the statements, the method returns the vector containing the parsed statements.

It iterates over the input source code, parses each statement, and stores the parsed statements in a vector, which is then returned. 

#include <vector>
#include <memory>
#include <string>

// Forward declaration of Stmt class
class Stmt;

// Parser class
class Parser {
public:
    std::vector<std::unique_ptr<Stmt>> parse() {
        std::vector<std::unique_ptr<Stmt>> statements;
        while (!isAtEnd()) {
            statements.push_back(statement());
        }
        return statements;
    }

private:
    // Other parser methods and members...

    Stmt* statement() {
        if (match("PRINT")) return printStatement();
        return expressionStatement();
    }

    Stmt* printStatement() {
        // Implementation for parsing print statement
        return nullptr;
    }

    Stmt* expressionStatement() {
        // Implementation for parsing expression statement
        return nullptr;
    }

    bool match(const std::string& token) {
        // Implementation for matching token
        return false;
    }

    bool isAtEnd() const {
        // Implementation for checking end of input
        return true;
    }
};

Explanation:

  • Stmt* statement(): This method determines the type of statement based on the current token in the input source code. If the current token matches the PRINT keyword, it calls printStatement() to parse a print statement. Otherwise, it calls expressionStatement() to parse an expression statement.
  • Stmt* printStatement(): This method represents the parsing logic for print statements. You would implement this method to handle the parsing of print statements, including parsing the expression to be printed.
  • Stmt* expressionStatement(): This method represents the parsing logic for expression statements. You would implement this method to handle the parsing of expression statements, including parsing the expression itself.
  • bool match(const std::string& token): This method checks if the current token in the input source code matches a given token. You would implement this method to compare the current token with the provided token.
  • bool isAtEnd() const: This method checks if the parser has reached the end of the input source code. You would implement this method to determine if there are any more tokens to parse.
Stmt* printStatement() {
        std::unique_ptr<Expr> value = expression();
        consume("SEMICOLON", "Expect ';' after value.");
        return new PrintStmt(std::move(value));
    }
  • Stmt* printStatement(): This method represents the parsing logic for print statements. It parses the expression to be printed, consumes the semicolon token, and then constructs and returns a PrintStmt object with the parsed expression.
  • std::unique_ptr<Expr> value = expression();: This line parses the expression to be printed and stores it in a unique pointer.
  • consume("SEMICOLON", "Expect ';' after value.");: This line consumes the semicolon token following the expression. You would implement this method to advance the parser to the next token and check if it matches the expected semicolon token.
  • return new PrintStmt(std::move(value));: This line constructs and returns a PrintStmt object using the parsed expression. We use std::move() to transfer ownership of the expression pointer to the PrintStmt constructor.
Stmt* expressionStatement() {
        std::unique_ptr<Expr> expr = expression();
        consume("SEMICOLON", "Expect ';' after expression.");
        return new ExpressionStmt(std::move(expr));
    }
  • Stmt* expressionStatement(): This method represents the parsing logic for expression statements. It parses the expression, consumes the semicolon token, and then constructs and returns an ExpressionStmt object with the parsed expression.
  • std::unique_ptr<Expr> expr = expression();: This line parses the expression and stores it in a unique pointer.
  • consume("SEMICOLON", "Expect ';' after expression.");: This line consumes the semicolon token following the expression. You would implement this method to advance the parser to the next token and check if it matches the expected semicolon token.
  • return new ExpressionStmt(std::move(expr));: This line constructs and returns an ExpressionStmt object using the parsed expression. We use std::move() to transfer ownership of the expression pointer to the ExpressionStmt constructor.

#1.3 Executing statements

#2 Gobal Variables

#2.1 Variable syntax

#2.2 Parsing variables

#3 Environments

An environment is a fundamental data structure used in programming language implementations to associate variables with their corresponding values. It serves as a mapping between variable names and the values they represent.

var a = 1;
var b = 2;

ENVIRONMENT
------------
|  a -> 1  |
------------
|  b -> 2  |
-----------

You can think of it like a map where the keys are variable names and the values are the variable’s, values.

Java calls them maps or hashmaps. Other languages call them hash tables, dictionaries (Python and C#), hashes (Ruby and Perl), tables (Lua), or associative arrays (PHP). Way back when, they were known as scatter tables.

#include <unordered_map>
#include <string>

class Environment {
private:
    std::unordered_map<std::string, Object> values;
};
  • std::unordered_map is used to implement the mapping between variable names (strings) and their corresponding values.
  • std::string is used to represent variable names.

It uses bare strings for the keys, not tokens. A token represents a unit of code at a specific place in the source text, but when it comes to looking up variables, all identifier tokens with the same name should refer to the same variable (ignoring scope for now). Using the raw string ensures all of those tokens refer to the same map key.

There are two operations we need to support. First, a variable definition binds a new name to a value.

void define(std::string name, Object value)
    {
        // TODO, Check redefinition!
        values[name] = value;
    }

Not exactly brain surgery, but we have made one interesting semantic choice. When we add the key to the map, we don’t check to see if it’s already present. That means that this program works:

var abc = "before value";
print abc; // "before value"

var abc = "after value";
print abc; // "after value"

A variable statement doesn’t just define a new variable, it can also be used to redefine an existing variable. We could choose to make this an error instead. The user may not intend to redefine an existing variable. (If they did mean to, they probably would have used assignment, not var.) Making redefinition an error would help them find that bug.

However, doing so interacts poorly with the REPL. In the middle of a REPL session, it’s nice to not have to mentally track which variables you’ve already defined. We could allow redefinition in the REPL but not in scripts, but then users would have to learn two sets of rules, and code copied and pasted from one form to the other might not work.

So, to keep the two modes consistent, we’ll allow it—at least for global variables. Once a variable exists, we need a way to look it up.

Object Environment::get(Token name) {
    if (values.find(name.lexeme) != values.end()) {
        return values[name.lexeme];
    }

    throw RuntimeError(name,
        "Undefined variable '" + name.lexeme + "'.");
}

This is a little more semantically interesting. If the variable is found, it simply returns the value bound to it. But what if it’s not? Again, we have a choice:

  • Make it a syntax error.
  • Make it a runtime error.
  • Allow it and return some default value like nil.

FarmScript is pretty lax, but the last option is a little too permissive to me. Making it a syntax error—a compile-time error—seems like a smart choice. Using an undefined variable is a bug, and the sooner you detect the mistake, the better.

To handle situations where variables are referenced before they are defined, especially in the context of mutually recursive functions, we can defer the error to runtime rather than making it a static error. This means that referring to a variable before it's defined is allowed as long as the reference is not evaluated immediately.

In older languages like C and Pascal, variables and functions must be declared before they are used. This requirement stems from the way compilers for these languages were designed, which was influenced by limitations in computing power and compiler technology at the time.

The main reason for requiring explicit forward declarations is that compilers for languages like C and Pascal typically operate in a single-pass mode, meaning they process the source code from top to bottom in one go. In this mode, the compiler can only see code that appears earlier in the file when processing a particular piece of code.

By requiring forward declarations, the compiler ensures that it knows about all variables and functions before they are used. This allows the compiler to resolve references to variables and functions correctly during parsing and code generation.

Additionally, forward declarations help in managing the complexity of the compilation process. They provide a clear indication of the symbols (variables and functions) used in the program and their types before the actual definitions are encountered. This makes it easier for the compiler to perform type checking and generate efficient code.

While requiring forward declarations may seem cumbersome compared to modern languages that allow variables and functions to be used before they are declared, it was a necessary compromise to accommodate the limitations of early compilers and computer systems. As technology has advanced, modern compilers and programming languages have been able to relax these restrictions without sacrificing performance or efficiency.

It’s OK to refer to a variable before it’s defined as long as you don’t evaluate the reference. That lets the program for even and odd numbers work, but you’d get a runtime error in:

print abc;
var abc = "I was slow";

As with type errors in the expression evaluation code, we report a runtime error by throwing an exception. The exception contains the variable’s token so we can tell the user where in their code they messed up.

#3.1 Interpreting global variables

The Interpreter class gets an instance of the new Environment class.

class Interpreter {
private:
    Environment environment;

public:
    void interpret(std::vector<Stmt*> statements) {

We store it as a field directly in Interpreter so that the variables stays in memory as long as the interpreter is still running.

We have two new syntax trees, so that’s two new visit methods. The first is for declaration statements.

Void Interpreter::visitVarStmt(Var* stmt) {
    Object value = nullptr;
    if (stmt->initializer != nullptr) {
        value = evaluate(stmt->initializer);
    }

    environment.define(stmt->name.lexeme, value);
    return nullptr;
}

If the variable has an initializer, we evaluate it. If not, we have another choice to make. We could have made this a syntax error in the parser by requiring an initializer. Most languages don’t, though, so it feels a little harsh to do so in FarmScript.

We could make it a runtime error. We’d let you define an uninitialized variable, but if you accessed it before assigning to it, a runtime error would occur. It’s not a bad idea, but most dynamically typed languages don’t do that. Instead, we’ll keep it simple and say that FarmScript sets a variable to nil if it isn’t explicitly initialized.

var abc;
print abc; // "nil"

Thus, if there isn’t an initializer, we set the value to null, which is the C++ representation of FarmScript's nil value. Then we tell the environment to bind the variable to that value.

Next, we evaluate a variable expression.

Object Interpreter::visitVariable(Expr::Variable* expr) {
    return environment.get(expr->name);
}

This simply forwards to the environment which does the heavy lifting to make sure the variable is defined. With that, we’ve got rudimentary variables working. Try this out:

var a = 1;
var b = 2;
print a + b;

#4 Assignment

It’s possible to create a language that has variables but does not let you reassign—or mutate—them. Haskell is one example. SML supports only mutable references and arrays—variables cannot be reassigned. Rust steers you away from mutation by requiring a mut modifier to enable assignment.

FarmScript is an imperative language, and mutation comes with the territory. Adding support for assignment doesn’t require much work. Global variables already support redefinition, so most of the machinery is there now. Mainly, we’re missing an explicit assignment notation.

#4.1 Assignment syntax

That little = syntax is more complex than it might seem. Like most C-derived languages, assignment is an expression and not a statement. As in C, it is the lowest precedence expression form. That means the rule slots between expression and equality (the next lowest precedence expression).

In C-derived languages, including C++, assignment is treated as an expression rather than a statement. This means that assignment operations can be used within larger expressions and can return a value.

int a = 10;
int b = (a = 20); // Assigns 20 to 'a' and returns 20, assigning it to 'b'

In the above code, the assignment (a = 20) is an expression that assigns the value 20 to the variable 'a' and returns 20 as its value. This returned value (20) is then assigned to the variable 'b'.

The precedence of the assignment operator (=) is lower than that of most other operators, such as arithmetic and relational operators. This means that assignment expressions are evaluated after other expressions with higher precedence.

n the other hand, in languages like Pascal, Python, and Go, assignment is treated as a statement rather than an expression. This means that assignment operations cannot be used within larger expressions and cannot return a value.

For example, in Python:

a = 10
b = (a = 20)  # This will result in a syntax error
expression     → assignment ;
assignment     → IDENTIFIER "=" assignment
               | equality ;

This says an assignment is either an identifier followed by an = and an expression for the value, or an equality (and thus any other) expression. Later, assignment will get more complex when we add property setters on objects, like:

Expr * Parser::expression()
{
    Expr *expr = assignment();
    return expr;
}

Here is where it gets tricky. A single token lookahead recursive descent parser can’t see far enough to tell that it’s parsing an assignment until after it has gone through the left-hand side and stumbled onto the =. You might wonder why it even needs to. After all, we don’t know we’re parsing a + expression until after we’ve finished parsing the left operand.

The difference is that the left-hand side of an assignment isn’t an expression that evaluates to a value. It’s a sort of pseudo-expression that evaluates to a “thing” you can assign to. Consider:

var a = "before";
a = "value";

On the second line, we don’t evaluate a (which would return the string “before”). We figure out what variable a refers to so we know where to store the right-hand side expression’s value. The classic terms for these two constructs are l-value and r-value. All of the expressions that we’ve seen so far that produce values are r-values. An l-value “evaluates” to a storage location that you can assign into.

In fact, the names come from assignment expressions: l-values appear on the left side of the = in an assignment, and r-values on the right.

We want the syntax tree to reflect that an l-value isn’t evaluated like a normal expression. That’s why the Expr.Assign node has a Token for the left-hand side, not an Expr. The problem is that the parser doesn’t know it’s parsing an l-value until it hits the =. In a complex l-value, that may occur many tokens later.

We have only a single token of lookahead, so what do we do? We use a little trick, and it looks like this:

Expr* Parser::assignment() {
    Expr* expr = equality();

    if (match(EQUAL)) {
        Token equals = previous();
        Expr* value = assignment();

        if (dynamic_cast<Expr::Variable*>(expr) != nullptr) {
            Token* name = dynamic_cast<Expr::Variable*>(expr)->name;
            return new Expr::Assign(name, value);
        }

        error(equals, "Invalid assignment target.");
    }

    return expr;
}

Most of the code for parsing an assignment expression looks similar to that of the other binary operators like +. We parse the left-hand side, which can be any expression of higher precedence. If we find an =, we parse the right-hand side and then wrap it all up in an assignment expression tree node.

One slight difference from binary operators is that we don’t loop to build up a sequence of the same operator. Since assignment is right-associative, we instead recursively call assignment() to parse the right-hand side.

#4.2 Assignment semantics

We have a new syntax tree node, so our interpreter gets a new visit node:

Object Interpreter::visitAssign(Expr::Assign* expr) {
    Object value = evaluate(expr->value);
    environment.assign(expr->name, value);
    return value;
}

For obvious reasons, it’s similar to variable declaration. It evaluates the right-hand side to get the value, then stores it in the named variable. Instead of using define() on Environment, it calls this new method:

void Environment::assign(Token name, Object value) {
    if (values.find(name.lexeme) != values.end()) {
        values[name.lexeme] = value;
        return;
    }

    throw RuntimeError(name,
        "Undefined variable '" + name.lexeme + "'.");
}

The key difference between assignment and definition is that assignment is not allowed to create a new variable. In terms of our implementation, that means it’s a runtime error if the key doesn’t already exist in the environment’s variable map.

The last thing the  visitAssign() method does is return the assigned value. That's because assignment is an expression that can be nested inside other expressions, like so:

var a = 1;
print a = 2; // "2".

Note: In C++

#include<iostream>

using namespace std;

int main(){
    int a = 10;
    int b = (a = 20);

    cout<< (a = 7) <<":"<<b<<endl;
}

// output
7:20

Our interpreter can now create, read, and modify variables. It’s about as sophisticated as early BASICs. Global variables are simple, but writing a large program when any two chunks of code can accidentally step on each other’s state is no fun. We want local variables, which means it’s time for scope.

#5 Scope

A scope defines a region where a name maps to a certain entity. Multiple scopes enable the same name to refer to different things in different contexts. In my house, “Bob” usually refers to me. But maybe in your town you know a different Bob. Same name, but different dudes based on where you say it.

Lexical scope (or the less commonly heard static scope) is a specific style of scoping where the text of the program itself shows where a scope begins and ends. In Lox, as in most modern languages, variables are lexically scoped. When you see an expression that uses some variable, you can figure out which variable declaration it refers to just by statically reading the code.

“Lexical” comes from the Greek “lexikos” which means “related to words”. When we use it in programming languages, it usually means a thing you can figure out from source code itself without having to execute anything.

Lexical scope came onto the scene with ALGOL. Earlier languages were often dynamically scoped. Computer scientists back then believed dynamic scope was faster to execute. Today, thanks to early Scheme hackers, we know that isn’t true. If anything, it’s the opposite.

Dynamic scope for variables lives on in some corners. Emacs Lisp defaults to dynamic scope for variables. The binding macro in Clojure provides it. The widely disliked with statement in JavaScript turns properties on an object into dynamically scoped variables.

{
  var a = "first";
  print a; // "first".
}

{
  var a = "second";
  print a; // "second".
}

Here, we have two blocks with a variable a declared in each of them. You and I can tell just from looking at the code that the use of a in the first print statement refers to the first a, and the second one refers to the second.


First Block             Second Block
---------------        ----------------
| a -> "first" |       | a -> "second" |
---------------        ----------------

This is in contrast to dynamic scope where you don’t know what a name refers to until you execute the code. FarmScript doesn’t have dynamically scoped variables, but methods and fields on objects are dynamically scoped.

class Saxophone {
  play() {
    print "Careless Whisper";
  }
}

class GolfClub {
  play() {
    print "Fore!";
  }
}

fun playIt(thing) {
  thing.play();
}

When playIt() calls thing.play(), we don’t know if we’re about to hear “Careless Whisper” or “Fore!” It depends on whether you pass a Saxophone or a GolfClub to the function, and we don’t know that until runtime.

Scope and environments are close cousins. The former is the theoretical concept, and the latter is the machinery that implements it. As our interpreter works its way through code, syntax tree nodes that affect scope will change the environment. In a C-ish syntax like Lox’s, scope is controlled by curly-braced blocks. (That’s why we call it block scope.)

{
  var a = "in block";
}
print a; // Error! No more "a".

The beginning of a block introduces a new local scope, and that scope ends when execution passes the closing }. Any variables declared inside the block disappear.

#5.1 Nesting and shadowing

A first cut at implementing block scope might work like this:

  • As we visit each statement inside the block, keep track of any variables declared.
  • After the last statement is executed, tell the environment to delete all of those variables.

That would work for the previous example. But remember, one motivation for local scope is encapsulation—a block of code in one corner of the program shouldn’t interfere with some other block. Check this out:

// How loud?
var volume = 11;

// Silence.
volume = 0;

// Calculate size of 3x4x5 cuboid.
{
  var volume = 3 * 4 * 5;
  print volume;
}

Look at the block where we calculate the volume of the cuboid using a local declaration of volume. After the block exits, the interpreter will delete the global volume variable. That ain’t right. When we exit the block, we should remove any variables declared inside the block, but if there is a variable with the same name declared outside of the block, that’s a different variable. It shouldn’t get touched.

When a local variable has the same name as a variable in an enclosing scope, it shadows the outer one. Code inside the block can’t see it any more—it is hidden in the “shadow” cast by the inner one—but it’s still there.

When we enter a new block scope, we need to preserve variables defined in outer scopes so they are still around when we exit the inner block. We do that by defining a fresh environment for each block containing only the variables defined in that scope. When we exit the block, we discard its environment and restore the previous one.

We also need to handle enclosing variables that are not shadowed.

var global = "outside";
{
  var local = "inside";
  print global + local;
}

Here, global lives in the outer global environment and local is defined inside the block’s environment. In that print statement, both of those variables are in scope. In order to find them, the interpreter must search not only the current innermost environment, but also any enclosing ones.

We implement this by chaining the environments together. Each environment has a reference to the environment of the immediately enclosing scope. When we look up a variable, we walk that chain from innermost out until we find the variable. Starting at the inner scope is how we make local variables shadow outer ones.

Before we add block syntax to the grammar, we’ll beef up our Environment class with support for this nesting. First, we give each environment a reference to its enclosing one.

class Environment
{
private:
    std::unordered_map<std::string, Object> values;
    Environment *enclosing;

This field needs to be initialized, so we add a couple of constructors.

	Environment()
    {
        enclosing = nullptr;
    }

    Environment(Environment *enclosing)
    {
        this->enclosing = enclosing;
    }

The no-argument constructor is for the global scope’s environment, which ends the chain. The other constructor creates a new local scope nested inside the given outer one.

We don’t have to touch the define() method—a new variable is always declared in the current innermost scope. But variable lookup and assignment work with existing variables and they need to walk the chain to find them. First, lookup:

 void assign(Token *name, Object value)
    {
        auto it = values.find(name->lexeme);

        if (it != values.end())
        {
            it->second = value;
        }
        else if (enclosing)
        {
            enclosing->assign(name, value);
        }
        else
        {
            FarmScript::runtimeError(*name, "Undefined variable '" + name->lexeme + "'");
        }
    }

If the variable isn’t found in this environment, we simply try the enclosing one. That in turn does the same thing recursively, so this will ultimately walk the entire chain. If we reach an environment with no enclosing one and still don’t find the variable, then we give up and report an error as before.

 

#5.2 Block syntax and semantics