Representing the Code

# Syntax Tree Representation

When interpreting or compiling code, it's helpful to represent the structure of the code in memory. We use syntax trees for this purpose. Just like a family tree, a syntax tree shows how different parts of the code relate to each other.

#1 Parsing

After tokenization, the next step in the compilation or interpretation process typically involves parsing. Parsing is the process of analyzing the stream of tokens produced during tokenization according to the rules of the language's grammar. This involves constructing a parse tree or an abstract syntax tree (AST) that represents the syntactic structure of the program.

Parsing:

  • In parsing, the sequence of tokens is analyzed to ensure it conforms to the grammar of the programming language.
  • The parser verifies whether the arrangement of tokens follows the syntactic rules of the language. It often uses a grammar specification, such as a context-free grammar (CFG), to guide this process.
  • If the sequence of tokens is valid according to the grammar, the parser constructs a parse tree or AST that represents the structure of the program.

#2 Let's start Parsing

	// Tokenization
	Scanner scanner(source);
    std::vector<Token> tokens = scanner.scanTokens();
    
    // Parsing
	Parser parser(tokens);
    std::vector<Stmt *> statements = parser.parse();

#2.1 Constructor

Instantiate Parser Object

Parser::Parser(std::vector<Token> tokens)
{
    this->tokens = tokens;
}

#2.2 Parse Tokens

std::vector<Stmt *> statements = parser.parse();
  • The parse method of the parser object is called to parse the sequence of tokens.
  • The result of parsing operation is stored in a vector named statements.
std::vector<Stmt *> Parser::parse() {
    std::vector<Stmt *> result; // Initialize an empty vector to store parsed statements

    // Continue parsing until the end of the token sequence is reached
    while (!isAtEnd()) {
        Stmt *stmt = declaration(); // Parse a single declaration statement

        // Check if an error occurred during parsing
        if (FarmScript::hadError) {
            synchronize(); // Attempt to synchronize parsing by discarding tokens until a valid synchronization point is found

            // Clean up memory and break out of the loop
            if (stmt) {
                delete stmt; // Delete the partially parsed statement
            }
            stmt = nullptr; // Reset the pointer
            break; // Exit the loop
        } else {
            result.push_back(stmt); // Add the parsed statement to the result vector
        }
    }

    return result; // Return the vector of parsed statements
}

Let's break this code:

#2.2.1 Initialize result vector

    std::vector<Stmt *> result; // Initialize an empty vector to store parsed statements
  • This initializes an empty vector result to store parsed statements.

#2.2.2 Loop until end of token sequence

// Continue parsing until the end of the token sequence is reached
    while (!isAtEnd()) {
    
    ... other statments
    
    }
  • The while loop continues parsing until the end of the token sequence reached (isAtEnd() function returns true).
bool Parser::isAtEnd()
{
    return peek().type == EOF_TOKEN;
}
Token Parser::peek()
{
// initially current is 0
    return tokens[current];
}

#2.2.3 Parse declaration

 Stmt *stmt  = declaration();
  • Within the loop, declaration() method is called to parse a single declaration statement. The returned pointer points to the parsed statement.
Stmt * Parser::declaration() {
    if (match(FUN)) {
        return (Stmt *)function("function"); // Parse a function declaration
    } else if (match(CLASS)) {
        return classDeclaration(); // Parse a class declaration
    } else if (match(VAR)) {
        return varDeclaration(); // Parse a variable declaration
    }
    return statement(); // Parse a generic statement if none of the above matched
}

We will break these in later part.

#2.2.4 Handle parsing errors

		if (FarmScript::hadError)
        {
            synchronize();
            if (stmt)
            {
                delete stmt;
            }
            stmt = nullptr;
            break;
        }
  • If an error occurred during parsing (FarmScript::hadError flag is set), the method attempts to synchronize parsing by calling synchronize() method.
  • If a statement was partially parsed before the error occurred, memory cleanup is performed by deleting the partially parsed statement and resetting the pointer.
  • The loop breaks if an error occurred to stop further parsing.

#2.2.5 Add parsed statement to result vector

		else
        {
            result.push_back(stmt);
        }
  • If no error occurred during parsing, the parsed statement pointer is added to the result vector.

#2.2.6 Return result vector

// Out of while loop
return result;
  • Finally, the method returns the vector containing the parsed statements.

#2.3 Tear Apart declaration()

Stmt * Parser::declaration() {
    if (match(FUN)) {
        return (Stmt *)function("function"); // Parse a function declaration
    } else if (match(CLASS)) {
        return classDeclaration(); // Parse a class declaration
    } else if (match(VAR)) {
        return varDeclaration(); // Parse a variable declaration
    }
    return statement(); // Parse a generic statement if none of the above matched
}
bool Parser::match(TokenType type)
{
    if (check(type))
    {
        advance();
        return true;
    }
    return false;
}
  • This function checks if the current token matches the specified type.
  • It first calls check(type) to verify if the current token's type matches the given type.
  • If there is a match, it advances to the next token using advance() and returns true.
  • If there is no match, it returns false.
bool Parser::check(TokenType type)
{
    if (isAtEnd())
    {
        return false;
    }
    return peek().type == type;
}
  • This function checks if the current token's type matches the specified type.
  • If the end of the token sequence is reached (isAtEnd() returns true), it returns false.
  • Otherwise, it compares the type of the current token (retrieved using peek().type) with the specified type and returns true if they match, otherwise false.
bool Parser::isAtEnd()
{
    return peek().type == EOF_TOKEN;
}
  • This function checks if the parser has reached the end of the token sequence.
  • It returns true if the type of the current token is EOF_TOKEN, indicating the end of the token sequence, and false otherwise.
Token Parser::advance()
{
    if (!isAtEnd())
    {
        return tokens[current++];
    }
    return previous();
}
  • This function advances to the next token in the token sequence.
  • It first checks if the parser is not at the end of the token sequence (!isAtEnd()).
  • If not at the end, it returns the current token using tokens[current++] and increments the current index to move to the next token.
  • If already at the end, it returns the previous token using previous().
Token Parser::previous()
{
    return tokens[current - 1];
}
  • This function returns the previous token in the token sequence.
  • It simply returns the token at index current - 1.
Token Parser::peek()
{
    return tokens[current];
}
  • This function returns the current token in the token sequence without advancing to the next one.
  • It returns the token at index current.

	if (match(FUN)) {
        return (Stmt *)function("function"); // Parse a function declaration
    }

It checks if the current token matches the FUN token (i.e., function keyword). It proceeds to parse a function declaration using the function method.

#2.3.1

void * Parser::function(std::string kind)
{
    Token *name = nullptr;
    Token *keyword = nullptr;
    if (kind == "class function")
    {
        consume(CLASS, "Expect class keyword.");
    }
    if (kind == "lambda")
    {
        keyword = new Token(previous());
    }
    else if (check(IDENTIFIER))
    {
        consume(IDENTIFIER, "Expect " + kind + " name.");
        name = new Token(previous());
    }
    if (consume(LEFT_PAREN, "Expect '(' after " + kind + " name."))
    {
        ListToken *params = new ListToken();
        if (!check(RIGHT_PAREN))
        {
            do
            {
                if (consume(IDENTIFIER, "Expect parameter name."))
                {
                    params->list.push_back(new Token(previous()));
                }
                else
                {
                    break;
                }
            } while (match(COMMA));
        }
        if (consume(RIGHT_PAREN, "Expect ')' after parameters."))
        {
            if (consume(LEFT_BRACE, "Expect '{' before " + kind + " body."))
            {
                Block *body = (Block *)blockStatement();
                if (kind == "lambda")
                {
                    return new Lambda(keyword, params, body);
                }
                else
                {   
                    return new Function(name, params, body);
                }
            }
        }
        delete params;
    }
    if (name)
    {
        delete name;
    }
    if (keyword)
    {
        delete keyword;
    }
    return nullptr;
}

Depending on the value of the kind parameter, the method determines whether to parse a class function, lambda expression, or regular function declaration.


 else if (match(CLASS))
    {
        return classDeclaration();
    }

It checks if the current token matches the CLASS token. If there's a match, it proceeds to parse a class declaration using the classDeclaration method and returns the result. 

classDeclaration():

Stmt * Parser::classDeclaration() {
    if (consume(IDENTIFIER, "Expect class name.")) {
        Token *name = new Token(previous()); // Parse and store the class name token
        Variable *superclass = nullptr; // Initialize superclass variable

        // Parse optional superclass declaration
        if (match(LESS)) {
            consume(IDENTIFIER, "Expect superclass name."); // Consume superclass name token
            superclass = new Variable(new Token(previous())); // Store superclass name token
        }

        // Parse class body
        if (consume(LEFT_BRACE, "Expect '{' before class body.")) {
            ListFunction *methods = new ListFunction(); // Initialize list for instance methods
            ListFunction *statics = new ListFunction(); // Initialize list for static methods
            bool declarationsOk = true; // Initialize flag for tracking declaration success

            // Parse class members until '}' or end of input
            while (!check(RIGHT_BRACE) && !isAtEnd()) {
                Function *fn; // Variable to store parsed function

                // Parse class function (static method) or method
                if (check(CLASS)) {
                    fn = (Function *)function("class function"); // Parse class function
                    if (fn) {
                        statics->list.push_back(fn); // Add parsed function to statics list
                    } else {
                        declarationsOk = false; // Set flag to false if parsing failed
                        break; // Exit loop
                    }
                } else {
                    fn = (Function *)function("method"); // Parse method
                    if (fn) {
                        methods->list.push_back(fn); // Add parsed function to methods list
                    } else {
                        declarationsOk = false; // Set flag to false if parsing failed
                        break; // Exit loop
                    }
                }
            }

            // Verify successful parsing of class body and consume '}'
            if (declarationsOk && consume(RIGHT_BRACE, "Expect '}' after class body.")) {
                return new Class(name, superclass, methods, statics); // Return Class object
            }

            delete methods; // Clean up methods list
            delete statics; // Clean up statics list
        }

        delete name; // Clean up name token
    }

    return nullptr; // Return null pointer if parsing failed
}

1 Parse Class Name:

if (consume(IDENTIFIER, "Expect class name.")) {
        Token *name = new Token(previous()); // Parse and store the class name token
        Variable *superclass = nullptr; // Initialize superclass variable

The method attempts to parse an identifier as the class name. If successful, it stores the class name token.

2 Parse Superclass (if present):

		if (match(LESS))
        {
            consume(IDENTIFIER, "Expect superclass name.");
            superclass = new Variable(new Token(previous()));
        }
  • If the next token is <, indicating a superclass declaration, the method attempts to parse the superclass name.

3 Parse Class Body:

		if (consume(LEFT_BRACE, "Expect '{' before class body."))
        {
            ListFunction *methods = new ListFunction();
            ListFunction *statics = new ListFunction();
            bool declarationsOk = true;
            while (!check(RIGHT_BRACE) && !isAtEnd())
            {
                Function *fn;
                if (check(CLASS))
                {
                    fn = (Function *)function("class function");
                    if (fn)
                    {
                        statics->list.push_back(fn);
                    }
                    else
                    {
                        declarationsOk = false;
                        break;
                    }
                }
                else
                {
                    fn = (Function *)function("method");
                    if (fn)
                    {
                        methods->list.push_back(fn);
                    }
                    else
                    {
                        declarationsOk = false;
                        break;
                    }
                }
            }
  • The method attempts to parse the class body enclosed in curly braces {}.
  • It initializes lists to store instance methods (methods) and static methods (statics).
  • It iterates through class members until encountering } or reaching the end of the input.
  • For each class member, it parses either a class function (static method) or a method and adds it to the appropriate list.
  • If parsing fails for any member, it sets a flag (declarationsOk) to indicate failure and breaks out of the loop.
  • After parsing all class members, it verifies successful parsing and consumes }.

4 Create Class Object:

		if (declarationsOk && consume(RIGHT_BRACE, "Expect '}' after class body."))
            {
                return new Class(name, superclass, methods, statics);
            }
  • If parsing of the class body is successful, it creates a Class object with the parsed class name, superclass (if any), and lists of methods.

 else if (match(VAR))
    {
        return varDeclaration();
    }

The varDeclaration() method is called when the parser encounters a VAR token during the parsing process. This indicates the declaration of a variable.

Stmt * Parser::varDeclaration() {
    if (consume(IDENTIFIER, "Expect variable name.")) {
        Token *name = new Token(previous()); // Parse and store the variable name token

        Expr *initializer = nullptr; // Initialize variable initializer

        // Parse optional initializer
        if (match(EQUAL)) {
            initializer = expression(); // Parse the expression for variable initialization
        }

        // Consume semicolon to end the variable declaration
        if (consume(SEMICOLON, "Expect ';' after variable declaration.")) {
            return new Var(name, initializer); // Return Var object representing the variable declaration
        }

        delete initializer; // Clean up initializer expression
        delete name; // Clean up name token
    }

    return nullptr; // Return null pointer if parsing failed
}

1 Parse Variable Name:

if (consume(IDENTIFIER, "Expect variable name.")) {
        Token *name = new Token(previous()); // Parse and store the variable name token
  • The method attempts to parse an identifier as the variable name. If successful, it stores the variable name token.
bool Parser::consume(TokenType type, std::string message) {
    // Check if the current token matches the specified type
    if (check(type)) {
        advance(); // Advance to the next token
        return true; // Return true indicating successful consumption
    }
    // If the current token does not match the specified type, generate an error message
    error(peek(), message);
    return false; // Return false indicating failure to consume
}
  • It checks if the current token matches the specified type. If there's a match, it advances to the next token and returns true. If there's no match, it generates an error message and returns false.

2 Parse Initializer (if present):

		Expr *initializer = nullptr; // Initialize variable initializer

        // Parse optional initializer
        if (match(EQUAL)) {
            initializer = expression(); // Parse the expression for variable initialization
        }
  • If the next token is =, indicating an initializer for the variable, the method parses the expression for variable initialization.

3 Consume Semicolon:

// Consume semicolon to end the variable declaration
        if (consume(SEMICOLON, "Expect ';' after variable declaration.")) {
            return new Var(name, initializer); // Return Var object representing the variable declaration
        }
  • The method expects a semicolon ; after the variable declaration to terminate it.
  • If the semicolon is found, indicating the end of the variable declaration, it creates a Var object with the parsed variable name and initializer (if any).

4 Memory Management:

 		delete initializer; // Clean up initializer expression
        delete name; // Clean up name token
  • It ensures proper memory management by deleting allocated tokens and expressions before returning.

5 Return nullptr if parsing failed:

return nullptr; // Return null pointer if parsing failed
  • It returns a pointer to the created Var object if parsing is successful, or a null pointer otherwise.

Var:

enum StmtType
{
    StmtType_Expression,
    StmtType_Print,
    StmtType_Var,
    StmtType_Block,
    StmtType_Function,
    StmtType_Class,
    StmtType_If,
    StmtType_While,
    StmtType_Return,
    StmtType_Break,
};

struct Var : public Stmt
{
    Token *name;
    Expr *initializer;

    Var(Token *name, Expr *initializer) : Stmt(StmtType_Var), name(name), initializer(initializer) {}

    ~Var();
};

struct Stmt
{
    StmtType type;
    Stmt(StmtType type) : type(type) {}
    virtual ~Stmt() {}
};
enum ExprType
{
    ExprType_Assign,
    ExprType_Ternary,
    ExprType_Binary,
    ExprType_Logical,
    ExprType_Grouping,
    ExprType_Literal,
    ExprType_Set,
    ExprType_Super,
    ExprType_This,
    ExprType_Unary,
    ExprType_Variable,
    ExprType_Call,
    ExprType_Get,
    ExprType_Lambda,
};

struct Expr
{
    ExprType type;
    Expr(ExprType type) : type(type) {}
    virtual ~Expr() {}
};