# Syntax Tree Representation
When interpreting or compiling code, it's helpful to represent the structure of the code in memory. We use syntax trees for this purpose. Just like a family tree, a syntax tree shows how different parts of the code relate to each other.
#1 Parsing
After tokenization, the next step in the compilation or interpretation process typically involves parsing. Parsing is the process of analyzing the stream of tokens produced during tokenization according to the rules of the language's grammar. This involves constructing a parse tree or an abstract syntax tree (AST) that represents the syntactic structure of the program.
Parsing:
- In parsing, the sequence of tokens is analyzed to ensure it conforms to the grammar of the programming language.
- The parser verifies whether the arrangement of tokens follows the syntactic rules of the language. It often uses a grammar specification, such as a context-free grammar (CFG), to guide this process.
- If the sequence of tokens is valid according to the grammar, the parser constructs a parse tree or AST that represents the structure of the program.
#2 Let's start Parsing
// Tokenization
Scanner scanner(source);
std::vector<Token> tokens = scanner.scanTokens();
// Parsing
Parser parser(tokens);
std::vector<Stmt *> statements = parser.parse();
#2.1 Constructor
Instantiate Parser Object
Parser::Parser(std::vector<Token> tokens)
{
this->tokens = tokens;
}
#2.2 Parse Tokens
std::vector<Stmt *> statements = parser.parse();
- The
parse
method of theparser
object is called to parse the sequence of tokens. - The result of parsing operation is stored in a vector named
statements
.
std::vector<Stmt *> Parser::parse() {
std::vector<Stmt *> result; // Initialize an empty vector to store parsed statements
// Continue parsing until the end of the token sequence is reached
while (!isAtEnd()) {
Stmt *stmt = declaration(); // Parse a single declaration statement
// Check if an error occurred during parsing
if (FarmScript::hadError) {
synchronize(); // Attempt to synchronize parsing by discarding tokens until a valid synchronization point is found
// Clean up memory and break out of the loop
if (stmt) {
delete stmt; // Delete the partially parsed statement
}
stmt = nullptr; // Reset the pointer
break; // Exit the loop
} else {
result.push_back(stmt); // Add the parsed statement to the result vector
}
}
return result; // Return the vector of parsed statements
}
Let's break this code:
#2.2.1 Initialize result vector
std::vector<Stmt *> result; // Initialize an empty vector to store parsed statements
- This initializes an empty vector
result
to store parsed statements.
#2.2.2 Loop until end of token sequence
// Continue parsing until the end of the token sequence is reached
while (!isAtEnd()) {
... other statments
}
- The
while
loop continues parsing until the end of the token sequence reached (isAtEnd()
function returnstrue
).
bool Parser::isAtEnd()
{
return peek().type == EOF_TOKEN;
}
Token Parser::peek()
{
// initially current is 0
return tokens[current];
}
#2.2.3 Parse declaration
Stmt *stmt = declaration();
- Within the loop,
declaration()
method is called to parse a single declaration statement. The returned pointer points to the parsed statement.
Stmt * Parser::declaration() {
if (match(FUN)) {
return (Stmt *)function("function"); // Parse a function declaration
} else if (match(CLASS)) {
return classDeclaration(); // Parse a class declaration
} else if (match(VAR)) {
return varDeclaration(); // Parse a variable declaration
}
return statement(); // Parse a generic statement if none of the above matched
}
We will break these in later part.
#2.2.4 Handle parsing errors
if (FarmScript::hadError)
{
synchronize();
if (stmt)
{
delete stmt;
}
stmt = nullptr;
break;
}
- If an error occurred during parsing (FarmScript::hadError flag is set), the method attempts to synchronize parsing by calling synchronize() method.
- If a statement was partially parsed before the error occurred, memory cleanup is performed by deleting the partially parsed statement and resetting the pointer.
- The loop breaks if an error occurred to stop further parsing.
#2.2.5 Add parsed statement to result vector
else
{
result.push_back(stmt);
}
- If no error occurred during parsing, the parsed statement pointer is added to the result vector.
#2.2.6 Return result vector
// Out of while loop
return result;
- Finally, the method returns the vector containing the parsed statements.
#2.3 Tear Apart declaration()
Stmt * Parser::declaration() {
if (match(FUN)) {
return (Stmt *)function("function"); // Parse a function declaration
} else if (match(CLASS)) {
return classDeclaration(); // Parse a class declaration
} else if (match(VAR)) {
return varDeclaration(); // Parse a variable declaration
}
return statement(); // Parse a generic statement if none of the above matched
}
bool Parser::match(TokenType type)
{
if (check(type))
{
advance();
return true;
}
return false;
}
- This function checks if the current token matches the specified
type
. - It first calls
check(type)
to verify if the current token's type matches the giventype
. - If there is a match, it advances to the next token using
advance()
and returnstrue
. - If there is no match, it returns
false
.
bool Parser::check(TokenType type)
{
if (isAtEnd())
{
return false;
}
return peek().type == type;
}
- This function checks if the current token's type matches the specified
type
. - If the end of the token sequence is reached (
isAtEnd()
returnstrue
), it returnsfalse
. - Otherwise, it compares the type of the current token (retrieved using
peek().type
) with the specifiedtype
and returnstrue
if they match, otherwisefalse
.
bool Parser::isAtEnd()
{
return peek().type == EOF_TOKEN;
}
- This function checks if the parser has reached the end of the token sequence.
- It returns
true
if the type of the current token isEOF_TOKEN
, indicating the end of the token sequence, andfalse
otherwise.
Token Parser::advance()
{
if (!isAtEnd())
{
return tokens[current++];
}
return previous();
}
- This function advances to the next token in the token sequence.
- It first checks if the parser is not at the end of the token sequence (
!isAtEnd()
). - If not at the end, it returns the current token using
tokens[current++]
and increments thecurrent
index to move to the next token. - If already at the end, it returns the previous token using
previous()
.
Token Parser::previous()
{
return tokens[current - 1];
}
- This function returns the previous token in the token sequence.
- It simply returns the token at index
current - 1
.
Token Parser::peek()
{
return tokens[current];
}
- This function returns the current token in the token sequence without advancing to the next one.
- It returns the token at index
current
.
if (match(FUN)) {
return (Stmt *)function("function"); // Parse a function declaration
}
It checks if the current token matches the FUN
token (i.e., function keyword). It proceeds to parse a function declaration using the function
method.
#2.3.1
void * Parser::function(std::string kind)
{
Token *name = nullptr;
Token *keyword = nullptr;
if (kind == "class function")
{
consume(CLASS, "Expect class keyword.");
}
if (kind == "lambda")
{
keyword = new Token(previous());
}
else if (check(IDENTIFIER))
{
consume(IDENTIFIER, "Expect " + kind + " name.");
name = new Token(previous());
}
if (consume(LEFT_PAREN, "Expect '(' after " + kind + " name."))
{
ListToken *params = new ListToken();
if (!check(RIGHT_PAREN))
{
do
{
if (consume(IDENTIFIER, "Expect parameter name."))
{
params->list.push_back(new Token(previous()));
}
else
{
break;
}
} while (match(COMMA));
}
if (consume(RIGHT_PAREN, "Expect ')' after parameters."))
{
if (consume(LEFT_BRACE, "Expect '{' before " + kind + " body."))
{
Block *body = (Block *)blockStatement();
if (kind == "lambda")
{
return new Lambda(keyword, params, body);
}
else
{
return new Function(name, params, body);
}
}
}
delete params;
}
if (name)
{
delete name;
}
if (keyword)
{
delete keyword;
}
return nullptr;
}
Depending on the value of the kind
parameter, the method determines whether to parse a class function, lambda expression, or regular function declaration.
else if (match(CLASS))
{
return classDeclaration();
}
It checks if the current token matches the CLASS token. If there's a match, it proceeds to parse a class declaration using the classDeclaration
method and returns the result.
classDeclaration():
Stmt * Parser::classDeclaration() {
if (consume(IDENTIFIER, "Expect class name.")) {
Token *name = new Token(previous()); // Parse and store the class name token
Variable *superclass = nullptr; // Initialize superclass variable
// Parse optional superclass declaration
if (match(LESS)) {
consume(IDENTIFIER, "Expect superclass name."); // Consume superclass name token
superclass = new Variable(new Token(previous())); // Store superclass name token
}
// Parse class body
if (consume(LEFT_BRACE, "Expect '{' before class body.")) {
ListFunction *methods = new ListFunction(); // Initialize list for instance methods
ListFunction *statics = new ListFunction(); // Initialize list for static methods
bool declarationsOk = true; // Initialize flag for tracking declaration success
// Parse class members until '}' or end of input
while (!check(RIGHT_BRACE) && !isAtEnd()) {
Function *fn; // Variable to store parsed function
// Parse class function (static method) or method
if (check(CLASS)) {
fn = (Function *)function("class function"); // Parse class function
if (fn) {
statics->list.push_back(fn); // Add parsed function to statics list
} else {
declarationsOk = false; // Set flag to false if parsing failed
break; // Exit loop
}
} else {
fn = (Function *)function("method"); // Parse method
if (fn) {
methods->list.push_back(fn); // Add parsed function to methods list
} else {
declarationsOk = false; // Set flag to false if parsing failed
break; // Exit loop
}
}
}
// Verify successful parsing of class body and consume '}'
if (declarationsOk && consume(RIGHT_BRACE, "Expect '}' after class body.")) {
return new Class(name, superclass, methods, statics); // Return Class object
}
delete methods; // Clean up methods list
delete statics; // Clean up statics list
}
delete name; // Clean up name token
}
return nullptr; // Return null pointer if parsing failed
}
1 Parse Class Name:
if (consume(IDENTIFIER, "Expect class name.")) {
Token *name = new Token(previous()); // Parse and store the class name token
Variable *superclass = nullptr; // Initialize superclass variable
The method attempts to parse an identifier as the class name. If successful, it stores the class name token.
2 Parse Superclass (if present):
if (match(LESS))
{
consume(IDENTIFIER, "Expect superclass name.");
superclass = new Variable(new Token(previous()));
}
- If the next token is
<
, indicating a superclass declaration, the method attempts to parse the superclass name.
3 Parse Class Body:
if (consume(LEFT_BRACE, "Expect '{' before class body."))
{
ListFunction *methods = new ListFunction();
ListFunction *statics = new ListFunction();
bool declarationsOk = true;
while (!check(RIGHT_BRACE) && !isAtEnd())
{
Function *fn;
if (check(CLASS))
{
fn = (Function *)function("class function");
if (fn)
{
statics->list.push_back(fn);
}
else
{
declarationsOk = false;
break;
}
}
else
{
fn = (Function *)function("method");
if (fn)
{
methods->list.push_back(fn);
}
else
{
declarationsOk = false;
break;
}
}
}
- The method attempts to parse the class body enclosed in curly braces
{}
. - It initializes lists to store instance methods (
methods
) and static methods (statics
). - It iterates through class members until encountering
}
or reaching the end of the input. - For each class member, it parses either a class function (static method) or a method and adds it to the appropriate list.
- If parsing fails for any member, it sets a flag (declarationsOk) to indicate failure and breaks out of the loop.
- After parsing all class members, it verifies successful parsing and consumes
}
.
4 Create Class Object:
if (declarationsOk && consume(RIGHT_BRACE, "Expect '}' after class body."))
{
return new Class(name, superclass, methods, statics);
}
- If parsing of the class body is successful, it creates a Class object with the parsed class name, superclass (if any), and lists of methods.
else if (match(VAR))
{
return varDeclaration();
}
The varDeclaration()
method is called when the parser encounters a VAR token during the parsing process. This indicates the declaration of a variable.
Stmt * Parser::varDeclaration() {
if (consume(IDENTIFIER, "Expect variable name.")) {
Token *name = new Token(previous()); // Parse and store the variable name token
Expr *initializer = nullptr; // Initialize variable initializer
// Parse optional initializer
if (match(EQUAL)) {
initializer = expression(); // Parse the expression for variable initialization
}
// Consume semicolon to end the variable declaration
if (consume(SEMICOLON, "Expect ';' after variable declaration.")) {
return new Var(name, initializer); // Return Var object representing the variable declaration
}
delete initializer; // Clean up initializer expression
delete name; // Clean up name token
}
return nullptr; // Return null pointer if parsing failed
}
1 Parse Variable Name:
if (consume(IDENTIFIER, "Expect variable name.")) {
Token *name = new Token(previous()); // Parse and store the variable name token
- The method attempts to parse an identifier as the variable name. If successful, it stores the variable name token.
bool Parser::consume(TokenType type, std::string message) {
// Check if the current token matches the specified type
if (check(type)) {
advance(); // Advance to the next token
return true; // Return true indicating successful consumption
}
// If the current token does not match the specified type, generate an error message
error(peek(), message);
return false; // Return false indicating failure to consume
}
- It checks if the current token matches the specified type. If there's a match, it advances to the next token and returns true. If there's no match, it generates an error message and returns false.
2 Parse Initializer (if present):
Expr *initializer = nullptr; // Initialize variable initializer
// Parse optional initializer
if (match(EQUAL)) {
initializer = expression(); // Parse the expression for variable initialization
}
- If the next token is
=
, indicating an initializer for the variable, the method parses the expression for variable initialization.
3 Consume Semicolon:
// Consume semicolon to end the variable declaration
if (consume(SEMICOLON, "Expect ';' after variable declaration.")) {
return new Var(name, initializer); // Return Var object representing the variable declaration
}
- The method expects a semicolon ; after the variable declaration to terminate it.
- If the semicolon is found, indicating the end of the variable declaration, it creates a
Var
object with the parsed variable name and initializer (if any).
4 Memory Management:
delete initializer; // Clean up initializer expression
delete name; // Clean up name token
- It ensures proper memory management by deleting allocated tokens and expressions before returning.
5 Return nullptr if parsing failed:
return nullptr; // Return null pointer if parsing failed
- It returns a pointer to the created Var object if parsing is successful, or a null pointer otherwise.
Var:
enum StmtType
{
StmtType_Expression,
StmtType_Print,
StmtType_Var,
StmtType_Block,
StmtType_Function,
StmtType_Class,
StmtType_If,
StmtType_While,
StmtType_Return,
StmtType_Break,
};
struct Var : public Stmt
{
Token *name;
Expr *initializer;
Var(Token *name, Expr *initializer) : Stmt(StmtType_Var), name(name), initializer(initializer) {}
~Var();
};
struct Stmt
{
StmtType type;
Stmt(StmtType type) : type(type) {}
virtual ~Stmt() {}
};
enum ExprType
{
ExprType_Assign,
ExprType_Ternary,
ExprType_Binary,
ExprType_Logical,
ExprType_Grouping,
ExprType_Literal,
ExprType_Set,
ExprType_Super,
ExprType_This,
ExprType_Unary,
ExprType_Variable,
ExprType_Call,
ExprType_Get,
ExprType_Lambda,
};
struct Expr
{
ExprType type;
Expr(ExprType type) : type(type) {}
virtual ~Expr() {}
};