#1 Function Calls
You’re certainly familiar with C-style function call syntax, but the grammar is more subtle than you may realize. Calls are typically to named functions like:
add(6, 7);
But the name of the function being called isn’t actually part of the call syntax. The thing being called—the callee—can be any expression that evaluates to a function. (Well, it does have to be a pretty high precedence expression, but parentheses take care of that.) For example:
getCallback()();
There are two call expressions here. The first pair of parentheses has getCallback as its callee. But the second call has the entire getCallback() expression as its callee. It is the parentheses following an expression that indicate a function call. You can think of a call as sort of like a postfix operator that starts with (.
This “operator” has higher precedence than any other operator, even the unary ones. So we slot it into the grammar by having the unary rule bubble up to a new call rule.
unary → ( "!" | "-" ) unary | call ;
call → primary ( "(" arguments? ")" )* ;
This rule matches a primary expression followed by zero or more function calls. If there are no parentheses, this parses a bare primary expression. Otherwise, each call is recognized by a pair of parentheses with an optional list of arguments inside. The argument list grammar is:
arguments → expression ( "," expression )* ;
This rule requires at least one argument expression, followed by zero or more other expressions, each preceded by a comma. To handle zero-argument calls, the call rule itself considers the entire arguments production to be optional.
Open the parser code. Where unary()
used to jump straight to primary()
, change it to call, call()
:
Expr * Parser::unary()
{
if (match(MINUS) || match(BANG) || match(STAR) || match(SLASH) || match(PLUS))
{
if ((previous().type == MINUS) || (previous().type == BANG))
{
Token *oper = new Token(previous());
Expr *right = unary();
return new Unary(oper, right);
}
else
{
error(previous(), "Missing left operand.");
return nullptr;
}
}
return call();
}
It's definition is:
Expr Parser::call() {
Expr expr = primary();
while (true) {
if (match(LEFT_PAREN)) {
expr = finishCall(expr);
} else {
break;
}
}
return expr;
}
The code here doesn’t quite line up with the grammar rules. I moved a few things around to make the code cleaner—one of the luxuries we have with a handwritten parser. But it’s roughly similar to how we parse infix operators. First, we parse a primary expression, the “left operand” to the call. Then, each time we see a (, we call finishCall() to parse the call expression using the previously parsed expression as the callee. The returned expression becomes the new expr and we loop to see if the result is itself called.
The code to parse the argument list is in this helper:
Expr * Parser::finishCall(Expr *callee)
{
ListExpr *arguments = new ListExpr();
if (!check(RIGHT_PAREN))
{
do
{
if (arguments->list.size() >= 255)
{
error(peek(), "Can't have more than 255 arguments.");
}
arguments->list.push_back(expression());
} while (match(COMMA));
}
consume(RIGHT_PAREN, "Expected ')' after arguments.");
return new Call(callee, new Token(previous()), arguments);
}
This is more or less the arguments grammar rule translated to code, except that we also handle the zero-argument case. We check for that case first by seeing if the next token is ). If it is, we don’t try to parse any arguments.
Otherwise, we parse an expression, then look for a comma indicating that there is another argument after that. We keep doing that as long as we find commas after each expression. When we don’t find a comma, then the argument list must be done and we consume the expected closing parenthesis. Finally, we wrap the callee and those arguments up into a call AST node.
#1.1 Maximum arguments counts
Right now, the loop where we parse arguments has no bound. If you want to call a function and pass a million arguments to it, the parser would have no problem with it. Do we want to limit that?
Other languages have various approaches. The C standard says a conforming implementation has to support at least 127 arguments to a function, but doesn’t say there’s any upper limit. The Java specification says a method can accept no more than 255 arguments.
Our C++ interpreter for FarmScript doesn’t really need a limit, but having a maximum number of arguments will simplify our bytecode interpreter in Part III. We want our two interpreters to be compatible with each other, even in weird corner cases like this, so we’ll add the same limit to FarmScript.
do{
if (arguments.size() >= 255) {
error(peek(), "Can't have more than 255 arguments.");
}
This code is part of the above snippet.
Note that the code here reports an error if it encounters too many arguments, but it doesn’t throw the error. Throwing is how we kick into panic mode which is what we want if the parser is in a confused state and doesn’t know where it is in the grammar anymore. But here, the parser is still in a perfectly valid state—it just found too many arguments. So it reports the error and keeps on keepin’ on.
#1.2 Interpreting function calls
As always, interpretation starts with a new visit method for our new call expression node.
Object Interpreter::visitCall(Call *stmt)
{
Object callee = evaluate(stmt->callee);
if (FarmScript::hadRuntimeError)
{
return callee;
}
std::vector<Object> arguments;
for (auto &argument : stmt->arguments->list)
{
arguments.push_back(evaluate(argument));
}
if ((callee.type != TYPE_FUNCTION) && (callee.type != TYPE_CLASS))
{
runtimeError(*stmt->paren, "Can only call functions and methods.");
}
else
{
LoxCallable *callable = callee.type == TYPE_CLASS ? callee.loxClass : callee.function;
if (callable->arity() != arguments.size())
{
std::stringstream ss;
ss << "Expected " << callable->arity() << " arguments but got " << arguments.size() << ".";
runtimeError(*stmt->paren, ss.str());
return callee;
}
else
{
Object result = callable->call(this, arguments);
returnSet = false;
return result;
}
}
return callee;
}
First, we evaluate the expression for the callee. Typically, this expression is just an identifier that looks up the function by its name, but it could be anything. Then we evaluate each of the argument expressions in order and store the resulting values in a list.
Once we’ve got the callee and the arguments ready, all that remains is to perform the call. We do that by casting the callee to a LoxCallable and then invoking a call() method on it. The Java representation of any Lox object that can be called like a function will implement this interface. That includes user-defined functions, naturally, but also class objects since classes are “called” to construct new instances. We’ll also use it for one more purpose shortly.
There is a new interface:
class LoxCallable
{
public:
virtual int arity() = 0;
virtual Object call(Interpreter *interpreter, std::vector<Object>) = 0;
virtual std::string str() = 0;
};
We pass in the interpreter in case the class implementing call()
needs it. We also give it the list of evaluated argument values. The implementer's job is then to return the value that the call expression produces.
#1.3 Call type errors
Before we get to implementing LoxCallable, we need to make the visit method a little more robust. It currently ignores a couple of failure modes that we can’t pretend won’t occur. First, what happens if the callee isn’t actually something you can call? What if you try to do this:
"totally not a function"();
Strings aren’t callable in Lox. The runtime representation of a Lox string is a C++ string, so when we cast that to LoxCallable
, the JVM will throw a ClassCastException
. We don’t want our interpreter to vomit out some nasty Java stack trace and die. Instead, we need to check the type ourselves first
if ((callee.type != TYPE_FUNCTION))
{
runtimeError(*stmt->paren, "Can only call functions and methods.");
}
We still throw an exception, but now we’re throwing our own exception type, one that the interpreter knows to catch and report gracefully.
#1.4 Checking arity
The other problem relates to the function’s arity. Arity is the fancy term for the number of arguments a function or operation expects. Unary operators have arity one, binary operators two, etc. With functions, the arity is determined by the number of parameters it declares.
fun add(a, b, c) {
print a + b + c;
}
This function defines three parameters, a, b, and c, so its arity is three and it expects three arguments. So what if you try to call it like this:
add(1, 2, 3, 4); // Too many.
add(1, 2); // Too few.
Different languages take different approaches to this problem. Of course, most statically typed languages check this at compile time and refuse to compile the code if the argument count doesn’t match the function’s arity. JavaScript discards any extra arguments you pass. If you don’t pass enough, it fills in the missing parameters with the magic sort-of-like-null-but-not-really value undefined. Python is stricter. It raises a runtime error if the argument list is too short or too long.
I think the latter is a better approach. Passing the wrong number of arguments is almost always a bug, and it’s a mistake I do make in practice. Given that, the sooner the implementation draws my attention to it, the better. So for FarmScript, we’ll take Python’s approach. Before invoking the callable, we check to see if the argument list’s length matches the callable’s arity.
LoxCallable *callable = callee.type == TYPE_CLASS ? callee.loxClass : callee.function;
if (callable->arity() != arguments.size())
{
std::stringstream ss;
ss << "Expected " << callable->arity() << " arguments but got " << arguments.size() << ".";
runtimeError(*stmt->paren, ss.str());
return callee;
}
That requires a new method on the LoxCallable interface to ask it its arity:
class LoxCallable
{
public:
virtual int arity() = 0;
We could push the arity checking into the concrete implementation of call(). But, since we’ll have multiple classes implementing LoxCallable, that would end up with redundant validation spread across a few classes. Hoisting it up into the visit method lets us do it in one place.
#2 Native Functions
We can theoretically call functions, but we have no functions to call yet. Before we get to user-defined functions, now is a good time to introduce a vital but often overlooked facet of language implementations—native functions. These are functions that the interpreter exposes to user code but that are implemented in the host language (in our case Java), not the language being implemented (FarmScript).
Sometimes these are called primitives, external functions, or foreign functions. Since these functions can be called while the user’s program is running, they form part of the implementation’s runtime. A lot of programming language books gloss over these because they aren’t conceptually interesting. They’re mostly grunt work.
Many languages also allow users to provide their own native functions. The mechanism for doing so is called a foreign function interface (FFI), native extension, native interface, or something along those lines. These are nice because they free the language implementer from providing access to every single capability the underlying platform supports. We won’t define an FFI for jlox, but we will add one native function to give you an idea of what it looks like.
#2.1 Telling time
Performance work requires measurement, and that in turn means benchmarks. These are programs that measure the time it takes to exercise some corner of the interpreter.
We could measure the time it takes to start up the interpreter, run the benchmark, and exit, but that adds a lot of overhead—JVM startup time, OS shenanigans, etc. That stuff does matter, of course, but if you’re just trying to validate an optimization to some piece of the interpreter, you don’t want that overhead obscuring your results.
A nicer solution is to have the benchmark script itself measure the time elapsed between two points in the code. To do that, a Lox program needs to be able to tell time. There’s no way to do that now—you can’t implement a useful clock “from scratch” without access to the underlying clock on the computer.
So we’ll add clock(), a native function that returns the number of seconds that have passed since some fixed point in time. The difference between two successive invocations tells you how much time elapsed between the two calls. This function is defined in the global scope, so let’s ensure the interpreter has access to that.
Object clock(new ClockFunction());
environment->define("clock", clock);
This defines a variable named clock
. The clock()
function takes no arguments, so its arity is zero.
class ClockFunction : public LoxCallable
{
virtual int arity() override
{
return 0;
}
virtual Object call(Interpreter *interpreter, std::vector<Object>) override
{
auto millisec_since_epoch = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now().time_since_epoch()).count();
Object value((double)millisec_since_epoch / 1000.0);
return value;
}
virtual std::string str() override
{
return "<native fn>";
}
};
The implementation of call()
calls the corresponding c++ function and converts the result to a double value in seconds.
If we wanted to add other native functions—reading input from the user, working with files, etc.—we could add them each as their own anonymous class that implements LoxCallable. But for the book, this one is really all we need.
Let’s get ourselves out of the function-defining business and let our users take over . . .
#3 Function Declarations
We finally get to add a new production to the declaration rule we introduced back when we added variables. Function declarations, like variables, bind a new name. That means they are allowed only in places where a declaration is permitted.
A named function declaration isn’t really a single primitive operation. It’s syntactic sugar for two distinct steps: (1) creating a new function object, and (2) binding a new variable to it. If Lox had syntax for anonymous functions, we wouldn’t need function declaration statements. You could just do:
var add = fun (a, b) {
print a + b;
};
However, since named functions are the common case, I went ahead and gave Lox nice syntax for them.
declaration → funDecl
| varDecl
| statement ;
The updated declaration rule references this new rule:
funDecl → "fun" function ;
function → IDENTIFIER "(" parameters? ")" block ;
The main funDecl rule uses a separate helper rule function. A function declaration statement is the fun keyword followed by the actual function-y stuff. When we get to classes, we’ll reuse that function rule for declaring methods. Those look similar to function declarations, but aren’t preceded by fun.
The function itself is a name followed by the parenthesized parameter list and the body. The body is always a braced block, using the same grammar rule that block statements use. The parameter list uses this rule:
parameters → IDENTIFIER ( "," IDENTIFIER )* ;
A function node has a name, a list of parameters (their names), and then the body. We store the body as the list of statements contained inside the curly braces.
Over in the parser, we weave in the new declaration.
Stmt * Parser::declaration()
{
if (match(FUN))
{
return (Stmt *)function("function");
}
.../ other statements
Like other statements, a function is recognized by the leading keyword. When we encounter fun, we call function. That corresponds to the function grammar rule since we already matched and consumed the fun keyword. We’ll build the method up a piece at a time, starting with this:
void * Parser::function(std::string kind)
{
consume(IDENTIFIER, "Expect " + kind + " name.");
name = new Token(previous());
}
Right now, it only consumes the identifier token for the function’s name. You might be wondering about that funny little kind parameter. Just like we reuse the grammar rule, we’ll reuse the function() method later to parse methods inside classes. When we do that, we’ll pass in “method” for kind so that the error messages are specific to the kind of declaration being parsed.
Next, we parse the parameter list and the pair of parentheses wrapped around it.
void * Parser::function(std::string kind)
{
Token *name = nullptr;
Token *keyword = nullptr;
if (kind == "class function")
{
consume(CLASS, "Expect class keyword.");
}
if (kind == "lambda")
{
keyword = new Token(previous());
}
else if (check(IDENTIFIER))
{
consume(IDENTIFIER, "Expect " + kind + " name.");
name = new Token(previous());
}
if (consume(LEFT_PAREN, "Expect '(' after " + kind + " name."))
{
ListToken *params = new ListToken();
if (!check(RIGHT_PAREN))
{
do
{
if (consume(IDENTIFIER, "Expect parameter name."))
{
params->list.push_back(new Token(previous()));
}
else
{
break;
}
} while (match(COMMA));
}
if (consume(RIGHT_PAREN, "Expect ')' after parameters."))
{
if (consume(LEFT_BRACE, "Expect '{' before " + kind + " body."))
{
Block *body = (Block *)blockStatement();
if (kind == "lambda")
{
return new Lambda(keyword, params, body);
}
else
{
return new Function(name, params, body);
}
}
}
delete params;
}
if (name)
{
delete name;
}
if (keyword)
{
delete keyword;
}
return nullptr;
}