04.24.2022 - Compilers/Virtual Stack Machine/Variable declaration
Posted On 04.24.2022
In the previous article, we get an overview of how bytecode is executed in a stack machine.
In this article, we continue to dive into the details of a different scenario: Variable declaration.
In the simple language that we are designing, there will be 3 types of variable declaration:
- Variable with a constant value
- Variable with a value from other variables
- Variable with a value from an expression
In the following program, we have all 3 declaration types above:
var x = 10 // type 1 var y = x // type 2 var z = 15 + x // type 3
Let’s take a closer look at each case to see how would the bytecode compiler compiles them.
Aside from the data stack, a stack machine needs to store other information that will be used for the execution, different implementations have different ways to store them, but generally, we can think of them as the 3 tables:
- Constants: to store the literal values in the program, for example, numbers or strings.
- Symbols: to store the variable names that are defined in the program.
- Locals: to map the symbols with their actual values.
To work with these tables, the instruction set usually contains these opcodes (these names are inspired by CPython’s opcodes):
LOAD_CONST: Load a value from the Constants table, and put it on the top of the stack. It’s somewhat similar to the
PUSHinstruction that was introduced in the previous article.
LOAD_NAME: Load the referenced value of a symbol in the Locals table, and put it on the top of the stack.
STORE_NAME: Store the address of the value on the top of the stack in the Locals table, with the key being the address to the symbol name.
Now, we are ready to dive into each variable declaration type.
This is the case where a new variable is declared with a constant value. During the compilation, the compiler will put the constant
10 into the Constants table (for example, at the location
0x01a), and put the name
x into the Symbols table at the location
Then, the expression
var x = 10 will be transformed into two instructions. The first one is
LOAD_CONST 0x01a, which loads the constant at location
0x01a into the top of the stack, which is the number
10. Then, the
STORE_NAME 0x01b instruction map the address of the value on the top of the stack with the symbol located at
0x01b — which is the symbol
Another scenario is when we define a new variable, and assign it with the value from another variable. For example,
var y = x. Same as above, during the compilation, the symbol
y was also created in the Symbols table, at the location
Two instructions will be generated. The first one,
LOAD_NAME 0x01b, will look at the Locals table, retrieve the referenced value mapped with the symbol at location
0x01b — which is the value
10 (at location
0x01c on the stack), put it to the top of the stack. Then, the second instruction,
STORE_NAME 0x02b, will map the location of the top of the stack with the symbol at
0x02b and save it to the Locals.
Next, consider the case when a variable is declared and assigned to an expression, the expression may or may not contains another variable. For example,
var z = 15 + x.
Remember that, all bytecode is generated in the way we write Reversed Polish Notation, the above statement would become 15 x +. And it would be transformed into a series of instructions as described in the above diagram.
LOAD_CONST 0x02a will load the constant at the location
0x02a into the top of the stack (the value
15). Next, the
LOAD_NAME 0x01b instruction load the value of the symbol at
0x01b from the Locals table (the value
10 at the location
0x01c), put it on the top of the stack. Then, the
ADD instruction will pop the top two values on the stack, add them together, and push back the result into the stack — the value
25 at the location
STORE_NAME 0x03b instruction saves the location of the top of the stack into the Locals table, with the name of the symbol at
In the above examples, we see that when a variable is created, the assigned values are copied from other variables. In reality, we may need to create a pointer that referenced some other memory location. This is a much more complex case that requires us to handle many different aspects of the language, so it will be a topic for another article.