An interpreter doesn't generate code. It performs semantic actions as it parses the input. For expressions, it uses a value stack to hold the expression operands and their types.
The value stack stores intermediate results of parsing for later use in a grammar rule. Here's an example using a simple translation scheme with semantic actions inserted in their proper places:
<assign> --> ID {eval;push} '=' <plus> {pop;pop;enter;copy} ';'
<plus> --> <xpress> '+' <xpress> {pop;pop;add if numeric;push}
<xpress> --> ID {eval;lookup;push} | CONST {eval;push}This grammar needs a symbol table to hold the current values of identifiers. Here are short explanations of what each of the pseudocode semantic action symbols means:
{eval}: get the value and type of the token (get the string or the number value)
{push}: push the value and type on the value stack
{pop}: pop the top value and type off the value stack
{lookup}: find the identifier in the symbol table; get its current value and type from the table
{enter}: enter the identifier string into the symbol table if it isn't already there; return a pointer
{copy}: copy the value and type to the symbol table at the location of the {enter} pointer
{add if numeric}: if they are both numbers, add the two popped valuesHere is a more detailed explanation of what each of the pseudocode actions does, starting from the leaves of the grammar and working back up to the start symbol:
{eval;lookup;push}: When the parsing function implementing xpress() recognizes an ID token used in an expression it does these things:
- it gets the string that is the actual identifier name from the ID token (token.lexeme)
- it looks up the name of the identifier in the symbol table using sym_lookup()
- it uses sym_value() to get the current type and value of the identifier from the symbol table
- it copies and pushes the type and value onto the value stack
An undefined identifier -- one not found in the symbol table -- would generate a run-time error message. The value and type of the pushed operand is the value and type from the symbol table, not the value and type of the identifier token itself.
{eval;push}: When the parsing function implementing xpress() recognizes a CONST constant, it gets the numeric value of the constant from the token and pushes the value and type of the constant onto the value stack. When the parsing function implementing assign() recognizes an ID identifier to the left of an equals sign, it gets the string that is the name of the identifier from the token and pushes the name and type (type: ID) on the value stack. (Identifiers on the right of equals signs are treated differently.)
{pop;pop;add if numeric;push}: The parsing function implementing plus() first parses two xpress()s. As noted above, each call to xpress() pushes a value and type on the value stack. After both xpress()s have been pushed, plus() pops each one off, verifies that they are compatible for an "add" operation, adds them together, and pushes the result and type back on the value stack. The actual code to do this is usually coded in a separate execute(operator, lineno) function that handles most of the value stack operations.
{pop;pop;enter;copy}: By the time this semantic action is seen, two operands have already been pushed on the stack. The first operand pushed is the string name of an identifier that begins the assignment statement; the second operand pushed is the plus-value resulting from the plus() operation. We pop the stack twice to get the identifier and the plus-value. We look up the string that is the identifier in the symbol table using sym_lookup(). If the identifier doesn't exist, we create a new entry in the symbol table using sym_insert() and return a pointer to it. If we do find the identifier in the symbol table, we return a pointer to the existing entry. Using the pointer, we call a symbol-table update function sym_update() to copy the plus-value and its type into the symbol table at the location indicated by the pointer.
Here is one of the augmented grammar productions that make up a translation scheme, with the semantic action symbols inserted as before:
<assign> --> ID {eval;push} '=' <xpress> {pop;pop;enter;copy} ';'
The code for the recursive-descent parsing function that implements the semantic actions for this production looks something like this:
Boolean assign(){
IF tokentype is not ID THEN
print a good syntax error message
RETURN FALSE
ENDIF
copy and push the ID token on the value stack
lineno = the line number of this token
get the next lookahead token
CALL match(EQUALS), return FALSE if it fails
CALL xpress(), return FALSE if it fails
CALL execute(EQUALS,lineno), return FALSE if it fails
CALL match(SEMICOLON), return FALSE if it fails
RETURN TRUE -- parsing succeeded
}The execute() function that handles most value stack operations must contain a case that handles the assignment operation:
Symtab *symp; -- pointer to a symbol table entry CASE EQUALS: -- an assignment operation, e.g. a=<xpress> op2 = pop() -- op2 (the value to assign) was pushed last op1 = pop() -- op1 (the identifier name) was pushed first IF op1.type is not an identifier THEN -- issue a good error message (should not happen!) abort the program ENDIF -- find the ID in the table, or, create a new entry for the ID: symp = sym_lookup(op1.lexeme) -- see if ID is in table IF symp is NULL THEN symp = sym_insert(op1.lexeme) -- create new entry IF symp is NULL THEN -- error: must be full symbol table (error msg was printed) free both operands popped off stack RETURN FALSE ENDIF ENDIF symp = sym_update(symp,&op2) -- update the entry IF symp is NULL THEN -- types are incompatible for update (error msg was printed) free both operands popped off stack RETURN FALSE ENDIF free both operands popped off stack RETURN TRUE -- parsing succeeded