Semantic Actions for Assignment Statements

This page last updated: Sunday September 27, 1998 01:07

The Interpreter Value Stack

An interpreter doesn't generate code. It performs semantic actions as it parses the input. For expressions, it uses a value stack to hold the expression operands and their types.

The value stack stores intermediate results of parsing for later use in a grammar rule. Here's an example using a simple translation scheme with semantic actions inserted in their proper places:
<assign> --> ID {eval;push} '=' <plus> {pop;pop;enter;copy} ';' <plus> --> <xpress> '+' <xpress> {pop;pop;add if numeric;push} <xpress> --> ID {eval;lookup;push} | CONST {eval;push}
This grammar needs a symbol table to hold the current values of identifiers. Here are short explanations of what each of the pseudocode semantic action symbols means:

{eval}: get the value and type of the token (get the string or the number value)
{push}: push the value and type on the value stack
{pop}: pop the top value and type off the value stack
{lookup}: find the identifier in the symbol table; get its current value and type from the table
{enter}: enter the identifier string into the symbol table if it isn't already there; return a pointer
{copy}: copy the value and type to the symbol table at the location of the {enter} pointer
{add if numeric}: if they are both numbers, add the two popped values

Here is a more detailed explanation of what each of the pseudocode actions does, starting from the leaves of the grammar and working back up to the start symbol:

{eval;lookup;push}: When the parsing function implementing xpress() recognizes an ID token used in an expression it does these things:

it gets the string that is the actual identifier name from the ID token (token.lexeme)

it looks up the name of the identifier in the symbol table using sym_lookup()

it uses sym_value() to get the current type and value of the identifier from the symbol table

it copies and pushes the type and value onto the value stack

An undefined identifier -- one not found in the symbol table -- would generate a run-time error message. The value and type of the pushed operand is the value and type from the symbol table, not the value and type of the identifier token itself.

{eval;push}: When the parsing function implementing xpress() recognizes a CONST constant, it gets the numeric value of the constant from the token and pushes the value and type of the constant onto the value stack. When the parsing function implementing assign() recognizes an ID identifier to the left of an equals sign, it gets the string that is the name of the identifier from the token and pushes the name and type (type: ID) on the value stack. (Identifiers on the right of equals signs are treated differently.)

{pop;pop;add if numeric;push}: The parsing function implementing plus() first parses two xpress()s. As noted above, each call to xpress() pushes a value and type on the value stack. After both xpress()s have been pushed, plus() pops each one off, verifies that they are compatible for an "add" operation, adds them together, and pushes the result and type back on the value stack. The actual code to do this is usually coded in a separate execute(operator, lineno) function that handles most of the value stack operations.

{pop;pop;enter;copy}: By the time this semantic action is seen, two operands have already been pushed on the stack. The first operand pushed is the string name of an identifier that begins the assignment statement; the second operand pushed is the plus-value resulting from the plus() operation. We pop the stack twice to get the identifier and the plus-value. We look up the string that is the identifier in the symbol table using sym_lookup(). If the identifier doesn't exist, we create a new entry in the symbol table using sym_insert() and return a pointer to it. If we do find the identifier in the symbol table, we return a pointer to the existing entry. Using the pointer, we call a symbol-table update function sym_update() to copy the plus-value and its type into the symbol table at the location indicated by the pointer.

Assignment Statements

Here is one of the augmented grammar productions that make up a translation scheme, with the semantic action symbols inserted as before:
<assign> --> ID {eval;push} '=' <xpress> {pop;pop;enter;copy} ';'
The code for the recursive-descent parsing function that implements the semantic actions for this production looks something like this:

Boolean assign(){ IF tokentype is not ID THEN print a good syntax error message RETURN FALSE ENDIF copy and push the ID token on the value stack lineno = the line number of this token get the next lookahead token CALL match(EQUALS), return FALSE if it fails CALL xpress(), return FALSE if it fails CALL execute(EQUALS,lineno), return FALSE if it fails CALL match(SEMICOLON), return FALSE if it fails RETURN TRUE -- parsing succeeded }

The execute() function that handles most value stack operations must contain a case that handles the assignment operation:
   Symtab *symp;         -- pointer to a symbol table entry
  
   CASE EQUALS:          -- an assignment operation, e.g. a=<xpress>
      op2 = pop()        -- op2 (the value to assign) was pushed last
      op1 = pop()        -- op1 (the identifier name) was pushed first

      IF op1.type is not an identifier THEN
         -- issue a good error message (should not happen!)
         abort the program
      ENDIF
      
      -- find the ID in the table, or, create a new entry for the ID:   
      symp = sym_lookup(op1.lexeme)          -- see if ID is in table
      IF symp is NULL THEN
         symp = sym_insert(op1.lexeme)       -- create new entry
         IF symp is NULL THEN
            -- error: must be full symbol table (error msg was printed)
            free both operands popped off stack
            RETURN FALSE
         ENDIF
      ENDIF
         
      symp = sym_update(symp,&op2)           -- update the entry
      IF symp is NULL THEN
         -- types are incompatible for update (error msg was printed)
         free both operands popped off stack
         RETURN FALSE
      ENDIF        
   
      free both operands popped off stack

      RETURN TRUE                            -- parsing succeeded