CST 8152 - Semantic Action Coding

Here is a skeleton for coding semantic actions in C. In some cases, the functions return a pointer to an Item structure containing a type and union value. This is a convenient way to return both an error code (in the type field) and an error message (in the str field of the union) at the same time. The parser can check the error code and print the error message if needed.

typedef struct {
   int type;                /* an enum for the type of the item */
   Boolean freememory;      /* TRUE if the item is dynamic memory */
   union {
      char *str;
      int num;
   } value;
} Item;

Managing dynamic memory

If your parser uses dynamic memory, it must be careful about who "owns" the memory. My recommendations are these:

Pushing a string

At the lowest level of the grammar, string tokens get returned by the scanner and are pushed on the semantic expression value stack. Since the string token has to be scanned before it can be pushed, the semantic action for the push appears after the operand is scanned.

A typical augmented grammar production (with semantic action added) involving a string push might look like this:

<factor> --> STRING {push} | . . .

The code for this part of the production might look something like this:

factor(){
   ... declarations here ...
   if( token.type == STRING ){
      Item tmp;
      tmp.type = token.type;
      tmp.value.str = strdup(token.value.str);
      push(&tmp);
      scanner();
      return;
   }
   . . . the rest of factor() goes here . . .
}

Note: The above code is for illustration purposes only. The strdup() isn't being checked for a NULL pointer return (write a cover function mystrdup() that does the check), and the return codes of the various functions are not being checked. Code for a working parser would not have these defects.

The Assignment Statement

A typical augmented grammar production (with semantic action added) expressing an assignment statement might look like this:

<assignment_statement> --> ID {enter} '=' <expression> {pop;copy} ';' 

First, here are the prototypes and specifications for some utility functions used below:

Item *compatible(Item *op1, Item *op2, OpType operator):
Given a particular operator, such as T_EQUALS, T_PLUS, T_MINUS, etc., and two operands, op1 and op2, do operand type checking to see if the given operation is permitted between the two operands. Return a good status E_OK if the operation is permitted. Return a warning status E_WARN and warning message if the operation is questionable but permitted. Return an error status E_ERROR and error message if the operation is not permitted. E_ERROR should be returned for attempts to use arithmetic on strings. E_WARN should be returned when assigning a number to a string variable, or a string to a numeric variable. The function manages its own memory.
void freememory(Item *itemp):
If the given item is of a type that contains memory that should be freed, free it; otherwise, do nothing. (Number types do not need to be freed.)

Sample code for the above assignment statement follows. It uses functions that interface with the symbol table.

   Boolean
assignment_statement(){
   SymtabElt *symlocation;   /* pointer to a symbol table element */
   Item popvalue;            /* type/value union from stack */
   Item *msg;                /* type/value union pointer from compatible */
   Item *symvalue;           /* type/value union pointer from symtab */

   /* Expect an identifier */
   if( token.type != ID ){
      errprint("missing ID");
      return FALSE;
   }

   /* Found an ID; look it up.  Create a new symtab entry if not found */
   symlocation = sym_lookup(token.value.str);
   if( symlocation == NULL ){
      symlocation = sym_insert(token.value.str);  /* new, empty entry */
      if( symlocation == NULL ){
         errprint("symbol table full");
         return FALSE;
      }
   }
   lex_gettoken();

   /* Expect an '=' followed by an expression */
   if( token.type != EQUALS ){
      errprint("missing '='");
      return FALSE;
   }
   lex_gettoken();
   if( ! expression() )
      return FALSE;

   /* Pop the value of the expression.  See if it is compatible with the type
    * of the variable to which we are assigning.  Print a warning if not.
    * If the incompatibility is fatal, free the popped storage and give up.
    */
   popvalue = pop();
   symvalue = sym_item(symlocation);
   if( (msg=compatible(symvalue,&popvalue,T_EQUALS))->type != E_OK ){
      errprint(msg->value.str);      /* issue the returned warning message */
      if( msg->type == E_ERROR ){
         freememory(&popvalue);
         return FALSE;              /* cannot do this assignment */
      }
   }

   /* assignment is compatible; call a function to do it */
   symlocation = sym_update(symlocation,&popvalue);
   if( symlocation == NULL ){
      errprint("unable to update symbol table");
      freememory(&popvalue);
      return FALSE;
   }
   
   /* all done; free the popped item and finish the statement */
   freememory(&popvalue);
   if( token.type != SEMI ){
      errprint("missing ';'");
      return FALSE;
   }
   lex_gettoken();

   return TRUE; /* parsing succeeded */
}

Note: The above code is for illustration purposes only. The error messages aren't sufficiently detailed and don't contain any helpful information such as line number or the lexeme value of the offending tokens. A proper parser would be more helpful.

Performing Unary Minus

The semantic action for Unary Minus in an interpreted grammar is to pop the value of the operand off the semantic expression value stack, negate it, then push it back. Since the operand has to be evaluated first, before it can be negated, the semantic action for negation appears after the operand is parsed.

A typical augmented grammar production (with semantic action added) involving unary minus might look like this:

<factor> --> '-' <factor> {pop;negate;push} | . . .

The code for this part of the production might look something like this:

factor(){
   ... declarations here ...
   if( token.type == MINUS ){
      scanner();
      factor();
      popvalue = pop();
      popvalue.value.num = - popvalue.value.num;
      push(&popvalue);
      return;
   }
   . . . the rest of factor() goes here . . .
}

Note: The above code is for illustration purposes only. The type of the popped operand isn't being checked for compatibility with a negation operation (the popped operand might be a string, not a number!), and the return codes of the various functions are not being checked, nor is memory being freed properly. Code for a working parser would not have these defects.

Performing Arithmetic operations

Some productions require arithmetic to be performed on operands pop()ped off the semantic expression value stack. For example, given the following grammar production requiring addition:

<add_expression> --> <expr> '+' <expr> {pop;pop;add;push}

The C code might look something like this:

   Boolean
add_expression(){
   ... declarations here ...
   /* Expect an expr followed by a '+' followed by an expr */
   if( ! expr() )
      return FALSE;
   if( token.type != T_PLUS ){
      errprint("missing '+'");
      return FALSE;
   }
   optype = token.type;     /* save the type of the operator */
   lex_gettoken();
   if( ! expr() )
      return FALSE;

   /* Call a function to pop the operands and do the operation */
   if( (msg=execute(optype))->type != E_OK ){
      errprint(msg->value.str);     /* issue returned error message */
      return FALSE;                 /* cannot do optype operation */
   }

   return TRUE;
}

/* A function to pop the appropriate number of operands off the stack,
 * test for compatibility of operands with the given operator,
 * do the type conversions (if any), and then do the actual arithmetic.
 * Returns an Item* containing E_OK if the math worked, or an error
 * string if it failed.
 */
   static Item *
execute(OpType optype){
   Item op1;             /* First operand type and value */
   Item op2;             /* Second operand type and value */
   Item *msgp;           /* pointer to return message */
   static Item msg;      /* our own static return message */
   static char errbuf[ERRMSGSIZE]; /* static buf for error messages */

   switch(optype){
   case T_PLUS:          /* Pop off correct number of pushed values */
      op2 = pop();       /* op2 was pushed last  */
      op1 = pop();       /* op1 was pushed first */

      /* See if the popped operands and the given optype are compatible;
       * do the operation if so.  Use op2 as the result for push back.
       */
      if( (msgp=compatible(&op1,&op2,optype))->type == E_OK ){
         op2.value.num += op1.value.num;    /* do numeric operation */
         op2.type = IT_INT;                 /* type of pushed result */
         push(&op2);                        /* push back the result */
      } else {
         freememory(&op2);                  /* failure; free second operand */
      }
      /* Free the first operand. We return msgp to indicate success/failure */
      freememory(&op1);
      break;
   default:
      msgp = &msg;                          /* point to static Item struct */
      sprintf(msgp->value.str = errbuf,
         "Internal error: Unknown operation type %d in execute()", optype);
      msgp->type = E_ERROR;    
   }
   return msgp;       /* msgp->type indicates success or failure */
}   

Notes:


Ian D. Allen CST 8152 Home Page