Semantic Action Symbols in a Grammar

This page last updated: Sunday September 27, 1998 01:07

Attributes, Semantic Rules, Semantic Actions, and Translation Schemes

Now that we have a recursive-descent parser that can recognize a grammar, we need to add attributes and semantic actions to the grammar so that useful work is done as we parse the input.

Attributes of Nodes
Attributes are values associated with nodes in the parse tree.  For example, a "value" attribute of the token T_NUM might be the integer value 384.  A "value" attribute of the token T_STR might be the characters in the string "Hello World!".  A node may have many different kinds of attributes.  Another type of attribute might be the "line number" attribute -- the earliest line number on which a token was seen for that node.

An attribute of a node is called synthesized if it is computed from the children of the node.  It is called inherited if it is derived from the parent of the node.  (Synthesis: bottom to top; inheritance: top to bottom.)  We concern ourselves primarily with synthesized attributes in this course.
Semantic rules and syntax-directed definitions
In a grammar, every node in the interior of a parse tree is represented by a non-terminal symbol in the grammar.  The synthesized attributes of each non-terminal symbol are computed from the children of the node, which means from the symbols in the right-hand side of the production for the non-terminal.  The computations are called semantic rules.

For example, given the grammar production E --> A '+' T_NUM, the value attribute of the non-terminal node E might be computed as the sum of the value attribute for the non-terminal A plus the value attribute for the token T_NUM. We might write this semantic rule as

   Rule: E.val = A.val + T_NUM.val

The semantic rule for the "line number" attribute for node E might look like this:

   Rule: E.lineno = MIN(A.lineno,T_NUM.lineno).

The grammar and the semantic rules together constitute a syntax-directed definition (Aho p.33-34):

    E --> A '+' T_NUM    Rule:   E.val = A.val + T_NUM.val
    A --> T_NUM          Rule:   A.val = T_NUM.val

Semantic rules don't specify the order of evaluation of the child nodes.  They say what must be done, but not what must precede what.  To establish an order, we modify the semantic rules and insert them into the productions themselves, as semantic action symbols.
Semantic action symbols and translation schemes
Semantic actions are program or pseudocode fragments enclosed in braces and embedded in the right sides of grammar productions.  Semantic actions are similar to semantic rules, except that their placement in specific locations in the right sides of grammar productions specifies the order in which the semantic actions are performed, based on where the actions appear in the production.

Given the above syntax-directed definition, we can take each semantic rule apart and insert pieces of it directly into each grammar production inside brace brackets:
  E --> A {tmp = A.val} '+' T_NUM {tmp += T_NUM.val; return tmp}
  A --> T_NUM {return T_NUM.val}

A grammar with embedded semantic action symbols is called a translation scheme (Aho p.37-40: Translation Schemes, Emitting a Translation).

The Parser builds its parse tree, doing derivations step-by-step and reading the input from left-to-right.  As it processes each derivation in the parse tree, it processes the semantic action symbols where they appear in the productions.

In the above translation scheme, the derivation for E must first expand the non-terminal A. With A comes a semantic action.   This action will be performed before continuing with the expansion of E.

The Interpreter Value Stack

An interpreter doesn't generate machine code.  It actually performs the semantic actions as it parses the input.  For arithmetic and string expressions, such as those used in the grammars in this course, a value stack is useful to hold values and their types as the intepreter parses the input. Values (and their types) can be pushed on the stack and popped from the stack.

The value stack stores intermediate results of each semantic action for later use in a grammar production.   The synthesized attribute that is the arithmetic or string value of a node in a parse tree is simply the top value on the value stack at that point in the input parsing.  The translation scheme shows when to push and pop values from the value stack, and how to perform the appropriate arithmetic on those values.

At the lowest level of the grammar, the leaves of the parse tree, tokens get returned by the Scanner and are pushed on the semantic expression value stack. Since the each token has to be scanned before it can be pushed, the semantic action for the push always appears in the production after the operand is scanned.

Translation Schemes

  Example One

Let's look at an example using a simple translation scheme with semantic actions inserted in their proper places.  First, let's start with a simple grammar without any action symbols:

    <stmt> --> PRINT <expr> ';'
    <expr> --> CONST | STR | '-' <expr>

Now, we turn this grammar into a translation scheme by inserting semantic action symbols in the right sides of all the grammar productions in the correct places. We use a semantic action value stack to store intermediate results while performing computations.  Values are pushed and popped from the stack using push() and pop() functions:

    <stmt> --> PRINT <expr> {pop;print} ';'
    <expr> --> CONST {eval;push}
           --> STR {eval;push}
           --> '-' <expr> {pop;negate;push}

Here are what each of the pseudocode semantic action symbols mean:

{pop}: pop the top value off the value stack
{print}: print the popped value
{eval}: get the value attribute of the token (the actual number or string)
{push}: push the value on the value stack
{negate}: negate the value

The action symbols are pseudocode fragments placed in the exact places in the grammar productions where we want the actions to occur.  We will map these locations to the corresponding locations in the code for the recursive-descent parsing functions, when we insert real code to perfors these actions.

Consider the input sentence:

    print ---234;

The left-most derivation (corresponding to a parse tree) that matches this sentence is built as follows, with the semantic action symbols in each production expanding in their proper places as the leftmost non-terminal symbol in each production expands at each step:

<stmt> => PRINT <expr> {pop;print} ';'         // expr will expand

       => PRINT '-' <expr> {pop;negate;push}      // expr again
              {pop;print} ';'

       => PRINT '-' '-' <expr> {pop;negate;push} // expr again
              {pop;negate;push}
              {pop;print} ';'

       => PRINT '-' '-' '-' <expr> {pop;negate;push} // expr again
              {pop;negate;push}
              {pop;negate;push}
              {pop;print} ';'

       => PRINT '-' '-' '-' CONST {eval(234);push}   // all done
              {pop;negate;push}
              {pop;negate;push}
              {pop;negate;push}
              {pop;print} ';'

In this derivation, the <expr> non-terminal expands three times to match the leading minus signs, and a fourth time to match the constant.   The action symbols in the grammar expand along with the productions.  The final sequence of pseudocode actions in the final line of the above derivation is:

    eval(234) push pop negate push pop negate push
        pop negate push pop print

If we eliminate the redundant consecutive push/pop sequences from the actions, the sequence of actions simplifies to this:

eval(234) negate negate negate print

This is exactly the sequence of pseudocode arithmetic actions we want to take upon seeing the input string. The act of building the derivation (building the parse tree) has also built the correct sequence of actions to perform the arithmetic and print the result. The parser of our interpreter will do that sequence of actions in that order as it parses the input tokens for the sentence  print ---234;

The usefulness of the Parse tree is in its ability to organize the semantic actions in the correct order to do the actions we want on the input tokens.

  Example Two

Here is another example of a translation scheme, similar to the one above.  We have added one more production, for the <plus> non-terminal:

    <stmt> --> PRINT <plus> {pop;print} ';'
    <plus> --> <expr> '+' <expr> {pop;pop;add;push}
    <expr> --> CONST {eval;push}
           --> STR {eval;push}
           --> '-' <expr> {pop;negate;push}

Here are what each of the pseudocode semantic action symbols mean:

{pop}: pop the top value off the value stack
{print}: print the popped value
{eval}: get the value attribute of the token (the actual number or string)
{push}: push the value on the value stack
{negate}: negate the value
{add}: add the two values

For the input sentence:

print 234 + 567 ;

We construct a leftmost derivation (equivalent to building a parse tree) to see what actions would be performed when this is parsed:

<stmt> => PRINT <plus> {pop;print} ';'

       => PRINT <expr> '+' <expr> {pop;pop;add;push} {pop;print} ';'

       => PRINT CONST {eval(234);push} '+' <expr> {pop;pop;add;push}
          {pop;print} ';'

       => PRINT CONST {eval(234);push} '+' CONST {eval(567);push}
          {pop;pop;add;push} {pop;print} ';'

Taking the actions in the order in which they appear in the final derivation, we have:

eval(234) push eval(567) push pop pop add push pop print

The result of these semantic stack actions is the value 801 being printed.

Again, the act of parsing the input sentence according to the grammar has resulted in an ordered sequence of pseudocode actions that has correctly performed the arithmetic and caused the result of the arithmetic to be the top stack item that is popped and printed.

The usefulness of the Parse tree is in its ability to organize the semantic actions in the correct order to do the actions we want on the input tokens.

All that remains is to turn the pseudocode into real code and insert it into our recursive-descent parsing functions that recognize the grammar.