Program Grammar #3

This page last updated: Sunday September 27, 1998 01:07

Here are the productions for the Program Grammar #3:

        program   := stmt * TT_EOF
        stmt      := ( dump | print | asst ) ? ';'
	dump      := TT_DUMP ( expr ( ',' expr ) * ) ?
        print     := TT_PRINT expr ( ',' expr ) *
        asst      := TT_ID '=' expr
        expr      := term ( ('+'|'-') term ) *
        term      := factor ( ('*'|'/') factor ) *
        factor    := TT_STR | TT_ID | TT_UINT | '(' expr ')' | '-' factor
Notes:  * means zero or more repetitions of the preceding item
        + means one or one repetition of the preceding item
        ? means zero or one repetition of the preceding item
        ø stands for epsilon, the empty string

The above grammar has 8 non-terminal symbols.  These non-terminal symbols will become C Language functions in the Parser.  All the other grammar symbols must be recognized, named, and returned as token types by the Scanner.

Follow the existing TokenType naming convention: Start each TokenType with the prefix "TT_", e.g. TT_PLUS.

Here are descriptions of some of the above tokens:

Epsilon is not a token.  It is a grammar meta-symbol, as are the other symbols described in the Notes, above. It stands for "nothing" (or the empty string) in the grammar. (This is not the same as an empty string in the language.)  You can rewrite this production containing epsilon:

       stmt := ( dump | print | asst | ø ) ';'

as this equivalent production (factoring some of the alternatives):

       stmt := ( dump | print | asst ) ';' | ø ';'

and then eliminate the empty string from the last production, ending up with any of these equivalent productions:

       stmt := ( dump | print | asst ) ';' | ';'
       stmt := ( dump | print | asst ) ? ';'
       stmt := dump ';' | print ';' | asst ';' | ';'

Any one of these generates the same strings.

To choose the correct production to expand for stmt , you will need to know the FIRST sets of the alternative productions (Aho, section 2.4, p.45-46) and have your code select the appropriate non-terminal based on its FIRST set.   The function for stmt must generate an error message if the lookahead token doesn't match any of the three alternatives.

To implement the dump statement, you need to know what token(s) follow the statement so that you can know if the optional argument(s) to the dump command are present.