CST8110 Midterm Exam #2

Algonquin College - CST8152 - Compilers

Midterm #2 - March 24, 1998

Test Duration - 50 Minutes

Professor - Ian D. Allen

Identification

Instructions

Read the whole test before you start. This test has 8 pages; one is empty.

Find the easy questions. Do the things you know best first. You have 50 Minutes.

Answer questions in the space provided, or on the back of the indicated page.

mayNo calculators or other aids are permitted. This test is closed book.

Raise your hand if you have questions. Do not leave your seat.

You may not be able to complete the entire test.
Remain calm. Answer as much as you can.

Marking Scheme

This Midterm #2 is marked out of 60.

works

Q-1 15 marks

Q-2 5 marks

Q-3 7 marks

Q-4 5 marks

Q-5 10 marks

Q-6 5 marks

Q-7 13 marks

Marks for each question are listed beside each question.

Midterms and class quizzes count together for 30% of your final grade.

The final exam is worth 30% of your final grade.

15 marks

On the back of the previous page, write a set of C language recursive descent parsing functions that recognize the fictitious grammar given below:

program à stmt * EOF

stmt à ( for | asst | ø ) ';'

for à FOR ‘(’ expr ‘;’ expr ? ‘;’ expr ‘)’ stmt

asst à ID ( ‘=’ expr ) +

expr à ID | CONST | ‘(’ expr ‘)’

Notes: * means zero or more repetitions of the preceding item

+ means one or more repetitions of the preceding item

? means zero or one repetition of the preceding item

ø stands for epsilon, the empty string

No semantic actions are required; no Boolean nested error recovery is needed. The functions need only parse and recognize the input, and exit upon encountering the first error. (Similar to Assignment 4.)

A failure to parse the input must result in a call to error("…") that will cause the compiler to exit immediately with an appropriate error message. Give the appropriate error message as the string argument to each use of the error("…") function in your code. Write good messages; keep it simple.

Assume the lexical scanner is named scanner(), the type of the current look-ahead token is in a global variable named tt (token type), and that the error exit function is named error("…"). (Use these two functions in your parser; do not write them yourself.) Write only the functions that parse the above grammar. Invent reasonable names for the various terminal symbols in the grammar.

If you simplify the code using a match() function (recommended!), remember to write it down!

Your C code will be called like this:

main(){
scanner(); /* sets the global tt to be the token type */

program(); /* call the root parsing function */

}

match(TokenType t){

    if( t != tt ) error("expecting %s found %s", ttype(t), ttype(tt));

    scanner();

}

program(){

    while( tt != EOF ){
        stmt();
    match(EOF); /* optional */

}

stmt(){

    switch(tt){

    case FOR: for_stmt(); break;

    case ID: asst(); break;

    default: /*FALLTHROUGH*/

    }

    match(SEMI);

}

for_stmt(){

    match(IF);

    match(LPAREN);

    expr();

    match(SEMI);

    if( tt != SEMI ) expr();

    match(SEMI);

    expr();

    match(RPAREN);

    stmt();

}

asst(){

    match(ID);

    do{

        match(EQU);

        expr();

    while( tt == EQU );

}

expr(){

    switch(tt){

case ID:

    case CONST: match(tt); break;

    case LPAR: match(LPAR); expr(); match(RPAR); break;

    default: error("expecting ID CONST or LPAR found %s", ttype(tt));

    }

}

5 marks

Below and on the back of the previous page, write a pair of C language recursive-descent parsing functions for the two non-terminal symbols in the following grammar, or explain why it is not possible to do so. Do not alter the grammar.

<stmt> à ID | <stmt> ‘+’ <factor>

<factor> à ID ( ‘*’ ID ) * | CONST | ‘(‘ <stmt> ‘)’

Grammar is left recursive; not possible to build a recursive-descent parser

7 marks

Show that the following grammar is ambiguous using the given sentence as an example.
Sentence: a + b * c Grammar:

T à P | id | T ‘*’ T

P à T ‘+’ T

T   => P

    => T + T

    => ID + T

    => ID + T * T

    => ID + ID * T

    => ID + ID * ID

T   => T * T

    => P * T

    => T + T * T

    => ID + T * T

    => ID + ID * T

    => ID + ID * ID

Two different leftmost derivations – therfore an ambiguous grammar.

Can also draw two different parse trees - therefore an ambiguous grammar.

5 marks

Eliminate any immediate left recursion in the following grammar using the transformation algorithm discussed in class:

S à I | S ‘+’ I
I à id | I ‘*’ id
S -> I A

A -> + I A | eps

I -> id B

B -> * id B | eps

10 marks

Produce leftmost and rightmost derivations for the following sentence using the following grammar. The derivation takes 11 steps. Sentence: ( id + id ) * id Grammar:

E à E ‘+’ T | T
T à T ‘*’ F | F
F à ‘(’ E ‘)’ | id

Leftmost Derivation of sentence	Rightmost Derivation of sentence
E => T	E => T
*=> T F**	*=> T F**
*=> F F**	*=> T id**
*=> ( E ) F**	*=> F id**
*=> ( E + T ) F**	*=> ( E ) id**
*=> ( T + T ) F**	*=> ( E + T ) id**
*=> ( F + T ) F**	*=> ( E + F ) id**
*=> ( id + T ) F**	*=> ( E + id ) id**
*=> ( id + F ) F**	*=> ( T + id ) id**
*=> ( id + id ) F**	*=> ( F + id ) id**
*=> ( id + id ) id**	*=> ( id + id ) id**

5 marks

Draw a DFA transition diagram that recognizes either an identifier or an unsigned integer constant and skips over only leading whitespace. (Identifiers have the same format as in Lab assignments.)

Start state loops back to start on WS

Start state goes to INID on letter or underscore; to INUINT on digit

INID goes to INID on letter, underscore, or digit; to GOTID on other

INUINT goes to INUINT on digit; to GOTUINT on other

GOTID and GOTUINT are accepting states with asterisks (ungetc: put char back)

13 marks

Print T in the box if the statement is True; print F if it is False. Correct answers are worth ½ mark each. No answer is worth zero marks. Warning! Incorrect answers are worth minus ½ mark (-0.5) each.

"Skipping to next semicolon" means the parser is using phrase-level error recovery	F
A grammar containing 7 tokens needs 7 recursive-descent functions to parse it	F
A grammar may have more than one parse tree for a sentence	T
An attribute of a parsing node is any value associated with that node	T
Epsilon productions are used when the input token matches the null string	F
Every context-free grammar can be rewritten as a regular expression	F
Every parse tree has a unique leftmost derivation	T
Hand-built parsers are usually top-down parsers	T
If a rightmost and leftmost derivation differ, the grammar is ambiguous	F
Inherited attributes are synthesized from child nodes in the parse tree	F
It is possible to build a parse tree for a left recursive grammar	T
Leaf nodes in a parse tree correspond to the empty string or to tokens	T
Most compile-time errors are simple errors	T
Most error recovery strategies are handled by the scanner	F
Most modern programming languages do not need any look-ahead token to be parsed	F
Nested begin/end blocks are usually handled by context-free grammars	T
Operators with the lowest precedence appear lowest in the parse tree; that is, nearest the leaves of the parse tree	F
Panic-mode error recovery tries to make changes in the input stream to correct an error and continue parsing	F
Parse trees for left-associative operators grow down to the left	T
Predictive parsing requires no backtracking	T
Recursive descent parsers cannot parse right-recursive grammars	F
Regular expressions are usually used to describe lexical constructs	T
Syntactic error recovery may generate spurious semantic errors	T
Synthesized attributes are inherited from parent nodes in the parse tree	F
The grammar of a language makes sure variables are declared before they are used; if not, it is a syntax error	F
The parser must pre-load the first lookahead token before calling the first function in the set of recursive-descent functions that parse the grammar	T