CST 8152 - Assignment Three

Topic: A Lexical Scanner.

This page last updated: Sunday September 27, 1998 01:07

Deliverables for this assignment:

Purpose:

This assignment implements and tests the foundation of your lexical analyser: The Scanner. The input to the lexical analyser is a stream of source text. The output of each call to the Scanner is a single Token, to be read, in later assignments, by The Parser. This assignment only counts and prints the tokens; it doesn't try to parse them or build a grammar.

Use the main() program given in the source directory to test your Scanner.  The main() function of this assignment is a modification of the one in the previous assignment. Instead of looping to read and print lines using my_prompt(), it loops to read and print lexemes using the new scanner() function.  The Scanner will return a TT_EOF token when end-of-file is reached.
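The loop described above can be sketched roughly as follows. This is not the given main(): the Token layout, demo_scanner() and print_tokens() are hypothetical stand-ins invented for illustration, and the stand-in scanner uses plain character tests rather than the table-driven approach the assignment requires. Only the names TT_ID and TT_EOF come from the assignment.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical stand-ins; the real types are defined in the given Scanner.h. */
typedef enum { TT_ID, TT_EOF } TokenType;
typedef struct {
    TokenType type;
    char lexeme[64];
} Token;

/* Stand-in scanner: returns the next identifier read from fp, or TT_EOF. */
static Token demo_scanner(FILE *fp)
{
    Token tok = { TT_EOF, "" };
    size_t n = 0;
    int ch;

    /* Silently skip characters that cannot start an identifier. */
    while ((ch = getc(fp)) != EOF && !(isalpha(ch) || ch == '_'))
        ;
    if (ch == EOF)
        return tok;

    tok.type = TT_ID;
    do {
        if (n < sizeof tok.lexeme - 1)   /* truncate over-long lexemes */
            tok.lexeme[n++] = (char)ch;
        ch = getc(fp);
    } while (ch != EOF && (isalnum(ch) || ch == '_'));

    if (ch != EOF)
        ungetc(ch, fp);                  /* put the terminating character back */
    tok.lexeme[n] = '\0';
    return tok;
}

/* Driver loop: print each lexeme until TT_EOF; return the token count. */
int print_tokens(FILE *in, FILE *out)
{
    int count = 0;

    for (;;) {
        Token tok = demo_scanner(in);
        if (tok.type == TT_EOF)
            break;
        fprintf(out, "%d: %s\n", ++count, tok.lexeme);
    }
    return count;
}
```

The point of the sketch is the shape of the loop: call the scanner, stop on TT_EOF, otherwise print the lexeme and count it.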

The Scanner uses a table-driven deterministic finite automaton (DFA) to recognize the lexemes. To change the lexemes recognized, simply change the tables.  We will be changing the tables in subsequent assignments.
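As an illustration of what "table-driven" means, here is a minimal sketch of a DFA that recognizes a C identifier at the start of a string. All names (char_class, dfa_state, transition, classify, match_identifier) are assumptions made for this example; the tables in the given Scanner.c may be laid out quite differently.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical character classes and states, for illustration only. */
enum char_class { CC_LETTER, CC_DIGIT, CC_OTHER, CC_COUNT };
enum dfa_state  { ST_START, ST_IN_ID, ST_ACCEPT, ST_REJECT };

/* Rows are the two non-final states; columns are character classes. */
static const enum dfa_state transition[2][CC_COUNT] = {
    /*               LETTER     DIGIT      OTHER     */
    /* ST_START */ { ST_IN_ID,  ST_REJECT, ST_REJECT },
    /* ST_IN_ID */ { ST_IN_ID,  ST_IN_ID,  ST_ACCEPT },
};

/* Map a character to its class; underscore counts as a letter. */
static enum char_class classify(int ch)
{
    if (ch == '_' || isalpha((unsigned char)ch))
        return CC_LETTER;
    if (isdigit((unsigned char)ch))
        return CC_DIGIT;
    return CC_OTHER;
}

/* Return the length of the identifier at the start of s, or 0 if none. */
size_t match_identifier(const char *s)
{
    enum dfa_state state = ST_START;
    size_t len = 0;

    while (state == ST_START || state == ST_IN_ID) {
        state = transition[state][classify((unsigned char)s[len])];
        if (state == ST_IN_ID)
            len++;      /* consume only characters that belong to the lexeme */
    }
    return state == ST_ACCEPT ? len : 0;
}
```

The loop itself knows nothing about identifiers; everything specific to them is in classify() and the transition table. Note that ST_ACCEPT is reached on the character after the identifier, which is not counted in the length: this is the string-based equivalent of a "put the character back" accepting state.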

You are given most of the code for the Scanner.c file and all of the code for the Scanner.h file (and all the other header files) in the source directory. Your task is to finish the coding of Scanner.c so that your Scanner recognizes and returns Identifier (TokenType TT_ID) tokens and the end-of-file (TokenType TT_EOF) token.  (The TokenType enum is defined in Scanner.h.)  Identifiers in this assignment follow the rules for C language identifiers.  The TT_EOF token is returned at end-of-file when the scanner is called and no other tokens are available.

Your Scanner may also, at your discretion, recognize and return other TokenTypes, though this is not required (yet).  The given main() driver program will print any TokenType and lexeme returned by your Scanner.  If you don't handle them another way, have your Scanner silently skip over characters that are not part of an identifier.

Instructions:

Part 0 (optional but highly recommended)
Get the MEM package and install it in a simple program.  Learn how it works.  Then install it in your Assignment 2, and use it in all subsequent assignments, including this one.
Part 1
Restructure your program source into separate modules of related functions, as outlined in the Course Announcements news group.  See my example header files in the source directory, below.  You are not obligated to use these header files; I provide them as samples of how I structured my own version of the assignments.  You will find various macros that I use to simplify my code defined in my header files, e.g. BF_EMPTY().
Part 2
Write a my_close() function according to the specifications below. (You might want to add it to Assignment 2 first, to make sure it works, before you add it to your new assignment.)  The typical use for my_close() is to close files opened by my_open().  The new main() driver given to you needs this function.
Part 3
Copy the given Scanner.c file from the source directory.  Finish the coding of Scanner.c so that your Scanner recognizes and returns Identifier (TokenType TT_ID) tokens and the end-of-file (TokenType TT_EOF) token.  (The TokenType enum is defined in Scanner.h.)  Identifiers in this assignment follow the rules for C language identifiers.  If you do not handle them any other way, silently skip over characters that are not part of an identifier.  The TT_EOF token is returned at end-of-file when the scanner is called and no other tokens are available.

The key places where you need to write code are marked in Scanner.c with "//FIXME--" tags.  Look for them.  If you insert the correct code in these exact locations, you do not need to make any other changes to the file, or to any other files.

Your Scanner may also, at your discretion, recognize and return other TokenTypes, though this is not required.  The given main() driver program will print any TokenType and lexeme returned by your Scanner.  Feel free to enhance your DFA to recognize C comments, integers, floating-point numbers, quoted strings, etc., using the transition diagrams given in the text or in class.  Be on the alert for "put the character back" accepting states!

Note that you cannot push back the EOF flag using ungetc().  Your scanner_ungetc() must take this into account.
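One possible way to handle this is to remember a pushed-back EOF in a flag instead of passing it to ungetc(), which returns EOF and leaves the stream unchanged when asked to push back EOF. The sketch below assumes a single input stream and invents its own signatures for scanner_getc() and scanner_ungetc(); the versions in the given Scanner.c may differ.

```c
#include <stdio.h>

/* Assumption: one input stream at a time, so a single flag is enough. */
static int eof_pending = 0;

/* Read one character, replaying a pushed-back EOF if there is one. */
int scanner_getc(FILE *fp)
{
    if (eof_pending) {
        eof_pending = 0;
        return EOF;
    }
    return getc(fp);
}

/* Push back ch; EOF is remembered in a flag because ungetc() rejects it. */
void scanner_ungetc(int ch, FILE *fp)
{
    if (ch == EOF)
        eof_pending = 1;
    else
        ungetc(ch, fp);
}
```

With this pair, a "put the character back" accepting state can push back whatever it read, including EOF, and the next call to the scanner still sees end-of-file.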
Part 4
Test your program on everything you can think of.  Document your testing strategy and submit a summary of what you did and why.

Specifications for my_close()

This function was omitted by mistake from the earlier assignment.  It is the companion to my_open() and it has the following prototype and specifications:

void
my_close(
        FILE *fd,       /* IN: stream descriptor to close */
        char *fbuf      /* IN: file name associated with fd */
);
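A minimal sketch of a function matching this prototype is shown below. The behaviour is an assumption, not the official specification: it ignores a NULL stream, closes the stream with fclose(), and reports a failure on stderr using the file name. The message wording is invented for this example.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

void
my_close(
        FILE *fd,       /* IN: stream descriptor to close */
        char *fbuf      /* IN: file name associated with fd */
)
{
        if (fd == NULL)
                return;         /* assumption: nothing to close is not an error */

        if (fclose(fd) == EOF) {
                /* assumption: errors are reported, not returned (void function) */
                fprintf(stderr, "Cannot close '%s': %s\n",
                        fbuf != NULL ? fbuf : "(unknown)", strerror(errno));
        }
}
```

Checking the result of fclose() matters most for output files, where buffered data is written at close time and a full disk may only be detected then.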

Source directory

Copy these source files and modify them according to the assignment instructions.  The header files are examples only; you are not obligated to use them; however, you must divide up your program source into more than one module.


Ian D. Allen CST8152 Home Page
