Implementing Reserved Words

This page last updated: Sunday September 27, 1998 01:07

To have keywords such as "print" and "dump" work as reserved words, the lexical analyser must recognize them and return the appropriate TT_PRINT and TT_DUMP token types to the parser. This is best handled by a simple string comparison at the point where an identifier is about to be returned from the scanner. If the identifier is one of the strings "print" or "dump", return the token type TT_PRINT or TT_DUMP; otherwise, return the identifier token type TT_ID as before.

IF current state of DFA has accepted an identifier THEN
   IF lexeme in buffer is "print" THEN
      set the token type to be the PRINT token type
   ELSEIF lexeme in buffer is "dump" THEN
      set the token type to be the DUMP token type
   ELSEIF ... match other keywords ... THEN
      ...
   ELSE
      set the token type to be the IDENTIFIER token type
   ENDIF
ENDIF
...
return the token to the calling function
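
The pseudocode above might be sketched in C roughly as follows. This is only an illustration under some assumptions: the token type names TT_ID, TT_PRINT, and TT_DUMP come from the text, but the enum values, the keyword table, and the function name classify_identifier are hypothetical, and the lexeme is assumed to be a NUL-terminated string in the scanner's buffer.

```c
#include <stddef.h>
#include <string.h>

/* Token type names follow the text; the numeric values are assumed. */
enum token_type { TT_ID, TT_PRINT, TT_DUMP };

/* Hypothetical keyword table: one entry per reserved word. */
static const struct {
    const char *text;
    enum token_type type;
} keywords[] = {
    { "print", TT_PRINT },
    { "dump",  TT_DUMP  },
};

/* Call this once the DFA has accepted an identifier.
   lexeme is the NUL-terminated text held in the scanner buffer. */
enum token_type classify_identifier(const char *lexeme)
{
    size_t i;

    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        /* strcmp() is case-sensitive: "Print" will not match "print". */
        if (strcmp(lexeme, keywords[i].text) == 0)
            return keywords[i].type;   /* lexeme is a reserved word */
    }
    return TT_ID;                      /* ordinary identifier */
}
```

Using a table instead of a chain of IF/ELSEIF tests means that adding another keyword only requires adding one line to the table.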

Case-sensitivity in keywords

Make a conscious decision as to whether your string comparison that recognizes the keywords will be case-sensitive (Unix-style) or case-insensitive (DOS-style). Do you want all of "Print", "prInt", "PRINT", and "prinT" to be the same as "print"? Are C Language reserved words case-sensitive?

The test files I supply will use lower-case keywords.

If your reserved words are case-insensitive, what about the identifiers in your language? Are they also case-insensitive?
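
If you decide on case-insensitive keywords, note that standard C has no case-insensitive version of strcmp(). One possible approach is a small comparison function of your own, sketched below; the name equal_ignoring_case is hypothetical, and the sketch assumes plain single-byte characters.

```c
#include <ctype.h>

/* Hypothetical helper: returns 1 if a and b are equal when letter
   case is ignored, 0 otherwise. */
int equal_ignoring_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        /* Cast to unsigned char: passing a negative value to
           tolower() is undefined behaviour. */
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    /* At least one string has ended; they match only if both have. */
    return *a == *b;
}
```

With this helper, "PRINT" and "prInt" both compare equal to "print", while a longer string such as "printx" does not. The same helper could be used when looking up identifiers, if you decide those should be case-insensitive too.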