CST 8152 - The Symbol Table

The symbol table accumulates information about all the identifiers in a source text. For most languages, the number of "enter into symbol table" operations is much smaller than the number of "lookup this identifier" operations. Choose your data structure accordingly. (A linked list is not a good choice, since looking up items requires much pointer traversal.)

Public interface functions

Following are the minimal public interface functions to the symbol table. In several cases, the functions return a pointer to an Item structure containing a type and union value. This is a convenient way to return both an error code (in the type field) and an error message (in the str field of the union) at the same time. The parser can check the error code and print the error message if needed. See the sample code for semantic actions to see how some of these functions are used.

Item *sym_init(void):
Initialize the symbol table. Return an E_OK status if it worked, return E_ERROR and an error message if something went wrong. If dynamic memory is being used, this may be where the symbol table is first created.
SymtabElt *sym_lookup(char *string):
Look up the given string as an identifier in the symbol table. Return a pointer to the symbol table element found, or NULL if not found.
SymtabElt *sym_insert(char *string):
Takes an identifier name, inserts a copy of the name into the symbol table as a new entry. Initializes the type of the symbol in the table to SY_UNKNOWN. Returns a pointer to the inserted element. Returns NULL if unable to allocate memory or space for the new symbol. Optional: An attempt to insert an already existing symbol should fail.
SymtabElt *sym_update(SymtabElt *ptr, Item *itemp):
Given a pointer to a symbol table element ptr (as returned by lookup()), and a pointer to a type/value union itemp, update the given symbol table entry with the new type and value. The symbol table manages its own memory; if the item contains a string, the string will be copied. If the existing type/value in the symbol table is a string, that string is freed before the new string replaces it. Returns the first argument, or NULL if unable to update the symbol table element. Optional: An attempt to update an invalid symbol table element should fail.
Item *sym_item(SymtabElt *ptr):
Returns pointers to the type and value of the symbol table entry pointed to by ptr. This function hides the internal structure of the SymtabElt data type, so the datatype can be changed without changing public code that uses it (such as in the parser). If you handle line numbers in your table, you may want this function to return that information as well.
void sym_finished(void):
Clear the symbol table of all entries, freeing any memory that was allocated. This must clean out a symbol table, as one would do between parsing different files.
void sym_dump(SymtabElt *ptr):
Dump in debug format all the fields in the symbol table entry pointed to by the given argument. If the NULL pointer is given as the argument, dump the entire symbol table. This function is useful in implementing the Dump command.

See the sample code for semantic actions to see how some of these functions are used.

Data hiding

Like the lexical analyser and the semantic expression value stack, the symbol table is a private data structure accessed only through function calls; the internal structure of the table must be kept hidden. It might be a table; it might be a list; it might be a binary tree. By hiding the structure, the programmer is free to change it without having to completely rewrite the compiler or interpreter.

Where a symbol table interface function returns a pointer to a symbol table element, treat that pointer the same way that you would treat the FILE* pointer returned by fopen(). That is, you can pass the returned pointer to other symbol table functions, but you must not directly access the fields in the structure from code that is not part of the symbol table. In particular, the Parser must not be looking "inside" the returned pointers to see fields in the symbol table. Write symbol table interface routines, taking the pointer, that return the values you need.

(A similar situation exists with the FILE* structure used by <stdio.h>. Though you can directly index this structure to find the integer unit number associated with an open file, the standard insists that you call the fileno() function instead. This hides the internal structure of the I/O library and makes the code more portable.)

Memory management

The symbol table manages all its own memory. If it needs to keep a string, it allocates storage and copies the string. Nothing except the symbol table functions must free memory returned by the symbol table functions. In particular, don't directly allocate or free any symbol table memory in the parser. Call symbol table functions to do the necessary work.


Ian D. Allen CST 8152 Home Page