The Symbol Table

This page last updated: Sunday September 27, 1998 01:07

The symbol table accumulates information about all the identifiers in a source text. Each entry in the symbol table is a structure that contains two things:

  1. a name (the name of the identifier or symbol)
  2. some attributes associated with the name, such as "current value", "current line number", "type", etc.

In an interpreter, the one of the attributes is the current type and value of each identifier in the source. In a compiler, the symbol table would keep track of the memory address of the identifiers instead of the values.

Data structure for a symbol table

Rather than invent another data type for the type and value of each identifier in the table, we re-use the Item data structure (which is really the same as the Token data structure).  This simplifies moving data into and out of the symbol table, since we can usually just copy the contents of the entire Item structure back and forth.  The Utility function item_free() is used by sym_free() to free the Item part of a symbol table entry.

typedef struct Symtab {
        char *name;       /* name of symbol in table */
        Item i;           /* attributes of the symbol */
} Symtab;

For most languages, the number of "enter into symbol table" operations is much smaller than the number of "lookup this identifier" operations. Choose your symbol table data structure accordingly.  A linked list of structures is not a good choice, since looking up items requires much pointer traversal.

Start-up and terminate functions are provided to permit the symbol table to be initialized when the Parser starts and freed when the Parser is done.

Public interface functions

Following are the minimal public interface functions to the symbol table.

Symbol table functions are used in assignment statements, and also whenever the parser recognizes an identifier in an expression and needs to retrieve the current value of that identifier from the symbol table to use in the expression.

void sym_init(void):
Initialize the symbol table. This is called by the Parser when beginning to parse an input stream.  If dynamic memory is being used, this is where the symbol table is first created.
Symtab *sym_lookup(char *string):
Looks up the given string as an identifier in the symbol table. Returns a pointer to the symbol table element found, or NULL if not found.  No error message is issued if an entry is not found; the calling function must do that, if appropriate.
Symtab *sym_insert(char *string):
Takes an identifier name, inserts a copy of the name into the symbol table as a new entry. The type of the new symbol is set to the "new" or "unknown" type.  Returns a pointer to the inserted element. Issues an error message and returns NULL if unable to allocate memory or space for the new symbol.  If the symbol table is dynamic, this function grows the table to accomodate new entries.
Symtab *sym_update(Symtab *symp, Item *itemp):
Takes a copy of the passed Item and enters it as the new value and type of the existing entry pointed to in the table, if the update is compatible with what is already in the table.  Makes sure to free any memory in use by the old symbol value using item_free() before doing the update.  Returns a pointer to the inserted element.  Issues a warning message (or an error message and returns NULL) if the update is incompatible with the type of the item already stored in the table. (You may always assign compatibly to a "new" or "unknown" symbol type created by sym_insert().)
Item *sym_value(Symtab *symp):
Returns a pointer to an Item structure containing the type and value of the symbol table entry pointed to by the passed pointer.  Used by the Parser to get the current attributes of a symbol out of the symbol table.
void sym_free(Symtab *symp):
Frees all the memory associated with the symbol table entry pointed to by the passed pointer.  The name of the symbol is freed, and any dynamic memory used by the contained Item structure is freed using item_free().  This completely erases the symbol table entry.
void sym_term(void):
Undoes everything done by sym_init() and sym_insert().  Clears the symbol table of all entries using sym_free() on each entry.   Frees any memory that was allocated for the symbol names or the values in the table. This must clean out a symbol table, as one would do between parsing different files.  Also frees the table itself.  Before parsing the next file, sym_init() must be called to re-create and re-initialize the table.
void sym_dump(FILE *fd, Symtab *ptr):
Dump to the given stream in debug format all the fields in the symbol table entry pointed to by the given argument. If the NULL pointer is given as the ptr argument, dump the entire symbol table. This function is useful in implementing the Dump command.  Consider using one level of recursion when you implement the loop that dumps the entire symbol table.

Data hiding

Like the lexical analyser and the semantic expression value stack, the symbol table is a private data structure accessed only through function calls; the internal structure of the table must be kept hidden. It might be a table; it might be a list; it might be a binary tree. By hiding the structure, the programmer is free to change it without having to completely rewrite the compiler or interpreter.

Where a symbol table interface function returns a pointer to a symbol table element, treat that pointer the same way that you would treat the FILE* pointer returned by fopen(). That is, you can pass the returned pointer to other symbol table functions, but you must not directly access the fields in the structure from code that is not part of the symbol table. In particular, the Parser must not be looking "inside" the returned pointers to see fields in the symbol table. Write symbol table interface routines, taking the pointer, that return the values you need.

(A similar situation exists with the FILE* structure used by <stdio.h>. Though you can directly index this structure to find the integer unit number associated with an open file, the standard insists that you call the fileno() function instead. This hides the internal structure of the I/O library and makes the code more portable.)

Memory management

The symbol table manages all of its own memory. If it needs to keep a string, it allocates storage and copies the string. Nothing except the symbol table functions must free memory returned by the symbol table functions. In particular, don't directly allocate or free any symbol table memory in the parser. Call symbol table functions to do the necessary work.


Ian D. Allen CST 8152 Home Page