Stacks and their Structures

This page last updated: Sunday September 27, 1998 01:07

The parser of an interpreter may use a semantic expression value stack to hold the operands for expressions. The stack must hold both the value of the expression and also its type. You may want to save other information, such as the input line number on which the expression was found, as well.

The stack is a classic, Last In First Out (LIFO) data structure.

Purpose

The purpose of the expression value stack is to turn infix expressions such as "1 + 2" into postfix expressions where all the required operands are saved on the stack by the parser, then the operation is performed by popping the correct number of arguments off the stack and pushing back the single result value. The stack holds the operands of various expressions.

Grammar productions involving binary arithmetic operators (e.g. '+', '-', '/', etc.) pop pairs of arguments off the stack and push back the results of the associated arithmetic operations. Unary operators (e.g. '-') pop off a single operand, apply the operator, then push the argument back again.

The leaves of the parse tree attach to the actual tokens in the language, and these tokens are simply evaluated and pushed on the stack as they are encountered in the parsing process.

Pushing and popping structures

Our interpreter needs a semantic value stack to store attributes of nodes and intermediate results of arithmetic and string expressions.

Since the expressions may involve strings and different types of numbers, different things will get pushed on the stack. We cannot use a simple stack of integers or a simple stack of string pointers. Just as a Token may be one of several different types, an element on the stack may be one of several different types, and we must use a similar type/union pairing inside a structure to handle this. The element pushed on the stack must be a structure that has both a type field and a value union to handle the different values that might be pushed.

In fact, we choose to simply use the existing Token structure definition as the stack element, rather than define a new data type. We make the stack structure, named Item, the same as the Token structure. Thus, we would declare the value stack push() and pop() as follows:
typedef Token Item;     /* use the same definition as Token */

void push(
   struct Item *item    /* IN: pointer to element to push */
);

struct Item pop(void);  /* RET: the popped Item element */

Boolean empty(void);    /* RET: TRUE if stack is empty */
The Item/Token data structure is sufficiently general that we can use the item.type field to know what kind of element is pushed on the stack, and the item.u.* union will hold the actual numeric value pushed, if the Item contains a number. String-value elements can be pointed to by the item.lexemepart of the structure. (Unfortunately, the structure tag "lexeme" isn't particularly appropriate in this stack usage; it makes more sense in the Token structure context.)

Who manages the memory?

We choose to make our stack ignorant of the contents of the structure being pushed. That is, the stack module will not know what is inside the Item structure that is being pushed on the stack. It will never access any of the fields inside the Item structure. It will only push a copy of the Item structure, and pop a copy of the Item structure.

If the Item structure has some string pointers in it, and those strings need to be copied or saved into fresh memory when the Item is put on the stack, that is the job of the module that calls the stack functions. The stack itself will not look inside the Item structure; it will not allocate or deallocate any memory inside the Item structure.

The Parser uses the stack module. The Parser will copy and duplicate any strings needed before it calls push(), and the Parser will free those strings when it calls pop(). The stack module will not do any of this.

Why push a pointer and not the whole structure?

We push using a pointer to the element that we want to push, rather than having the entire contents of the structure copied on each function call, as would be the case if we passed the entire structure instead of a pointer to it. Either way would work; this way is slightly more efficient. The push() function will have to allocate memory for the new stack element and copy the passed data into the new memory, so there is no point in copying the entire structure twice, once into the argument list and a second time into new memory.

We must receive an entire copy of the element structure as the return value from pop(), since the act of popping an element from a stack releases the storage space used by the element. We could not return just a pointer to the popped stack element, since there is no element storage left to point to when pop() is done.

The Parser must do its own memory management of items pushed onto and popped from the stack.

Why return the whole structure?

Returning the entire pushed structure when we call pop() results in more memory copying than returning a pointer to the pushed structure.

The entire pushed structure is returned, instead of a pointer to the structure, because in some implementations of the stack (e.g. a linked list) the storage used by the popped element is free()d after the pop is done, so it cannot be pointed to once the pop() function returns. Returning a pointer would not be possible; we have to return the entire copy of the pushed item.

Returning pointers to static storage (optional topic)

If returning the entire structure isn't desirable, in some circumstances one can return a pointer to some static storage within a function, e.g.
   StackElt *                    -- returns a pointer
alternatePop(){
   static StackElt tmp;          -- static storage

   IF empty() THEN
      print an error message about empty stack and quit
   ENDIF

   tmp = stack[--stkptr];        -- structure copy into static storage
   return &tmp;                  -- return pointer to static storage
}
This does not work well in an interpreter, because the Parser does several calls to pop() in a row, and the second and subsequent calls would overwrite the data in the static tmp buffer. All the returned pointers would point to the same tmp buffer; the data saved by the first calls would be lost unless the Parser saved a copy between calls to alternatePop(). To simplify the Parser, we do the copy in the pop() function by returning the entire structure contents.

Ian D. Allen CST 8152 Home Page