CST 8152 - Jargon
This page last updated: Sunday September 27, 1998 01:07
This page connects many of the jargon words used in building a compiler or interpreter.
- We want to recognize a LANGUAGE, which is made up of SYMBOLS or
characters that constitute a SENTENCE in the language.
- We write a CONTEXT-FREE GRAMMAR for the language.
- The grammar is composed of PRODUCTIONS containing TERMINAL symbols and NON-TERMINAL
symbols.
- Terminal symbols are TOKENS. They are sequences of symbols (characters)
having a collective meanting, e.g. identifier, integer, keyword, string, etc.
- The symbol sequences (character sequences) are called LEXEMES. They are recognized
and classified by a LEXICAL ANALYSER or SCANNER as it reads the
input source language sentence.
- The structure of the lexemes is recognized based on PATTERNS.
- The patterns are often expressed concisely using REGULAR EXPRESSIONS and REGULAR
DEFINITIONS.
- Regular expressions are equivalent to FINITE AUTOMATA.
- Finite automata can be either DETERMINISTIC or NON-DETERMINISTIC.
- Deterministic finite automata can be constructed from non-deterministic finite automata;
however, the construction may require an exponential number of states.
- We can compile a regular expression into a recognizer by constructing
the finite automaton to which it is equivalent.
- Finite automata can be represented using STATE MACHINES that have one START
state and one or more ACCEPTING STATEs.
- State machines, like automata, may be DETERMINISTIC or NON-DETERMINISTIC.
- State machines may be expressed using TRANSITION DIAGRAMS (deterministic) and TRANSITION
GRAPHS (non-deterministic).
- From the transition diagrams and graphs we can construct TRANSITION TABLES that
tell what the next state is, based on the current state and the current input character.
- Transition tables can be coded. If the state machine enters an accepting state
while reading the symbols (characters) in the input sentence, we have recognized a lexeme
in the language. Which lexeme we have recognized depends on which accepting state we
entered.
- The grammar of the language groups the input tokens into a hierarchy called a PARSE
TREE. The tree imposes a structure on the input sentence. A grammar
may be AMBIGUOUS, in which case there exist two different parse trees for
some sentences in the language.
- A node in the parse tree may have various ATTRIBUTES associated with
it, e.g. a numeric value, a line number, a type, etc.
- A node might compute an attribute by INHERITING the attribute from a
parent node, or by SYNTHESIZING the attribute from one or more of the
child nodes in the tree.
- We specify a DEPTH-FIRST TRAVERSAL of a parse tree, which visits all
the nodes in the tree in a depth-first, left-to-right order.
- We can add SEMANTIC RULES to the productions of the grammar, to dictate
how the attributes are computed for each node in the tree. The rules say how
to compute the attributes; but, they don't specify when or in what order.
- Alternately, we can add SEMANTIC ACTIONS to the productions of the
grammar. The SEMANTIC ACTION SYMBOLS are small pieces of code or
pseudocode inserted into the right sides of grammar productions at the exact location
where we want our parser to perform some action.
- The parser executes each semantic action symbol when it encounters it during the parsing
process.
- The actions typically have effects other than simply setting the node attribute; some
actions involve doing arithmetic, some involve generating output.