A parserโ€™s job is to take a stream of tokens from the scanner and determine if they form a valid program according to the grammarโ€™s rules. An LL(1) parser does this top-down, left-to-right, using 1 token of lookahead. In order to build an LL(1) parser, we need a parsing table that tells us which production rule to use given a nonterminal and the next token .

This process can be formalized into 3 steps.

Preface: Grammar Notation

A CFG has terminals which include actual tokens like , , , nonterminals which are abstract categories like , , , and production rules that can look like .

The symbol means empty. In practice, a production rule like means that can produce nothing.

1. Compute FIRST Sets

is just telling us what terminals can X start with.

  • If is a terminal, FIRST(X) = {X}
  • If is a production, we add to FIRST(X)
  • If , we add minus . If , we also minus , and so on. If ALL of them can be , then we add .

Example:

1: S โ†’ A z
2: A โ†’ B D
3: B โ†’ x
4: B โ†’ ฮต
5: D โ†’ y
6: D โ†’ ฮต

We want to find FIRST(S). We start with S โ†’ A z, so we look at FIRST(A), which is A โ†’ B D. We then look at FIRST(B), which gives B โ†’ x, which gives {x}. We also have B โ†’ e, which gives {e} as well. Following the same process for D, we have {y, e}. Since both B and D can be e, then A can as well. That means that FIRST(A) = {x, y, e}. However, S has z as well in S โ†’ A z, so we conclude that FIRST(S) = {x, y, z}.

2. Compute FOLLOW sets

FOLLOW(X) sets tell us what terminals can appear immediately after X in any derivation.

  • Add eof or $ to FOLLOW(start symbol)
  • If thereโ€™s a production A โ†’ a B b, add FIRST(b) minus e to FOLLOW(B)
  • If thereโ€™s a production A โ†’ a B or A โ†’ a B b where e FIRST(b), we add FOLLOW(A) to FOLLOW(B)

From the above example, we can start with FOLLOW(S) = {eof}, as its the start symbol. We then go to S โ†’ A z. FOLLOW(A) gets FIRST(Z) = {z}, so FOLLOW(A) = {z}. Next is A โ†’ B D, FOLLOW(B) gets FIRST(D), giving us FOLLOW(B) = {y}. Since D can be e, FOLLOW(B) also gets FOLLOW(A) = {z}, so FOLLOW(B) = {y, z}. Lastly, in A โ†’ B D, FOLLOW(D) gets FOLLOW(A) = {z}, so FOLLOW(D) = {z}.

3. Compute FIRST+ sets per production rule

This is what actually goes in the table.

  • If e FIRST(rhs), FIRST+(A โ†’ a) = FIRST(a)
  • If e FIRST(rhs), FIRST+(A โ†’ a) = FIRST(a) FOLLOW(A)

If the right hand side can vanish entirely, then weโ€™d choose this rule when we see anything that could follow A. Following the same example:

  • Rule 1: S โ†’ A z. FIRST(Az) = {x, y, z}. No e. FIRST+ = {x, y , z}.
  • Rule 2: A โ†’ B D. FIRST(BD) = {x, y, e}. Since e is present, FIRST+ = {x, y} FOLLOW(A) = {x, y, z}
  • Rule 3: B โ†’ x, so FIRST+ = {x}
  • Rule 4: B โ†’ e, since e is present, FIRST+ = FOLLOW(B) = {y, z}.
  • Rule 5: D โ†’ y, so FIRST+ = {y}
  • Rule 6: D โ†’ e, since e is present, FIRST+ = FOLLOW(D) = {z}.

Now, all we need to do is just fill in the LL(1) table. For each rule A โ†’ a, we put that rule number in the TABLE[A, t] for every terminal t in FIRST+(A โ†’ a).

xyzEOF
S111ERR
A222ERR
B344ERR
DERR56ERR

Important

A grammar is LL(1) iff no cell in this table has more than one rule. If two rules for the same nonterminal have overlapping FIRST+ sets, we have a conflict and the grammar is LL(1).

The skeleton parser uses this table with a stack, pushing EOF then the start symbol. It loops IF the top-of-stack is a terminal, matching it with current input and pop. IF it is a nonterminal, we look up TABLE[TOS, curr_word], pop TOS, and push the RHS in reverse order. Itโ€™s done when both stack + input are EOF.