Theory
- What is a parser? A function that converts raw input (text, bytes โฆ) into a structured valueโe.g. an ASTโwhile leaving the unconsumed suffix for later stages.
- Type model.
haskell data Parser a = P (String -> [(a,String)])
List result supports nondeterminism (many possible parses) and an empty list means failure. - Composability problem. Hand-rolled regexes or parser-generator grammars donโt compose; parser combinators treat parsers as first-class values so you can build big parsers out of small ones.
- Monads(-ish).
Parserbehaves likeState + []: withreturn(inject) and>>=(bind) you sequence steps while threading the remaining input and branching over alternatives. Do-notation then hides the plumbing.
Implementation
Core primitives
runParser :: Parser a -> String -> [(a,String)]
runParser (P f) s = f s
oneChar :: Parser Char -- succeeds on any single char
oneChar = P $ \s -> case s of
c:cs -> [(c,cs)]
[] -> []Monad instance
returnP a = P $ \s -> [(a,s)]
bindP pa k = P $ \s ->
concatMap (\(a,s') -> runParser (k a) s') (runParser pa s)
instance Monad Parser where
return = returnP
(>>=) = bindPUseful combinators
failP :: Parser a
failP = P $ const []
(<|>) :: Parser a -> Parser a -> Parser a -- choice
p1 <|> p2 = P $ \s -> case runParser p1 s of
[] -> runParser p2 s
r1s -> r1smanyP repeats a parser zero-or-more times; manyOneP requires at least one.
Higher-level builders
-- Left-associative fold of vP separated by oP
foldLP vP oP = do
v1 <- vP
let step acc = (do o <- oP; v <- vP; step (acc `o` v))
<|> return acc
step v1Used to respect associativity (expr = foldLP int opP). Precedence is handled by grammar factoring:
expr = foldLP prod (plus <|> minus)
prod = foldLP atom (times <|> divide)
atom = parensP expr <|> int