March 2019 # Applicative vs. Monadic Parsing Some _real world parsers_ as _Parsec_, let us choose between applicative and monadic parsing. The applicative style should be sufficient to parse _context-free_ languages and is easier to reason about, but it is not capable of parsing _context-sensitive_ grammars. ## Functor, Applicative, Alternative, Monad Before we discuss Applicative and Monadic Parsing, it is important to understand what Functors, Applicative Functors, Alternatives and Monads have to offer. When in doubt simply look up the type-class definition. Two well-known data types which are instances of all of the above are **Maybe** and **List**. Try it yourself in _GHCi_. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ {.haskell .numberLines} -- Maybe as Functor fmap (+20) (Just 10) fmap (+20) (Nothing) -- List as Functor fmap (+20) [1..10] -- Maybe as Applicative Functor Nothing <*> Just (22) Just (+1) <*> Nothing Just (+15) <*> Just (22) -- List as Applicative Functor []<*>[1..5] [(+1),(+15)]<*>[] [(+1),(+15)]<*>[1..5] -- Maybe as Alternative Nothing <|> Just 5 Just 5 <|> Nothing Just 22 <|> Just 10 -- List as Alternative [1..5] <|> [5..10] -- Maybe as Monad Just 5 >>= Just . (+10) Just 5 >>= return . (+10) Just 10 >>= return . (+5) >>=return . (+10) Nothing >>= return . (+5) >>=return . (+10) Just 10 >>= \x-> return (x*2) >>=return . (+x`div`2) -- List as Monad [1..10] >>= (:[]) . (*5) [1..5] >>= return . (^2) [[1..5],[5..10],[],[12]]>>=id [[1..5],[5..10],[],[12]]>>= map (*4) [1..5] >>= \a -> [100..100+a] >>= \b -> return (a+b) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ## Simple Parser Our simple toy Parser simply encapsulates a parsing function that returns _Nothing_ if parsing fails. Otherwise a _pair_ consisting of the parsed and unparsed input inside its respective elements. Notice that we made our Parser instance of all the type-classes already. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ {.haskell .numberLines} data Parser a b = Parser (a->Maybe (b,a)) instance Functor (Parser a) where fmap f p = Parser g where g inp = case parse p inp of Just (x,y) -> Just (f x,y) _ -> Nothing instance Applicative (Parser a) where pure v = Parser (\inp -> Just (v,inp)) pg <*> px = Parser g where g inp = case parse pg inp of Just (f,b) -> parse (fmap f px) b _ -> Nothing instance Alternative (Parser a) where empty = Parser (\_ -> Nothing) p1 <|> p2 = Parser g where g inp = case parse p1 inp of Nothing -> parse p2 inp x -> x instance Monad (Parser a) where return = pure p >>= f = Parser g where g inp = case parse p inp of Just (a,b) -> parse (f a) b _ -> Nothing ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ## Primitives We also need at least a minimal set of primitives to parse something. Note that the primitives can give you the effects of Applicative or Alternative even if the Parser itself is not an instance of these. (Thanks to dmwit for pointing this out on #haskell). The _num_ parser makes use of the _some_ combinator which requires our Parser to be an instance of _Alternative_ as well. _Applicative_ alone is of limited use in practice since without it, we also lack the <|> combinator. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ {.haskell .numberLines} satisfy :: (Char -> Bool) -> Parser String Char satisfy f = Parser g where g (x:xs) | f x = Just (x,xs) g _ = Nothing space :: Parser String Char space = satisfy (==' ') char :: Char -> Parser String Char char c = satisfy (==c) notChar :: Char -> Parser String Char notChar c = satisfy (/=c) num :: Parser String Int num = read <$> some (satisfy isDigit) word :: String -> Parser String String word (c:cs) = (:) <$> char c <*> word cs word _ = pure "" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ## Applicative / Alternative parsing Now we are able to parse any context free grammar, given our _satisfy_ Parser along with Applicative, Alternative, and recursion. Let's parse something! ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ {.haskell .numberLines} -- we parse, by simply applying the function encapsulated in our Parser on the input. parse (Parser f) inp = f inp -- well-formed parentheses are the canonical example of a context-free grammar parsePar = concat <$> some parseP where parseP = word "()" <|> (char '(' *> parsePar <* char ')') parse parsePar "((()())()())" -- a simple context-free language with matching pairs of a's and b's, which is not regular parseAB = (\a x b -> a:x++"b") <$> char 'a' <*> parseAB <*> char 'b' <|> word "ab" parse parseAB "aaaaabbbbb" -- parse basic mathematical operations and calulate the results. parseOp op sig = (pure (op) <* many space <*> num <* many space <* char sig <* many space <*> num ) parse (parseOp (+) '+' <|> parseOp (-) '-' <|> parseOp (*) '*' <|> parseOp (div) '/') " 111 * 747 " parse (parseOp (+) '+' <|> parseOp (-) '-' <|> parseOp (*) '*' <|> parseOp (div) '/') "111+747" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ## Monadic parsing of context-sensitive grammars The canonical non-context-free language can not be caputred by applicative parsing anymore: \$\$\\{a^n b^n c^n : n \\geqslant 1\\}$\$ The following monadic parser will work here: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ {.haskell .numberLines} parseABC = length <$> (many $ char 'a') >>= \la -> length <$> (many $ char 'b') >>= \lb -> length <$> (many $ char 'c') >>= \lc -> if la==lb&&lb==lc then pure ("ABCok") else empty parse parseABC "aaaaabbbbbccccc" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ## Git Repo Here you can find the complete source code of this little toy parser along with the examples presented in this article: ## Summary As usual, use the simplest tool that will suffice :smile: ## Ref / Further Reading * * * *