summaryrefslogtreecommitdiff
path: root/00_blog/00040_Haskell/00150_Applicative-vs-Monadic-Parsing/index.md
blob: 5135527eed72f4212b735cbd0e66dcd659d45f96 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
    March 2019
# Applicative vs. Monadic Parsing

Some _real world parsers_ as _Parsec_, let us choose between applicative and
monadic parsing.  The applicative style should be sufficient to parse 
_context-free_ languages and is easier to reason about, but it is not capable
of parsing _context-sensitive_ grammars.

## Functor, Applicative, Alternative, Monad

Before we discuss Applicative and Monadic Parsing, it is important to 
understand what Functors, Applicative Functors, Alternatives and Monads 
have to offer. When in doubt simply look up the type-class definition.

Two well-known data types which are instances of all of the above are
**Maybe** and **List**. Try it yourself in _GHCi_.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ {.haskell .numberLines}
-- Maybe as Functor
fmap (+20) (Just 10)
fmap (+20) (Nothing)

-- List as Functor
fmap (+20) [1..10]

-- Maybe as Applicative Functor
Nothing <*> Just (22)
Just (+1) <*> Nothing
Just (+15) <*> Just (22)

-- List as Applicative Functor
[]<*>[1..5]
[(+1),(+15)]<*>[]
[(+1),(+15)]<*>[1..5]

-- Maybe as Alternative 
Nothing <|> Just 5
Just 5 <|> Nothing
Just 22 <|> Just 10

-- List as Alternative
[1..5] <|> [5..10]

-- Maybe as Monad
Just 5 >>= Just . (+10)
Just 5 >>= return . (+10)
Just 10 >>= return . (+5) >>=return . (+10)
Nothing >>= return . (+5) >>=return . (+10)
Just 10 >>= \x-> return  (x*2) >>=return . (+x`div`2)

-- List as Monad
[1..10] >>= (:[]) . (*5)
[1..5] >>= return . (^2)
[[1..5],[5..10],[],[12]]>>=id
[[1..5],[5..10],[],[12]]>>= map (*4)
[1..5] >>= \a -> [100..100+a] >>= \b -> return (a+b)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## Simple Parser

Our simple toy Parser simply encapsulates a parsing function that returns
_Nothing_ if parsing fails. Otherwise a _pair_ consisting of the parsed and 
unparsed input inside its respective elements.

Notice that we made our Parser instance of all the type-classes already.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ {.haskell .numberLines}
data Parser a b = Parser (a->Maybe (b,a))

instance Functor (Parser a) where
  fmap f p = Parser g
    where g inp = case parse p inp of Just (x,y) -> Just (f x,y)
                                      _          -> Nothing

instance Applicative (Parser a) where 
  pure v = Parser (\inp -> Just (v,inp))
  pg <*> px = Parser g
    where g inp = case parse pg inp of Just (f,b) -> parse (fmap f px) b
                                       _          -> Nothing

instance Alternative (Parser a) where
  empty = Parser (\_ -> Nothing)
  p1 <|> p2 = Parser g
    where g inp = case parse p1 inp of Nothing -> parse p2 inp
                                       x       -> x

instance Monad (Parser a) where
  return = pure
  p >>= f = Parser g
    where g inp = case parse p inp of Just (a,b) -> parse (f a) b
                                      _          -> Nothing

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## Primitives

We also need at least a minimal set of primitives to parse something.
Note that the primitives can give you the effects of Applicative or 
Alternative even if the Parser itself is not an instance of these.
(Thanks to dmwit for pointing this out on #haskell).

The _num_ parser makes use of the _some_ combinator which requires our
Parser to be an instance of _Alternative_ as well. _Applicative_ 
alone is of limited use in practice since without it, we also lack 
the <|> combinator.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ {.haskell .numberLines}
satisfy :: (Char -> Bool) -> Parser String Char
satisfy f = Parser g  where g (x:xs) | f x = Just (x,xs)
                            g _            = Nothing

space :: Parser String Char
space = satisfy (==' ')
    
char :: Char -> Parser String Char
char c = satisfy (==c)

notChar :: Char -> Parser String Char
notChar c = satisfy (/=c)

num :: Parser String Int
num = read <$> some (satisfy isDigit) 

word :: String -> Parser String String
word (c:cs) = (:) <$> char c <*> word cs
word _ = pure ""
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## Applicative / Alternative parsing

Now we are able to parse any context free grammar, given our 
_satisfy_ Parser along with Applicative, Alternative, and recursion.

Let's parse something! 

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ {.haskell .numberLines}
-- we parse, by simply applying the function encapsulated in our Parser on the input.
parse (Parser f) inp = f inp

-- well-formed parentheses are the canonical example of a context-free grammar
parsePar = concat <$> some parseP where parseP = word "()" <|> (char '(' *> parsePar <* char ')')
parse parsePar "((()())()())"

-- a simple context-free language with matching pairs of a's and b's, which is not regular
parseAB = (\a x b -> a:x++"b") <$> char 'a' <*> parseAB <*> char 'b' <|> word "ab"
parse parseAB "aaaaabbbbb"

-- parse basic mathematical operations and calulate the results.
parseOp op sig = (pure (op) <* many space <*> num <* many space <* char sig <* many space <*> num )
parse (parseOp (+) '+' <|> parseOp (-) '-' <|> parseOp (*) '*' <|> parseOp (div) '/') " 111  * 747 "
parse (parseOp (+) '+' <|> parseOp (-) '-' <|> parseOp (*) '*' <|> parseOp (div) '/') "111+747"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## Monadic parsing of context-sensitive grammars

The canonical non-context-free language can not be caputred by applicative parsing anymore: 

\$\$\\{a^n b^n c^n : n \\geqslant 1\\}$\$

The following monadic parser will work here:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ {.haskell .numberLines}
parseABC = length <$> (many $ char 'a') >>= \la -> 
           length <$> (many $ char 'b') >>= \lb ->
           length <$> (many $ char 'c') >>= \lc ->
           if la==lb&&lb==lc then pure ("ABCok") else empty

parse parseABC "aaaaabbbbbccccc"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## Git Repo

Here you can find the complete source code of this little toy parser 
along with the examples presented in this article: 

<https://gitweb.softwarefools.com/?p=miguel/haskell.git;a=blob;f=parser/main.hs>

## Summary

As usual, use the simplest tool that will suffice :smile:

## Ref / Further Reading

* <http://dev.stephendiehl.com/fun/002_parsers.html>
* <https://eli.thegreenplace.net/2017/deciphering-haskells-applicative-and-monadic-parsers/>
* <https://byorgey.wordpress.com/2012/01/05/parsing-context-sensitive-languages-with-applicative/>
* <https://arunraghavan.net/2018/02/applicative-functors-for-fun-and-parsing/>