-
Notifications
You must be signed in to change notification settings - Fork 6
/
Copy pathWeek08.hs
1618 lines (1222 loc) · 52.9 KB
/
Week08.hs
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
{-# OPTIONS_GHC -fwarn-incomplete-patterns #-}
{-# LANGUAGE InstanceSigs #-}
module Week08 where
import Data.Char (toUpper, isDigit, digitToInt, isSpace)
import Data.Foldable (for_)
import Data.IORef (IORef, newIORef, readIORef,
writeIORef, modifyIORef)
import Control.Exception (finally)
import System.IO (openFile, hPutChar, hGetChar,
hClose, IOMode (..), hIsEOF, Handle)
{- WEEK 8 : REAL I/O and PARSER COMBINATORS -}
{- Part 8.1 : I/O Conceptually
Beginning in Week 06, we've seen how to simulate side effects in
Haskell.
But what if we want to do some *real* side effects?
A great philosopher once wrote:
The philosophers have only interpreted the world, in various
ways. The point, however, is to change it.
-- Karl Marx ( https://en.wikipedia.org/wiki/Theses_on_Feuerbach )
So how do we change the world, but remain with a language that only
allows functions with no side effects?
To see why adding side effects to Haskell is not entirely
straightforward, imagine that we added a function that prints a
character from the input and returns an empty tuple. It might have
the have the type:
putChar :: Char -> ()
But adding this function has a large downside. It would spoil our
ability to reason by replacing equals by equals.
For example, in the code:
f x = (x + 100, x + 100)
we should be able to think "there is duplicated code here, let's
make a definition that removes this duplication". Like this:
f x = (z, z) where z = x + 100
However, if we have the following code
f x = (putChar x, putChar x)
Then we might expect that this outputs the character passed in via
the variable 'x' twice. However, if we do the same transformation
to reduce duplicate code:
f x = (z, z) where z = putChar x
it is not clear whether calling 'f' should output two characters or
one. Having functions that perform side effects reduces our ability
to rearrange programs by replacing equals by equals.
To resolve this tension, Haskell treats side effects (such as
printing) not as things that happen as a "side effect" of calling a
function, but as a *value* that we construct. Recall that in Week
06 we saw the use of simulations of mutable state and printing,
representing the idea of side effects with normal data. We can
think of the value as a description of the actions we want to
perform. The type of these descriptions is 'IO a', where 'a' is the
type of values that executing the action will return. Therefore,
'putChar' has the type:
putChar :: Char -> IO ()
That is, 'putChar' doesn't actually do the printing when you call
it. Instead, it returns an "IO action". So the following code:
f x = (putChar x, putChar x)
doesn't output two characters, instead it returns a pair of two IO
actions, that, when they are executed, output whatever is in
'x'. Because 'putChar x' is an IO action, not an instruction to
output, the rewritten code:
f x = (z, z) where z = putChar x
Has exactly the same meaning, and we have retained the ability to
always replace equals by equals in code.
Conceptually, we can think of such "actions" as being in a datatype
similar to the 'Process' type from the Week 06 tutorial questions: -}
data IOAction a
= End a
| Input (Char -> IOAction a)
| Output Char (IOAction a)
{- If actions are values of this datatype, we can pass them around
between functions, place them inside larger data structures, and
combine them in various ways, all while retaining the ability to
reason about our programs using replacement of equals by
equals. Only when an action is passed to the Haskell runtime system
does it actually get executed.
The main interface we use for combining values of type 'IOAction a'
(and 'IO a') is the Monad interface we saw last week. In each of
the instances of 'Monad' we've seen so far, we've isolated the idea
of "do some side effects and possibly return a value" into
different types according to the different kinds of side
effects. The '>>=' function is exactly the 'sequ' function in the
week 06 problems, which sequences the actions of the first argument
before the second one.
The real IO monad is not actually implemented like this. The main
difference between 'IOAction' and the real 'IO' monad is that we
cannot pattern match on values of type 'IO a'. This allows the
Haskell system to more efficiently optimise and execute IO actions.
In summary, Haskell remains a "pure" language, and allows side
effects by:
1. Having a special type 'IO a' of I/O actions that peform some
"real world" actions and return values of type 'a'.
2. Using the 'Monad' interface to combine individual 'IO a'
actions into sequences.
3. Actually executing an 'IO a' action happens either by typing
its name at the prompt in GHCi, or in a standalone program by
being whatever action is defined to be the 'main' value.
In the next sections, we'll look at some of the basic operations
that the IO monad has, and how they can be put together to write
programs that interact with the outside world. -}
{- Part 8.2 : Doing I/O in Haskell
So 'IO a' is the type of "IO actions returning values of type
'a'". But what are the primitive operations for 'IO'? The 'Maybe'
monad has 'failure' and 'catch', the 'State' monad has 'get' and
'put', and so on. What about 'IO'?
'IO' has a very large number of operations, corresponding to all
the different ways of interacting with the outside world. Covering
all of the them is well outside the scope of this course, so I'll
just cover a few to get the general idea.
Perhaps the simplest is 'putChar', which has the following type:
putChar :: Char -> IO ()
As the name indicates, it puts a single character to the output. We
can try it in GHCi:
> putChar 'x'
x>
GHCi treats values of type 'IO a' specially: instead of printing
out the value of type 'IO a', it executes the actions described by
this value, and then prints out the final value.
Notice that there is no newline between the 'x' and the next prompt
from GHCi.
Since 'IO' is a 'Monad', we can use the "do notation" described in
Week 07 with it, and print two characters, one after the other: -}
putTwoChars :: Char -> Char -> IO ()
putTwoChars c1 c2 =
do putChar c1
putChar c2
{- For example:
> putTwoChars 'h' 's'
hs>
It is important to emphasise what is happening here: "putTwoChars
'h' 's'" builds an I/O action that (when run) outputs the
characters 'h' and 's'. It is not actually executed until we give
that action to GHCi.
Using the generic 'for_' function from Week 07 that iterates some
action for every element of a list, we can write a function that
writes a character a fixed number of times to the output: -}
writeN :: Int -> Char -> IO ()
writeN n c = for_ [1..n] (\_ -> putChar c)
{- For example:
> writeN 20 'a'
aaaaaaaaaaaaaaaaaaaa>
More usefully, we can write a function that iterates through a
'String' (which is a list of characters), running putChar on each
one, and then outputs a newline character: -}
printLine :: String -> IO ()
printLine xs =
do for_ xs (\x -> putChar x)
putChar '\n'
{- For example:
> printLine "Hello world"
Hello world
This function is already in the standard library as 'putStrLn',
which knows more about OS-specific line endings.
There is also a primitive "read a character" operation:
getChar :: IO Char
From the type, 'getChar' is an IO operation that returns a
'Char'. For example:
> getChar
x <-- this is entered by the user
'x' <-- this is printed by GHCi
Note that you'll have to press 'Enter' after typing a character for
the terminal to actually send the character to GHCi, but the
'Enter' is not returned by 'getChar'.
As with 'putChar', we can use the 'Monad' operations of 'IO' to
sequence uses of 'getChar': -}
getTwoChars :: IO (Char, Char)
getTwoChars =
do c1 <- getChar
c2 <- getChar
return (c1, c2)
{- For example:
> getTwoChars
xy <-- entered by user
('x','y') <-- printed by GHCi
(again, you'll need to press 'Enter' after the two characters to
make the terminal actually send the input.)
To read lines of input, we need to keep reading characters until we
read a newline ('\n'). The following function does this: -}
readLine :: IO String
readLine =
do c <- getChar
if c == '\n' then
return []
else
do cs <- readLine
return (c:cs)
{- The following function does the same thing, except that it uses an
accumulator to build up the list of characters from the input, and
reverses it at the end. This function consumes less stack space
than 'readLine'. -}
readLine2 :: IO String
readLine2 = go []
where go accum =
do c <- getChar
if c == '\n' then
return (reverse accum)
else
go (c:accum)
{- Either way, you probably shouldn't implement this function yourself,
because it is already in the standard library as 'getLine'. The
standard library version also knows more about OS-specific line
endings.
Now that we have the ability to read and write whole lines, we can
write simple functions that interact with the user. The classic
"What is your name?" function: -}
program :: IO ()
program =
do printLine "Hello, what is your name?"
name <- readLine
printLine ("Hello " ++ name ++ "!")
{- Example run:
> program
Hello, what is your name?
Bob
Hello Bob!
A function that repeatedly reads input until it gets an empty line,
outputing non-empty lines in CAPITALS: -}
capsLockSimulator :: IO ()
capsLockSimulator =
do line <- readLine
if null line then return ()
else do printLine (map toUpper line)
capsLockSimulator
{- An example interaction:
> capsLockSimulator
hello
HELLO
no, really, i am being calm
NO, REALLY, I AM BEING CALM
>
-}
{- Part 8.3 : File Input and Output
The 'putChar' and 'getChar' functions above allow us to read from
the standard input and write to the standard output. But what if we
want to read and write to specific files?
Just as in many other languages, Haskell provides facilities for
opening files to get file handles, which can be given to read and
write function, and then finally closed. The basic API for this is
as follows:
File paths are represented as 'String's, using a type synonym:
type FilePath = String
When we open files, we have to say what we're going to do with it:
read, write, append to the end, or random access reading and
writing:
data IOMode = ReadMode | WriteMode | AppendMode | ReadWriteMode
After we open a file, we get back a 'Handle', which is another
abstract type:
type Handle
The function to actually open files is 'openFile' which takes a
'FilePath' and an 'IOMode' and returns an 'IO' action that will
yield a 'Handle' (or throw an exception if it can't open the file,
see below):
openFile :: FilePath -> IOMode -> IO Handle
To read and write individual characters from a 'Handle', we use
variants of the 'getChar' and 'putChar' functions we saw above:
hGetChar :: Handle -> IO Char
hPutChar :: Handle -> Char -> IO ()
Finally, when we are done with a file, we 'hClose' the 'Handle' to
return the associated resources to the operating system:
hClose :: Handle -> IO ()
Let's see those functions in action. Here is a function that writes
a 'String' to a file (called 'writeFile' in the standard
library). It opens the specified file in 'ReadMode', uses a 'for_'
loop to write every character from 'string' into the handle, and
then closes the handle. -}
writeToFile :: FilePath -> String -> IO ()
writeToFile path string =
do handle <- openFile path WriteMode
for_ string (\x -> hPutChar handle x)
hClose handle
{- Note that this function is not "exception safe": if 'hPutChar' throws
an exception (e.g., the disk is full), then the file won't be
closed properly. We'll see how to fix this below.
To read an entire file into a 'String', we'll need to keep reading
until the end of the file. To find out if a 'Handle' is at the end
of a file (EOF), we can use the function 'hIsEOF':
hIsEOF :: Handle -> IO Bool
We can now write a function that opens a file, reads characters
until we get to EOF and returns all the characters read as a
'String', closing the file before it returns: -}
readFromFile :: String -> IO String
readFromFile path =
do handle <- openFile path ReadMode
content <- go handle ""
hClose handle
return content
where
go h content =
do isEOF <- hIsEOF h
if isEOF then
return (reverse content)
else
do c <- hGetChar h
go h (c:content)
{- This is a toy implementation that is quite slow because it reads in
the data one character at a time. You should use the Haskell
standard library function 'readFile' to actually read in files.
Above, I mentioned that these functions are not "exception safe" in
the sense that if one of the I/O operations throws an exception,
then any open files will not be close properly, and there will be a
resource leak.
A proper discussion of handling of IO exceptions in Haskell
(distinct from the 'simulated' exceptions we saw in Week 06) is
beyond the scope of this course. However, I'll just how how to fix
this problem.
Haskell provides a function 'finally' that is similar to the
'finally' blocks in Java. It has the type:
finally :: IO a -> IO b -> IO a
The idea is that
finally action cleanup
Is an IO action that performs 'action'. Then, however 'action'
terminates, either normally, or by throwing an exception, 'cleanup'
is performed. The whole action then either terminates with a value
or rethrows the exception.
We can use finally to write a higher order function that opens a
file and then passes the handle to the given 'body', using
'finally' to close the file no matter how 'body' terminates. -}
withInputFile :: FilePath -> (Handle -> IO a) -> IO a
withInputFile path body =
do handle <- openFile path ReadMode
result <- body handle `finally` hClose handle
return result
{- With this function, we can rewrite 'readFromFile' to be exception
safe. Now, if 'hIsEOF' or 'hGetChar' throw an exception, the file
will still be safely closed. -}
readFromFile_v2 :: FilePath -> IO String
readFromFile_v2 path =
withInputFile path (\handle -> go handle "")
where
go h content =
do isEOF <- hIsEOF h
if isEOF then
return (reverse content)
else
do c <- hGetChar h
go h (c:content)
{- Part 8.4 : PARSER COMBINATORS
Once we can read in files, we need to be able to understand
them. Conversion of data from a string of characters to a
structured representation is called "parsing". In some sense,
parsing in some form or another is pretty much all of what
computers are asked to do, and there are many different ways of
doing it. The remainder of this week is about a particular way of
doing parsing using functional programming ideas.
We saw a simple example of parsing in Week05: 'splitOn' from the
Data.List.Split library splits a "flat" string of fields separated
by some character into the separate fields. We can use 'splitOn' to
parse lines from CSV (Comma Separated Values) files:
> splitOn ',' "CS316,Functional Programming,20"
["CS316", "Functional Programming", "20"]
But 'splitOn' is a bit too simplistic. For example, how would we
parse fields that contain commas? CSV files usually put fields that
contain commas in quotes:
"CS311,\"Programming Language, Design and Implementation\",20"
which should be understood as:
["CS311", "Programming Language, Design and Implementation", "20"]
But 'splitOn' will split this in the wrong way:
["CS311", "\"Programming Language", " Design and Implementation\"", "20"]
Fixing this is an example of a parsing problem. -}
{- How should we write parsers in Haskell? One way to go about this is
to have the following two thoughts:
- Parsers take input that is a list of 'Char's, i.e., a 'String'.
- Parsers can either fail to parse, because it recieved invalid
input, or succeed, in which case it should return the structured
representation.
Thinking like this leads to the following definition ('version 1'): -}
type Parser_v1 a = String -> Maybe a
{- That is: a Parser of things of type 'a' is a function that takes
'String's and either returns 'Nothing' if the input is malformed,
or returns 'Just' of something if the input is wellformed.
Let's try this idea with to parse 'Bool'ean values. We want to
recognise the string "True" to return the value 'True', and the
string "False" to return the value 'False'. The following function
does this by using pattern matching: -}
parseBool_v1 :: Parser_v1 Bool
-- String -> Maybe a
parseBool_v1 "True" = Just True
parseBool_v1 "False" = Just False
parseBool_v1 _ = Nothing
{- This seems to work:
> parseBool_v1 "True"
True
> parseBool_v2 "False"
False
But of course, we want to parse more complex input than just
exactly "True" or "False". Could we use our parser to parse two
booleans? We want to turn a string like:
"TrueFalse"
into the Haskell value:
(True, False)
But there seems to be no obvious way of doing this with our parsers
as they are currently written. If we try to parse the string
"TrueFalse" with 'parseBool_v1', it will fail because it is looking
for exactly either the string "True" or the string "False".
Wouldn't it be better to be able to reuse our parser for 'Bool's to
parse a 'Bool', and then let something else parse the rest of the
input?
The reason we can't compose our parsers into larger parsers because
the 'Parser_v1' type describes parsers that are "monolithic" --
they make a decision about the whole input, and offer no way to
pass what they don't understand on to another parser. We need
modular parsers that can be chained together. The way we will do
this is by making each parser responsible for parsing as much of
the start of the input as it can, and then returning any left over
input to be passed on to the next parser. -}
{- We upgrade our 'Parser_v1' to also return a 'String' on success
like so: -}
newtype Parser a = MkParser (String -> Maybe (String, a))
{- A 'Parser' of things of type 'a' is a function that takes 'String's
as input and either returns 'Nothing', or 'Just (leftover, a)',
where 'leftover' is the remainder of the input that wasn't parsed,
and 'a' is the value understood from the beginning of the
input. We'll give some examples below.
We've also wrapped the type definition in a 'newtype' so that we
can define some typeclass instances for it below (similarly to how
we needed to use 'newtype' for the 'State' monad in Week 07).
To run a parser, we need a function that unwraps the 'newtype' and
runs the underlying parser on an input: -}
runParser :: Parser a -> String -> Maybe (String, a)
runParser (MkParser p) input = p input
{- Now we can write a parser that parses 'Bool's by looking at the start
of the input. If it sees either "True" or "False", it returns the
'rest' of the input, and the appropriate value. Otherwise, it
returns 'Nothing'. -}
parseBool_v2 :: Parser Bool
parseBool_v2 =
MkParser (\input ->
case input of
'T':'r':'u':'e':rest -> Just (rest, True)
'F':'a':'l':'s':'e':rest -> Just (rest, False)
_ -> Nothing)
{- Let's try it:
> runParser parseBool_v2 "True"
Just ("", True)
> runParser parseBool_v2 "False"
Just ("", False)
> runParser parseBool_v2 "True101010"
Just ("101010", True)
> runParser parseBool_v2 "Truthy"
Nothing
> runParser parseBool_v2 "FalseLEFTOVERS"
Just ("LEFTOVERS", False)
Notice that, in the case of a successful parse, the remainder of
the input is returned along with the result of the parsing. We can
use this to chain two parsers together, passing the leftover input
from the first into the second.
Here is a function that takes two 'Parser's, one for 'a's and one
for 'b's, and creates a 'Parser' for pairs '(a,b)' using the
following strategy:
1. It takes the input in the variable 'input'.
2. It runs the first parser on 'input'. If it fails (returns
'Nothing') the combined parser returns 'Nothing'.
3. If the first parser succeeds with leftovers 'input2' and
result 'a', then it runs the second parser on 'input2'. If
that fais, the combined parser returns 'Nothing'.
4. If the second parser succeds with leftovers 'input3' and
result 'b' then the combined parser returns leftovers 'input3'
and the final result '(a,b)'.
Here is the Haskell code for this strategy: -}
parsePair :: Parser a -> Parser b -> Parser (a,b)
parsePair p1 p2 =
MkParser (\input -> case runParser p1 input of
Nothing ->
Nothing
Just (input2, a) ->
case runParser p2 input2 of
Nothing ->
Nothing
Just (input3, b) ->
Just (input3, (a,b)))
{- We can use 'parsePair' to parse two 'Bools', one after the other by
calling 'parsePair' with 'parseBool_v2' for both arguments: -}
parsePairOfBools :: Parser (Bool, Bool)
parsePairOfBools = parsePair parseBool_v2 parseBool_v2
{- Some example runs:
> runParser parsePairOfBools "TrueTrue"
Just ("", (True, True))
> runParser parsePairOfBools "FalseTrue"
Just ("", (False, True))
> runParser parsePairOfBools "FalseTrueLEFTOVERS"
Just ("LEFTOVERS", (False, True))
> runParser parsePairOfBools "FalsTru"
Nothing
Let's look at 'parsePair' in more detail.
The cascade of 'case's with 'Nothing' always resulting in 'Nothing'
is similar to way that we modelled exceptions using 'Maybe' in
Lecture 10. Also, the passing and returning of the left over input
from one parser to the next ('input', 'input2', and 'input3') looks
very similar to how the state value was passed through when we
modelled mutable state in Lecture 11.
Both 'Maybe' and 'State' were instances of 'Monad', which allowed
us to use generic tools for all 'Monad's, as we saw in Lecture 12.
Could it be that 'Parser' is a 'Monad' too? Let's try to implement
the 'Monad' interface.
For reasons that will be explained next week, we must first
implement the Applicative type class, where 'return' is called
'pure': -}
instance Applicative Parser where
pure x = MkParser (\input -> Just (input, x))
{- The 'return' for 'Parser's is very similar to the combination of
'return' for 'Maybe', which always succeeded by returning 'Just x',
and 'return' for 'State', which returned the state value (here
'input') without modification. In terms of parsing, 'return x' is a
parser which consumes no input and always succeeds. -}
mf <*> ma = do f <- mf; a <- ma; return (f a)
{- This part is explained next week. -}
instance Monad Parser where
(>>=) :: Parser a -> (a -> Parser b) -> Parser b
p >>= k =
MkParser (\input ->
case runParser p input of
Nothing ->
Nothing
Just (input2, a) ->
runParser (k a) input2)
{- '>>=' ("bind") for 'Parser's is also like a combination of the '>>='s
for 'Maybe' and 'State'. It takes the initial input 'input' and
passes it to the first 'Parser' 'p'. If that fails, then (like
'Maybe') it returns 'Nothing'. If it succeeds with leftovers
'input2' and result 'a', it calls 'k' with 'a' to get the
continuation 'Parser' and runs it with 'input2'. -}
{- Now that we have defined 'Parser' to be an instance of 'Monad', we
can use "do" notation with 'Parser's. Here is how we can parse
pairs of 'Bool's using the 'parseBool' parser we wrote above: -}
parsePairOfBools_v2 :: Parser (Bool, Bool)
parsePairOfBools_v2 =
do b1 <- parseBool_v2
b2 <- parseBool_v2
return (b1, b2)
{- In words: first we parse a boolean, call it 'b1', then we parse a
boolean, call it 'b2'. Finally, we return the pair '(b1,b2)'. -}
{- Part 8.5 : Building Parsers
The definitions of the 'Monad' functions for 'Parser' form the
foundation of how we're going to build complicated parsers by
putting together simple parsers. The bind ('>>=') from the 'Monad'
interface lets us put one parser after another to parse two things
in sequence.
The simplest parser we'll use is one that just parses a single
character from the input. If there is a character in the input, it
consumes it and returns it. Otherwise it fails. Here is the
definition: -}
char :: Parser Char
char =
MkParser (\input -> case input of
c:input1 -> Just (input1, c)
_ -> Nothing)
{- Some examples:
> runParser char "hello"
Just ("ello", 'h')
> runParser char "ello"
Just ("llo", 'e')
> runParser char ""
Nothing
From these examples, we can see that 'char' only fails if the input
is empty. Otherwise, it returns the first character in the input.
It'll also be useful to have a 'Parser' that always fails (similar
to 'failure' we defined for 'Maybe' in Week 06). We'll use this
whenever we read a piece of input that we don't want. -}
failParse :: Parser a
failParse =
MkParser (\input -> Nothing)
{- This parser always fails, no matter what input we give it:
> runParser failParse ""
Nothing
> runParser failParse "hello"
Nothing
We can put the 'Parser's 'char' and 'failParse' together to write a
parser that only succeeds on the character 'expected' we give
it. It first uses 'char' to read a single character 'got'. If 'got'
and 'expected' are the same thing, then it returns '()', signaling
success. If they are not equal, it uses 'failParse' to reject this
input. -}
isChar :: Char -> Parser ()
isChar expected =
do got <- char
if got == expected then return () else failParse
{- Some examples:
> runParser (isChar 'h') "hello"
Just ("ello", ())
> runParser (isChar 'h') "ello"
Nothing
> runParser (isChar 'h') ""
Nothing
If we can write a parser that checks for specific characters, we
can use it with a list of specific characters to check for specific
strings of characters. We already have a function that performs
some action for every element of a list: 'mapM_' from Lecture
12. Using this, we get a parser that takes a specific 'String', and
succeeds only if that string is at the start of the input: -}
isString :: String -> Parser ()
isString = mapM_ isChar
{- For example:
> runParser (isString "hello") "hello"
Just ("", ())
> runParser (isString "hullo") "hello"
Nothing
> runParser (isString "hello") "hello world"
Just (" world", ())
Using 'char', we can write a parser for 'Bool's without having to
write the underlying parser directly. We can read the first
character and then decide whether to try to read a "True" or a
"False": -}
parseBool_v3 :: Parser Bool
parseBool_v3 =
do c <- char
case c of
'T' -> do isString "rue"
return True
'F' -> do isString "alse"
return False
_ -> failParse
{- This has the same behaviour as the 'parseBool_v2' we wrote above:
> runParser parseBool_v3 "True"
Just ("", True)
> runParser parseBool_v3 "False"
Just ("", False)
> runParser parseBool_v3 "True101010"
Just ("101010", True)
> runParser parseBool_v3 "Truthy"
Nothing
> runParser parseBool_v3 "FalseLEFTOVERS"
Just ("LEFTOVERS", False)
This way of writing a parser for 'Bool'eans works, but it is a bit
clumsy. What if we have a lot of words to recognise? What if a lot
of them share the first few letters? Definitions like this would
get quite messy.
If we think back to Week 06 on using 'Maybe' to model exceptions,
we had the function 'catch' for modelling the idea of "trying one
thing, and if that fails, trying another". We could use something
like this for writing 'Parser's: we try one parser (for parsing the
word "True", say), and if that doesn't work try another (e.g., for
parsing the word "False").
We'll call the function that tries one parser and then another
'orElse'. Its definition is very similar to the one for 'catch',
except that we also pass the current 'input' into each parser: -}
orElse :: Parser a -> Parser a -> Parser a
orElse p1 p2 =
MkParser (\input ->
case runParser p1 input of
Nothing -> runParser p2 input
Just (rest,x) -> Just (rest,x))
{- Some examples, showing how it is used in an infix position to parse
either an 'a' or a 'b' at the start of the input:
> runParser (isChar 'a' `orElse` isChar 'b') "abc"
Just ("bc", ())
> runParser (isChar 'a' `orElse` isChar 'b') "bca"
Just ("ca", ())
> runParser (isChar 'a' `orElse` isChar 'b') "cab"
Nothing
We can also see the fail-and-try-the-other-one behaviour directly
if we try 'failParse' or else something:
> runParser (failParse `orElse` isChar 'a') "abc"
Just ("bc", ())
With the ability to try one parser and fall back to another if it
fails, we can write parsers that are built from several
possibilities, and return different results from according to what
happened. For example, here is a parser that first tries to match
the input with "True", if that works then it returns 'True';
otherwise it tries to match the input with "False" and if that
works it returns 'False': -}
parseBool :: Parser Bool
parseBool =
do isString "True"
return True
`orElse`
do isString "False"
return False
{- This has the same behaviour as 'parseBool_v2' and 'parseBool_v3' we
wrote above:
> runParser parseBool "True"
Just ("", True)
> runParser parseBool "False"
Just ("", False)
> runParser parseBool "True101010"
Just ("101010", True)
> runParser parseBool "Truthy"
Nothing
> runParser parseBool "FalseLEFTOVERS"
Just ("LEFTOVERS", False)
An important point about 'orElse' is that it represents *ordered*
choice. If the first parser succeeds, then its result is used, even
if the second one could have succeeded. For example, the following
parser built from 'orElse' will greedily match "Four", and never
let the other parser have a go: -}
orderedChoiceDemo :: Parser String
orderedChoiceDemo =
do isString "for"
return "the keyword 'for'"
`orElse`
do isString "forty"
return "the number 40"
{- We can see this behaviour in some examples:
> runParser orderedChoiceDemo "for"
Just ("","the keyword 'for'")
> runParser orderedChoiceDemo "forty"
Just ("ty","the keyword 'for'")
On the input "forty", the first parser for "for" matches, and then
we get the result "the keyword 'for'" and leftover input "ty". The
second parser never gets to run.
This ordered behaviour is often useful, because it gives us unique
(unambiguous) answers if there are multiple ways of parsing the
same input. However, it does mean that the 'Parser's we write might
not parse the input in the way that we intend. It is always a good
idea to have a collection of test inputs for testing your
'Parser's. -}
{- We now have the following basic 'Parser's and ways of combining
them:
1. The 'char' 'Parser' for parsing single characters.
2. The monad interface 'return' and '>>=' for parsers that do
nothing and sequence two parsers.
3. The 'failParse' parser that always fails.
4. The 'orElse' parser that tries one parser, and if that fails
tries another.
With these basic parsers, we have built 'isString' that recognises
literal strings. And we can now go on to build more useful parsers
on top of these.
With these basic functions, we no longer have to write anything
involving 'MkParser' directly. All the other parsers we will write
will be written in terms of the basic functions.
The first useful reusable parser we'll write is one that parses
lists of items by repeatedly trying to run a given 'Parser' until
it fails. We call this parser 'zeroOrMore' because it attempts to
read zero or more parses of the parser it is given from the input: -}
zeroOrMore :: Parser a -> Parser [a]
zeroOrMore p =
do x <- p
xs <- zeroOrMore p
return (x:xs)
`orElse`
return []
{- Some examples:
> runParser (zeroOrMore (isChar 'a')) "aaaa"
Just ("",[(),(),(),()])
> runParser (zeroOrMore (isChar 'a')) "aaaabb"
Just ("bb",[(),(),(),()])
> runParser (zeroOrMore parseBool) "aaaabb"
Just ("aaaabb",[])
> runParser (zeroOrMore parseBool) "TrueFalseTrue"
Just ("",[True,False,True])
> runParser (zeroOrMore parseBool) "TrueFalseTrueaaaa"
Just ("aaaa",[True,False,True])
Note that 'zeroOrMore p' never fails: if it can't parse the input
with 'p', then it just returns the empty list (see the third
example above).
'zeroOrMore p' is also greedy: it parses as far into the input as
it can, no matter what later parses in a sequence are
expecting. For example, the following parser uses 'zeroOrMore'
twice to parse two lists of 'Bool'eans: -}
twoBoolLists :: Parser ([Bool],[Bool])
twoBoolLists =
do bools1 <- zeroOrMore parseBool
bools2 <- zeroOrMore parseBool
return (bools1, bools2)
{- If we try this on any input, the second list will always be empty,
because the first occurrence of 'zeroOrMore parseBool' will have