Wakatta!

Like Eureka!, only cooler

Seven Languages in Seven Weeks Haskell Day 2

Today introduces the functional aspects of Haskell: higher order functions, partial application of functions and lazy evaluation.

Higher order functions should no longer be surprising: many languages have that, even if Haskell other features make them very easy to use.

Partial application is such a feature. A function can be passed some, but not all its arguments, meaning that it is still a function, not a value. It reduces the number of anonymous functions one needs to write when using higher order functions.

Lazy evaluation is something very unique (among the lazy languages, only Haskell is somewhat mainstream). Clojure has lazy lists, which is cool, but lazy evaluation applies to everything. A piece of data can refer to itself in its definition, as long as the part that is needed can be evaluated before the part it depends on.

For instance, a canonical definition of the Fibonacci sequence is

Fibonacci sequence
1
fibs = 1:1:zipWith (+) fibs (tail fibs)

The fibs list is the list of Fibonacci numbers. It starts with 1, 1, then the list of itself summed with its own tail… but the 3rd number depends on the first and the second, so its ok. By the time we need to compute the 4th, the 3rd is already known, and so on.

It takes but an instant to compute the 100000th number in the sequence:

1
2
*Parse> fibs !! 100000
42026927029951543863190051012939151317739157026322345033047..... -- number truncated to save space

Which brings me to a remark on the book: why on earth is fibNth defined the way it is? That function exists, and is called !!. The code in the book is convoluted, does not need that many parenthesis, and even if you have a problem with !!, there is no need to use both take and drop if you’re going to take the head of the result (take will make a copy of the list for no good reason).

Exercises

In general I tried to avoid standard functions that implement a significant portion of the intended behaviour. So I didn’t use sort in my sort function, or read in parsing, …

Simple sort

A good sort algorithm is always tricky, but insertion sort is simple enough and easy to express with pattern matching. My implementation has the same signature as the standard sort function. It expects is arguments to have the class Ords, which guarantees they can be compared.

(mysort.hs) download
1
2
3
4
5
6
7
8
module MySort where
import Data.Ord

my_sort [] = []
my_sort (x:xs) = my_insert x $ my_sort xs
  where my_insert x [] = [x]
        my_insert x (y:ys) | x > y        = y:my_insert x ys
                           | otherwise    = x:y:ys

Testing it:

1
2
*MySort> my_sort ([1..10] ++ [10, 9.. 1])
[1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10]

Sort using comparison function

Sort using a specific comparison function is not harder. The standard implementation uses Data.Ord.Ordering to replace > by the comparison result GT. My implementation has the same signature as the standard sortBy, but still uses the insertion sort as in my_sort.

(mysortby.hs) download
1
2
3
4
5
6
7
8
9
module MySort where
import Data.Ord

my_sort_by :: (a -> a -> Ordering) -> [a] -> [a]
my_sort_by _ []     = []
my_sort_by f (x:xs) = my_insert_by f x $ my_sort_by f xs
  where my_insert_by _ x [] = [x]
        my_insert_by f x (y:ys) | f x y == GT = y:my_insert_by f x ys
                                | otherwise   = x:y:ys

Testing it (using compare on the absolute value):

1
2
*MySort> my_sort_by (\a b -> compare (abs a) (abs b)) ([10, 9..1] ++ [-10..(-1)])
[1,-1,2,-2,3,-3,4,-4,5,-5,6,-6,7,-7,8,-8,9,-9,10,-10]

Parse string into number

Parsing is not hard; to do it I break the string into a integral part, and the fractional part. Both are then cleaned to remove non digits.

The integral part is parsed left to right (with foldl), each time multiplying the already parsed number by 10 before adding the current number.

The fractional part is parsed right to left (with foldr), dividing the already parsed number by 10 before adding the current number.

Note the use of fromIntegral function. This is used to convert and integral number (Int, Integer, …) into any type of number. This is necessary to be allowed to divide the results and add the fractional part.

The use of fractional arithmetic makes this function less effective than read.

(parse.hs) download
1
2
3
4
5
6
7
8
9
10
11
12
13
module Parse where

import Data.Char
import Data.List

parse str =
    let (dig, frac) = break (== '.') str
    in foldl' pDig 0 (clean dig) + foldr pFrac 0 (clean frac) / 10

clean s   = filter isDigit s
pDig  a b = 10*a + toNum b
pFrac a b = toNum a + b/10
toNum c   = fromIntegral $ ord c - ord '0'

Testing:

1
2
3
4
5
6
7
8
*Parse> parse "$2,345,678.99"
2345678.99
*Parse> parse "2,345"
2345.0
*Parse> parse ".99"
0.99
*Parse> parse ".234"
0.23399999999999999

Lazy sequences

Once again, nothing difficult. Haskell notation pretty much reads as a specification of the problem:

(lazy.hs) download
1
2
3
4
5
module Lazy where

thirds x = [x, x+3..]
fifths x = [x, x+5..]
eighths x y = zipWith (+) (thirds x) (fifths y)

Testing:

1
2
3
4
5
6
*Lazy> take 10 $ thirds 10
[10,13,16,19,22,25,28,31,34,37]
*Lazy> take 10 $ fifths 20
[20,25,30,35,40,45,50,55,60,65]
*Lazy> take 10 $ eighths 10 20
[30,38,46,54,62,70,78,86,94,102]

Partial application

Notice the use of partial application of operators: if you wrap the operator and its argument in parenthesis (they are needed here), you have a function that takes the missing argument. The missing argument can be the left one as see here.

(partial.hs) download
1
2
3
4
5
module Partial where

half = (/ 2)

terminate = (++ "\n")

Testing:

1
2
3
4
*Partial> half 10
5.0
*Partial> terminate "Hello"
"Hello\n"

Challenges

Greatest Common Denominator

I must have missed something, because that was hardly a challenge. I just implemented the Euclidean algorithm:

(gcd.hs) download
1
2
3
4
5
6
module GCD where

-- Euclidean algorithm
my_gcd m n | m < n     = my_gcd n m
           | n == 0    = m
           | otherwise = my_gcd (m-n) n

Testing:

1
2
3
4
5
6
*GCD> my_gcd 1961 901
53
*GCD> my_gcd 901 1961 
53
*GCD> gcd 1961 901
53

my_gcd agrees with the standard gcd function.

Lazy prime number sequences

This one was a bit trickier, yet an implementation that closely follows the Sieve of Eratosthenes algorithm is fairly short.

I first need a difference function that works on infinite lists: I manage this by taking into account the fact that the lists are always sorted. The minus just compares the first item of its arguments, so it can work linearly on both of them. Note that this function is not able to work on finite lists, but in this context there is no need to.

The implementation follows the proposed optimizations: it puts 2 in the prime number list right from the start, and skips other even numbers. It also start filtering at p*p, as smaller multiples have been filtered already (being a multiple of smaller prime numbers).

(sieve.hs) download
1
2
3
4
5
6
7
8
9
10
module Sieve where

minus xl@(x:xs) yl@(y:ys)
    | x == y    = minus xs ys
    | x < y     = x : minus xs yl
    | otherwise = minus xl ys

primes = 2:sieve [3,5 ..]

sieve (p:xs) = p:sieve (xs `minus` [2*p, 3*p ..])

The implementation is very slow, but can compute the first 1000 prime numbers.

1
2
*Sieve> primes !! 1000
7927

This turns out to be the first implementation on the Prime Number generator page on the Haskell wiki. Other implementations are much smarter and faster.

Breaking string into lines

The exercise description seems to be missing something: a line length. So I have added that to the functions.

Breaking into words is best done with words, but I implemented my version. I actually started with a first abstraction, not really necessary here, that splits a sequence based on a predicate (items that return true for the predicate are all removed). Then my_words is just calling that function with isSpace as the predicate.

To combine words back into lines, I used two small functions: one (accumUntil) builds a line one word at a time, and stops when the line is too long. It starts with a word as the first tentative line, to make sure that a line is not empty even if a word is too long to fit.

The other function (loop) uses the previous one to build a list of lines until the list of words is empty.

(split.hs) download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
module Split where

import Data.Char

-- generalize words and lines: remove extra break characters at the start,
-- then split at the next break characters; recurse on the rest

breakAtEvery f xs = case dropWhile f xs of
                      []  -> []
                      xs' -> p:breakAtEvery f r
                        where (p, r) = break f xs'

my_words = breakAtEvery isSpace

-- loop iterate over lines built by accumUntil
-- accumUntil builds a line (at least one word) and adds words
-- until the length is too long

split m str = loop $ my_words str
  where loop [] = []
        loop (w:words) = case accumUntil m w words of
                           (line, []) -> [line]
                           (line, rest) -> line:loop rest
        accumUntil _ line []        = (line, [])
        accumUntil m line (w:words) = let line' = line ++ " " ++ w
                                      in if (length line' < m)
                                         then accumUntil m line' words
                                         else (line, w:words)

Testing (splitting a long paragraph into lines of at most 72 characters):

1
2
3
4
5
6
7
8
9
*Split> putStrLn $ unlines $ split 72 "Lorem ipsum dolor sit amet, consectetur adipiscing elit. In vel diam nunc. Proin in justo quis nisl aliquam mollis. Sed vitae dolor odio. Pellentesque cursus augue quis libero malesuada ornare. Phasellus interdum ultrices tincidunt. Etiam ullamcorper, massa a ullamcorper posuere, dolor quam consequat est, ut ullamcorper elit eros suscipit ante. Etiam non quam sit amet risus posuere mollis non in est. Nulla facilisi. Sed ut commodo risus. Suspendisse ut nisi ac erat hendrerit interdum at a purus."
Lorem ipsum dolor sit amet, consectetur adipiscing elit. In vel diam
nunc. Proin in justo quis nisl aliquam mollis. Sed vitae dolor odio.
Pellentesque cursus augue quis libero malesuada ornare. Phasellus
interdum ultrices tincidunt. Etiam ullamcorper, massa a ullamcorper
posuere, dolor quam consequat est, ut ullamcorper elit eros suscipit
ante. Etiam non quam sit amet risus posuere mollis non in est. Nulla
facilisi. Sed ut commodo risus. Suspendisse ut nisi ac erat hendrerit
interdum at a purus.

I used unlines to group the split lines back into a single string separated by newlines and putStrLn to print the result.

Justify text

The general structure of the justify functions is the same:

  • compute the maximum length of all the lines
  • for each line, compute the difference between the line length and the maximum line length
  • insert spaces in the right location (depending on the kind of justification)

Each justification is a specific function. First pad is a small utility that creates a string of spaces of the required length.

right and left uses the above strategy to add spaces left and right, respectively. center adds half left, and half right. Of course left does not do anything visible, it just adds spaces to make each line the same length.

both is more complex, as it inserts spaces between words. The strategy is naive (actual algorithms include dynamic programming to balance the amount of space), but effective.

The general idea is to spread the missing space between words. For this I follow these steps:

  • split the line into words using the code from the Split module;
  • compute the number of interval (the count of the words minus 1). As I’m going to put the spaces between words, this interval also count as missing spaces (see next step). I refer to this amount as iter;
  • divide the number of missing spaces (difference between maximum line length and effective line length plus the interval): this is the amount of space I should add between each word to add up to the right amount, if I could add fractional space
  • multiply each item in [1..inter] by the fractional space amount as computed above.
  • iterate over the list from previous step:
    • compute the nearest integer of the current item (note that by construction, the nearest integer of the last item is exactly the amount of missing space);
    • the difference between this integer and the amount of space allocated so far (this amount is zero at the start, of course)
    • add a padding (using the pad function) to a list of spaces, and update the amount of space allocated before the next iteration
  • then zipWith the list of spaces with the list of words, and recreate the line with concat.

The algorithm above is for a single line, but when justifying a whole paragraph, the last line should be left justified. So the justify_both applies the both justification to all but the last line, and left to the last line.

(justify.hs) download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
module Justify where

import Split

-- takes a justify function, a max length line (will be adjusted to max of strs lengths)

justify f m strs =
  let m' = foldl max m (map length strs)
  in map (f m') strs

-- when justifying both sides, the last line should be left justified
justify_both m strs =
  let m' = foldl max m (map length strs)
  in map (both m') (init strs ) ++ [left m' (last strs)]

justify_left = justify left
justify_right = justify right
justify_center = justify center

-- helper functions  
-- create n spaces

pad n = take n (repeat ' ')

-- Justify function helpers: add spaces at the right 
-- location to make the length of line equal to m. 
-- Incorrect if m is smaller than length line 

-- pad spaces on the right 
left m line = line ++ pad (m - length line)

-- pad spaces on the left

right m line = pad (m - length line) ++ line

-- pad both sides by half the difference

center m line =
  let lp = floor (fromIntegral (m - length line) / 2)
      rp = m - lp
  in pad lp ++ line ++ pad rp

-- justify both sides: compute the size of each interval in FRACtional value
-- then build a list of interval paddings with the length as close as possible
-- from the running sum of fractional intervals
both m []   = pad m
both m line =
  let (w:words) = my_words line
      inter = length words
      addInterFrac = map (makeInter inter) [1..inter]
      addSpaces = case foldl adjust (0, []) addInterFrac of
                    (_,spaces) -> spaces
  in concat (w:zipWith (++) addSpaces words)
  where adjust (curr, spaces) newFrac = let diff = round (newFrac - fromIntegral curr)
                                        in (curr + diff, (pad diff):spaces)
        makeInter inter i = fromIntegral i
             * (fromIntegral (m - length line + inter))
             / fromIntegral inter

Testing (the full test text is not reproduce here to save space):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
*Justify> putStrLn $ unlines $ justify_left 72 $ split 72 "Lorem ipsum ... a purus."
Lorem ipsum dolor sit amet, consectetur adipiscing elit. In vel diam    
nunc. Proin in justo quis nisl aliquam mollis. Sed vitae dolor odio.    
Pellentesque cursus augue quis libero malesuada ornare. Phasellus       
interdum ultrices tincidunt. Etiam ullamcorper, massa a ullamcorper     
posuere, dolor quam consequat est, ut ullamcorper elit eros suscipit    
ante. Etiam non quam sit amet risus posuere mollis non in est. Nulla    
facilisi. Sed ut commodo risus. Suspendisse ut nisi ac erat hendrerit   
interdum at a purus.
*Justify> putStrLn $ unlines $ justify_right 72 $ split 72 "Lorem ipsum ... a purus."
 Lorem ipsum dolor sit amet, consectetur adipiscing elit. In vel diam
 nunc. Proin in justo quis nisl aliquam mollis. Sed vitae dolor odio.
    Pellentesque cursus augue quis libero malesuada ornare. Phasellus
  interdum ultrices tincidunt. Etiam ullamcorper, massa a ullamcorper
 posuere, dolor quam consequat est, ut ullamcorper elit eros suscipit
 ante. Etiam non quam sit amet risus posuere mollis non in est. Nulla
facilisi. Sed ut commodo risus. Suspendisse ut nisi ac erat hendrerit
                                                 interdum at a purus.
*Justify> putStrLn $ unlines $ justify_center 72 $ split 72 "Lorem ipsum ... a purus."
  Lorem ipsum dolor sit amet, consectetur adipiscing elit. In vel diam                                                                      
  nunc. Proin in justo quis nisl aliquam mollis. Sed vitae dolor odio.                                                                              
   Pellentesque cursus augue quis libero malesuada ornare. Phasellus                                                                     
  interdum ultrices tincidunt. Etiam ullamcorper, massa a ullamcorper                                                                      
  posuere, dolor quam consequat est, ut ullamcorper elit eros suscipit                                                                      
  ante. Etiam non quam sit amet risus posuere mollis non in est. Nulla                                                                      
 facilisi. Sed ut commodo risus. Suspendisse ut nisi ac erat hendrerit                                                                       
                          interdum at a purus.
*Justify> putStrLn $ unlines $ justify_both 72 $ split 72 "Lorem ipsum ... a purus."
Lorem ipsum  dolor sit  amet, consectetur adipiscing  elit. In  vel diamnunc. Proin  in justo quis  nisl aliquam  mollis. Sed vitae  dolor odio.Pellentesque  cursus  augue  quis  libero  malesuada  ornare.  Phasellus
interdum  ultrices tincidunt.  Etiam  ullamcorper,  massa a  ullamcorper
posuere, dolor  quam consequat  est, ut  ullamcorper elit  eros suscipit
ante. Etiam  non quam sit  amet risus posuere  mollis non in  est. Nulla
facilisi. Sed  ut commodo risus. Suspendisse  ut nisi ac  erat hendrerit
interdum at a purus.

Number lines

I finished with this one, as I reused some functions defined in the module Justify above.

This is much simpler than justifying. I need to know the number of digits I would need (which depends on the number of lines). Then I can right justify the line number and add it left of each line.

(number.hs) download
1
2
3
4
5
6
7
8
9
module Number where

import Justify

addLineNum lines =
    let md = floor $ logBase 10 (fromIntegral (length lines)) + 1
    in zipWith (lineNum md) [1..] lines
  where lineNum md num line = toStr md num ++ " " ++ line
        toStr maxDigits num = right maxDigits (show num)

Testing:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Prelude> :l number.hs 
[1 of 3] Compiling Split            ( Split.hs, interpreted )
[2 of 3] Compiling Justify          ( Justify.hs, interpreted )
[3 of 3] Compiling Number           ( number.hs, interpreted )
Ok, modules loaded: Number, Justify, Split.
*Number> :m +Split Justify
*Number Split Justify> putStrLn $ unlines $ addLineNum $ justify\_both  72 $ split 72 "Lorem ipsum ... a purus."
1 Lorem ipsum  dolor sit  amet, consectetur adipiscing  elit. In  vel diam
2 nunc. Proin  in justo quis  nisl aliquam  mollis. Sed vitae  dolor odio.
3 Pellentesque  cursus  augue  quis  libero  malesuada  ornare.  Phasellus
4 interdum  ultrices tincidunt.  Etiam  ullamcorper,  massa a  ullamcorper
5 posuere, dolor  quam consequat  est, ut  ullamcorper elit  eros suscipit
6 ante. Etiam  non quam sit  amet risus posuere  mollis non in  est. Nulla
7 facilisi. Sed  ut commodo risus. Suspendisse  ut nisi ac  erat hendrerit
8 interdum at a purus.                                                    

Wrapping up Day 2

As I knew Haskell already, this was not too taxing. I had fun with the justify challenge, trying to come up with a reasonable way to insert the right amount of space at the right place.

Dealing with types was also mostly painless. I had a couple of errors when trying to compile, but every time the location was well reported and the fix easy to figure out.

Comments