Wakatta!

Like Eureka!, only cooler

Seven Languages in Seven Weeks Scala Day 2

The second day is mostly dedicated to containers: Lists, Sets and Maps, some of their most useful methods, and the use of code blocks (anonymous functions).

The various containers are nothing new or special for Java programmers, but the anonymous functions are an effective way to greatly increase the power of existing iteration methods by having them accept arbitrary logic to process each element. This is nothing new or special for functional programmers, of course.

While this is possible in Java (where anonymous classes often play this role), the result is less fluid than in functional languages, so it comes less naturally.
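To make the comparison concrete, here is a minimal sketch written entirely in Scala. The explicit anonymous class mimics the Java style, and the anonymous function does the same job in one short expression. The names `AnonymousFunctions`, `sizeViaClass` and `sizeViaLambda` are made up for this illustration.

```scala
object AnonymousFunctions {
  val words = List("one", "two", "three")

  // Java-like style: an explicit anonymous class implementing Function1.
  val sizeViaClass = new Function1[String, Int] {
    def apply(s: String): Int = s.length
  }

  // Scala style: an anonymous function with the same behavior.
  val sizeViaLambda = (s: String) => s.length

  // map accepts either one; only the amount of ceremony differs.
  val viaClass: List[Int] = words.map(sizeViaClass)
  val viaLambda: List[Int] = words.map(sizeViaLambda)

  // The same iteration method takes whatever logic is passed to it.
  val shortWords: List[String] = words.filter(_.length == 3)
}
```

Both `viaClass` and `viaLambda` hold the lengths of the three words.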

Exercises

Using foldLeft to compute the sum of string sizes

Not as challenging an exercise as yesterday’s. Still, it shows how light and easy to use Scala’s anonymous functions are.

A first version with the foldLeft method:

foldLeft
scala> val list = List("one", "two", "three")
list: List[java.lang.String] = List(one, two, three)

scala> val sum = list.foldLeft(0) {(sum, s) => sum + s.size }
sum: Int = 11

A second version with the /: operator. The anonymous function is of course strictly identical.

foldLeft operator
scala> val list = List("one", "two", "three")
list: List[java.lang.String] = List(one, two, three)

scala> val sum = (0 /: list) {(sum, s) => sum + s.size }
sum: Int = 11
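To see how the accumulator is threaded through the fold, the sketch below uses scanLeft, which applies the same function but keeps every intermediate value. The object name `FoldTrace` is only for illustration. As far as I know, from Scala 2.13 onward the /: operator is deprecated in favor of foldLeft, so the method form is the safer choice in newer code.

```scala
object FoldTrace {
  val list = List("one", "two", "three")

  // scanLeft returns the initial value followed by each running total.
  val steps: List[Int] = list.scanLeft(0)((sum, s) => sum + s.size)

  // foldLeft only returns the last of those values.
  val total: Int = list.foldLeft(0)((sum, s) => sum + s.size)

  // An equivalent formulation: map to the sizes first, then sum them.
  val viaMap: Int = list.map(_.size).sum
}
```

Here `steps` is `List(0, 3, 6, 11)`, and both `total` and `viaMap` are 11.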

Censor trait

Looking for Scala documentation of String is somewhat frustrating, because there is none. But that in turn means that Scala simply uses Java’s String.

Java String comes with a method that seems to do just what is needed here: replaceAll.

replaceAll
scala> "one two three".replaceAll("two", "TWO")
res1: java.lang.String = one TWO three

replaceAll, word boundaries
scala> "she sells shells".replaceAll("\\bshe\\b", "the lady")
res2: java.lang.String = the lady sells shells

replaceAll, case insensitive
scala> "She sells shells".replaceAll("(?i)\\bshe\\b", "the lady")
res3: java.lang.String = the lady sells shells

As seen in the last two examples, the first argument is actually a regular expression. For a while, I toyed with the idea of using Scala’s Regex.replaceAllIn, so I could check whether the match was capitalized or all upper case and insert the replacement word with identical case, as Emacs does. But this is a whole lot more work, and generic code can only handle a few cases (all lower case, all upper case and capitalized) satisfactorily.
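For reference, the Regex.replaceAllIn idea could look roughly like the sketch below. This is not the approach taken in the versions that follow. The helper `matchCase` and the object name are hypothetical, and unlike the code in this post, a mixed-case match such as "sHoOt" falls back to the lower case replacement.

```scala
import scala.util.matching.Regex

object CasePreservingCensor {
  // Hypothetical helper: copy the case pattern of the matched text
  // (all upper case, capitalized, or anything else) onto the replacement.
  def matchCase(matched: String, replacement: String): String =
    if (matched.forall(_.isUpper)) replacement.toUpperCase
    else if (matched.head.isUpper) replacement.toLowerCase.capitalize
    else replacement.toLowerCase

  def censor(text: String, words: Map[String, String]): String =
    words.foldLeft(text) { case (t, (word, replacement)) =>
      // (?i) makes the match case insensitive; Regex.quote escapes any
      // regex metacharacters the censored word might contain.
      val pattern = ("(?i)\\b" + Regex.quote(word) + "\\b").r
      // quoteReplacement keeps '$' and '\' in the replacement literal.
      pattern.replaceAllIn(t, m =>
        Regex.quoteReplacement(matchCase(m.matched, replacement)))
    }
}
```

With `Map("shoot" -> "pucky")`, the text "Shoot! shoot! SHOOT!" becomes "Pucky! pucky! PUCKY!" in a single pass per censored word.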

The first version iterates over the pairs in the censored words map, and for each pair replaces each basic form of the censored word with the same form of its replacement (the forms being capitalized, lower case and upper case).

Censor, version 1 (censor.scala)
trait Censor {
  val words = Map("Shoot" -> "Pucky", "Darn" -> "Beans")
  def censor(text: String) = {
      var result = text
      words.foreach( p =>
          result = result.replaceAll(mm(c(p._1)), c(p._2)).replaceAll(mm(l(p._1)), l(p._2)).replaceAll(imm(u(p._1)), u(p._2))
      )
      result
  }
  
  /* capitalize */
  def c(str: String) = str(0).toUpper + str.substring(1).toLowerCase
  
  /* lowercase */
  def l(str: String) = str.toLowerCase
  
  /* uppercase */
  def u(str: String) = str.toUpperCase
  
  /* make matcher method */
  def mm(str: String) = "\\b" + str + "\\b"
  /* make case insensitive matcher method */
  def imm(str: String) = "(?i)\\b" + str + "\\b"
}

I like the way Scala allows me to write extremely short code for utility methods (like c, l, …).

One problem with this version is that there’s a mutable variable. Using foldLeft, the mutable variable is no longer needed:

Censor, version 2 (censor_fold.scala)
trait Censor {
  val words = Map("Shoot" -> "Pucky", "Darn" -> "Beans")
  def censor(text: String) = {
      (text /: words) { (t, p) => t.replaceAll(mm(c(p._1)), c(p._2)).replaceAll(mm(l(p._1)), l(p._2)).replaceAll(imm(u(p._1)), u(p._2)) }
  }
  
  /* capitalize */
  def c(str: String) = str(0).toUpper + str.substring(1).toLowerCase
  
  /* lowercase */
  def l(str: String) = str.toLowerCase
  
  /* uppercase */
  def u(str: String) = str.toUpperCase
  
  /* make matcher method */
  def mm(str: String) = "\\b" + str + "\\b"
  /* make case insensitive matcher method */
  def imm(str: String) = "(?i)\\b" + str + "\\b"
}

With the code above, the world is now safe from the threat of rude language:

Censor test code (censor_test.scala)
class Test(val text: String) extends Censor {
  def getText() = text
  def getCensoredText() = censor(text)
}

val test = new Test("Phil Wenneck: God damn it!\n" +
"Alan Garner: Gosh darn it!\n"+
"Phil Wenneck: Shit!\n"+
"Alan Garner: Shoot!")

println("Original text:")
println(test.getText)

println("Censored text:")
println(test.getCensoredText)

produces:

Original text:
Phil Wenneck: God damn it!
Alan Garner: Gosh darn it!
Phil Wenneck: Shit!
Alan Garner: Shoot!
Censored text:
Phil Wenneck: God damn it!
Alan Garner: Gosh beans it!
Phil Wenneck: Shit!
Alan Garner: Pucky!
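The fold in version 2 can also destructure each map entry with a pattern, which avoids the `p._1` and `p._2` accessors and lets the long chain be split over several lines. The following is only a sketch of that variation. The object and helper names (`DestructuringCensor`, `capitalized`, `wholeWord`) are mine, and it uses the foldLeft method rather than the /: operator.

```scala
object DestructuringCensor {
  val words = Map("Shoot" -> "Pucky", "Darn" -> "Beans")

  // Hypothetical helpers, equivalent to c and mm in the post.
  private def capitalized(s: String) = s.toLowerCase.capitalize
  private def wholeWord(s: String) = "\\b" + s + "\\b"

  def censor(text: String): String =
    words.foldLeft(text) { case (t, (word, replacement)) =>
      // Capitalized form first, then lower case, then a case-insensitive
      // pass that turns any remaining form into the upper case replacement.
      t.replaceAll(wholeWord(capitalized(word)), capitalized(replacement))
        .replaceAll(wholeWord(word.toLowerCase), replacement.toLowerCase)
        .replaceAll("(?i)" + wholeWord(word), replacement.toUpperCase)
    }
}
```

Calling `DestructuringCensor.censor("Gosh darn it! Shoot!")` gives "Gosh beans it! Pucky!", matching the output above.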

Loading from file

To load censored words from a file, I first need to define a format. To keep things simple, each pair is on one line, separated by one or more spaces.

The Source object contains a useful fromFile method (unfortunately, it is not documented directly; you have to dig it out of the source file). Then it is possible to foldLeft over the lines to populate the replacement map.

The rest of the code is identical.

Censor, loading from a file (censor_load.scala)
trait Censor {
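  var words = Map.empty[String, String]
  
  def load(file: String): Unit = {
      import scala.io.Source
      val source = Source.fromFile(file)
      // fold each non-blank "word replacement" line into the map,
      // and close the file whether or not parsing succeeds
      try {
          words = source.getLines().foldLeft(words) { (map, line) =>
              if (line.trim.length > 0) {
                  // trim first so leading spaces do not produce an empty field
                  val pair = line.trim.split("\\s+")
                  map + ((pair(0), pair(1)))
              } else map
          }
      } finally source.close()
  }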
  
  def censor(text: String) = {
      (text /: words) { (t, p) => t.replaceAll(mm(c(p._1)), c(p._2)).replaceAll(mm(l(p._1)), l(p._2)).replaceAll(imm(u(p._1)), u(p._2)) }
  }
  
  /* capitalize */
  def c(str: String) = str(0).toUpper + str.substring(1).toLowerCase
  
  /* lowercase */
  def l(str: String) = str.toLowerCase
  
  /* uppercase */
  def u(str: String) = str.toUpperCase
  
  /* make matcher method */
  def mm(str: String) = "\\b" + str + "\\b"
  /* make case insensitive matcher method */
  def imm(str: String) = "(?i)\\b" + str + "\\b"
}

/* testing */
class Test(val text: String) extends Censor {
  load("censor.txt")
  def getText() = text
  def getCensoredText() = censor(text)
}

val test = new Test("Phil Wenneck: God damn it!\n" +
"Alan Garner: Gosh darn it!\n"+
"Phil Wenneck: Shit!\n"+
"Alan Garner: Shoot!")

println("\nOriginal text:")
println(test.getText)

println("\nCensored text:")
println(test.getCensoredText)

println("\nImproved censored text:")
test.load("censor2.txt")
println(test.getCensoredText)

outputs:

$ scala censor_load.scala 

Original text:
Phil Wenneck: God damn it!
Alan Garner: Gosh darn it!
Phil Wenneck: Shit!
Alan Garner: Shoot!

Censored text:
Phil Wenneck: God damn it!
Alan Garner: Gosh bean it!
Phil Wenneck: Shit!
Alan Garner: Pucky!

Improved censored text:
Phil Wenneck: God d--n it!
Alan Garner: Gosh bean it!
Phil Wenneck: S--t!
Alan Garner: Pucky!

censor.txt
shoot   pucky
darn    bean

censor2.txt
shit    s--t
damn    d--n
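The parsing step can also be exercised on its own, without touching the file system. The sketch below is one way to do it, under the assumption that lines without exactly two fields should be skipped (the load method above would instead fail on a one-word line). `WordListParser` and `parse` are names invented for this example.

```scala
object WordListParser {
  // Hypothetical helper: turn "word replacement" lines into a Map.
  // Blank lines and lines without exactly two fields are ignored.
  def parse(lines: Iterator[String]): Map[String, String] =
    lines
      .map(_.trim.split("\\s+"))
      .collect { case Array(word, replacement) => word -> replacement }
      .toMap
}
```

The lines of a file such as censor.txt could then be passed in with something like `WordListParser.parse(scala.io.Source.fromFile("censor.txt").getLines())`, while tests can simply pass an in-memory `Iterator` of strings.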

Wrapping up Day 2

Scala’s syntax is clearly much shorter than Java’s, and fairly expressive as well. The code flows, is more concise, and feels natural (assuming that you think functional code feels natural, as I do).

Moreover, looking at the online documentation, I can see that there’s more depth to Scala’s type system than can be covered in such a book. This is another area I look forward to investigating further.
