Thursday, May 2, 2013

Scala: Specializing Maps

The Scala designers have put a lot of work into a nice type system, and into a usable language on top of that type system. However, I sometimes find myself trying to do things that were easy in Java, and failing. One example is creating a subclass of a map with fixed type arguments while retaining the convenient factory methods that Scala provides for the built-in Map. For the purpose of this post, let's say I want to implement a specialized IntToStrings map.

A few initial searches and some experimentation led me to conclude that you don't want to do something like this:

class IntToStrings extends Map[Int, String] ...

This turns out to be quite a bit of work, since you have to implement Map's abstract members yourself, and you don't get to use any of the Map factory methods. Instead, you're better off using MapProxy as a starting point:

class IntToStrings extends MapProxy[Int, String] ...

In this scheme, we construct objects of type Map[Int, String] and wrap them in an IntToStrings. We can choose to do this wrapping early or late. Late wrapping means we wait until we need to perform an operation specific to IntToStrings. Early wrapping, by contrast, means we wrap well in advance, so that we can pass an appropriately typed object around our program. I tend to think early wrapping is preferable, as it takes better advantage of type information.
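The difference between early and late wrapping shows up mostly in method signatures. The sketch below is an illustration, not the post's own code: it uses a plain wrapper class instead of MapProxy (which was deprecated in Scala 2.11 and removed in 2.13), and the WrappingDemo object with its describeEarly and describeLate methods is hypothetical.

```scala
import scala.language.implicitConversions

// Minimal stand-in for IntToStrings; a plain class replaces MapProxy here.
class IntToStrings(val self: Map[Int, String]) {
  // Sums the keys and concatenates the values in the map's iteration order.
  def getSum: (Int, String) =
    self.foldLeft((0, "")) { (sum, entry) => (sum._1 + entry._1, sum._2 + entry._2) }
}

object IntToStrings {
  implicit def rawToIntToStrings(map: Map[Int, String]): IntToStrings = new IntToStrings(map)
}

object WrappingDemo {
  // Early wrapping: the parameter type itself says "this is an IntToStrings".
  def describeEarly(m: IntToStrings): String = {
    val (total, joined) = m.getSum
    s"$total/$joined"
  }

  // Late wrapping: accept a plain map and wrap only when getSum is needed.
  def describeLate(m: Map[Int, String]): String = {
    val (total, joined) = new IntToStrings(m).getSum
    s"$total/$joined"
  }
}
```

Passing a plain Map[Int, String] to describeEarly still compiles, because the expected parameter type lets the compiler find rawToIntToStrings in the companion object.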

The transcript below, executed in the Scala REPL, demonstrates this idea more completely. We start with a raw map and define the class IntToStrings. We then explore the conditions under which operations defined on the proxy can be applied to the plain map.

scala> val m: Map[Int, String] = Map(1 -> "1", 2 -> "2")
m: Map[Int,String] = Map(1 -> 1, 2 -> 2)

scala> :paste
// Entering paste mode (ctrl-D to finish)

class IntToStrings(val self: Map[Int, String]) extends scala.collection.MapProxy[Int, String] {
  def getSum = self.foldLeft((0, "")) {
    (sum, entry) => (sum._1 + entry._1, sum._2 + entry._2)
  }
}

object IntToStrings {
  implicit def rawToIntToStrings(map: Map[Int, String]) = new IntToStrings(map)
  implicit def intStringMapToRaw(map: IntToStrings) = map.self
}

// Exiting paste mode, now interpreting.

defined class IntToStrings
defined module IntToStrings

The key to making IntToStrings usable is ensuring that we have simple mechanisms for converting a Map[Int, String] into one. The Scala compiler needs to know that a conversion is both applicable and desired. We can't expect the compiler to guess that we want an IntToStrings just because we call an operation that type provides: for m.getSum, the compiler only searches the implicits in the current scope and those associated with the types involved (Map, Int and String), and since nothing there mentions IntToStrings, the conversion in its companion object is never considered.

scala> m.getSum
<console>:9: error: value getSum is not a member of Map[Int,String]
              m.getSum

There are two ways of telling the compiler you want a conversion. The first is to make clear that we want an IntToStrings as the result of some computation, as below. Because the compiler now knows the expected type, it looks in the companion object of IntToStrings for an implicit conversion from Map[Int, String], and applies it.

scala> val rm: IntToStrings = m
rm: IntToStrings = Map(1 -> 1, 2 -> 2)

scala> rm.getSum
res17: (Int, java.lang.String) = (3,12)

The second approach is to import the converters, bringing them into scope; after that, the compiler can apply the implicit conversion needed to execute getSum.
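An import does not have to be global. A minimal sketch, assuming the same pair of companion-object converters as in the transcript but a plain wrapper class instead of MapProxy (removed in Scala 2.13), shows the import confined to a single method; the ScopedImportDemo object and its sumOf method are hypothetical.

```scala
import scala.language.implicitConversions

// Plain wrapper standing in for the MapProxy-based IntToStrings.
class IntToStrings(val self: Map[Int, String]) {
  def getSum: (Int, String) =
    self.foldLeft((0, "")) { (sum, entry) => (sum._1 + entry._1, sum._2 + entry._2) }
}

object IntToStrings {
  implicit def rawToIntToStrings(map: Map[Int, String]): IntToStrings = new IntToStrings(map)
  implicit def intStringMapToRaw(map: IntToStrings): Map[Int, String] = map.self
}

object ScopedImportDemo {
  def sumOf(m: Map[Int, String]): (Int, String) = {
    // Local import: the Map => IntToStrings view is visible only in this block,
    // so m.getSum compiles here without leaking the conversion elsewhere.
    import IntToStrings._
    m.getSum
  }
}
```

Keeping the import local makes it easier to see where a plain map is silently being treated as an IntToStrings.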

scala> import IntToStrings._
import IntToStrings._

scala> m.getSum
res19: (Int, java.lang.String) = (3,12)

The rules for implicit conversion are well documented in many places. They matter here because they are so critical to having proxy classes do what we would like: recognize that we have a map on which we wish to execute specialized operations. Which of the two approaches is preferable depends a great deal on how you wish to structure the program: do you want to pass around a data structure bearing the specific name you've given to the proxy class, or would you rather operate on the plain map and convert only when you need a specialized operation?

There is one other consideration. It probably isn't a good idea for the proxy class to store additional state, since that state is easily lost as you convert back and forth to the underlying map. In any case, adding state to the proxy is a bit of an abuse of the class, as we no longer have a pure proxy. You'll probably want to find a different way to model the object in that case.
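To make the state-loss problem concrete, here is a minimal sketch with a hypothetical LabeledIntToStrings wrapper that carries an extra label field. It assumes the same two-way companion-object conversions as the post, but uses a plain class instead of MapProxy (removed in Scala 2.13). Calling an ordinary Map method on the wrapper goes through the conversion to the raw map, and converting the result back creates a new wrapper that has to invent a label.

```scala
import scala.language.implicitConversions

// Hypothetical wrapper that (ill-advisedly) stores extra state next to the map.
class LabeledIntToStrings(val self: Map[Int, String], val label: String)

object LabeledIntToStrings {
  // A raw map carries no label, so converting from one must use a default.
  implicit def fromRaw(map: Map[Int, String]): LabeledIntToStrings =
    new LabeledIntToStrings(map, "<none>")

  // Lets ordinary Map methods be called on the wrapper.
  implicit def toRaw(w: LabeledIntToStrings): Map[Int, String] = w.self
}

object StateLossDemo {
  def addThree(w: LabeledIntToStrings): LabeledIntToStrings = {
    // `updated` is a Map method, so w is first converted with toRaw;
    // the result is a plain Map[Int, String] with no label attached.
    val grown: Map[Int, String] = w.updated(3, "3")
    // Returning it where a LabeledIntToStrings is expected re-wraps it
    // via fromRaw, which cannot know the original label.
    grown
  }
}
```

The map contents survive the round trip, but the label does not.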