Wednesday, August 28, 2013

On Exploratory Design and Piecemeal Development

I have a soft spot for Lisp-like languages; I suspect I always will. I've now been programming in Scala for several months, and I sometimes find the language tedious. Lately I've been thinking about The Cathedral and the Bazaar, Worse Is Better, and other such essays in relation to how I feel about Scala vs Lisp.

Let's take a look at what Scala offers:

  1. An interesting combination of object oriented and functional programming styles.
  2. A powerful type calculus.
  3. Some interactive development via a REPL, though the REPL is fairly underpowered and almost an afterthought.

Common Lisp has, by contrast:

  1. Dynamic typing, with optional type declarations that the various implementations support unevenly.
  2. An object system that supports multiple inheritance, metaclasses, and multi-methods, with a rich vocabulary to describe how methods should interact.
  3. A language built with the REPL as a central feature. A Common Lisp development environment almost always has a REPL running in the background, ready to be accessed at any time.

The most significant downsides I've experienced for Scala are:

  1. Excruciatingly long compile times.
  2. The type system, while expressive, is also rigid.

I've run head-on into these problems with my current task: designing a DSL on top of an existing codebase. The codebase isn't very large, perhaps a couple of hundred substantial classes. Implementing a DSL, as anyone who's attempted the exercise knows, isn't a straightforward proposition. A DSL is about allowing users to express their programming problem in a form suited to their problem domain, while hiding away much of the complexity of the software's underlying implementation. Like any API, implementing a DSL often requires adjusting the rest of the software a little to support it. And it requires trying out a few different ideas to see which one sticks. It is a process of discovering how one might best help users express the problems in their domain.

Scala requires that the whole program compile before any part of it is executable; it demands strong consistency among the elements of the program. This means you can't just fire up a REPL, load the few classes most relevant to the problem at hand, and play around with your implementation. If you modify class definitions in any fashion, you must make the rest of the program consistent with your changes before you can run your experiment. The long compilation times, and the inability to focus on just a few classes while experimenting, make building up the DSL painfully slow.

I feel strongly that Common Lisp (like many other Lisp-family languages, including Clojure) does this better. Let's start with the proposition that your entire program isn't going to undergo development all the time. Development is a highly non-uniform activity. Once a part of the code reaches a certain level of maturity and functionality it is often set aside, while focus shifts to other problems. While under active development, code needn't be wholly consistent. This gives some space to experiment and rewrite the program many times until the right expression is found. On the other hand, touching relatively stable code should require much more care and attention on the part of the developer. The challenge is in structuring the implementation so that we have local stability in parts of the codebase.

Let's delve a little deeper into how Common Lisp's technical implementation makes this possible. Say we have a program with a couple of hundred classes, and we wish to evaluate a DSL implementation that changes the type of a commonly used class by removing one of its supertypes. We would start with loading the entire program into the development environment. Then we would interactively change the type of the class. To be clear, this means changing the type definition of the class, and impacting all existing instances of that class. Common Lisp provides a class redefinition protocol to support such an operation on objects. Parts of the program would no longer function correctly, but that's fine so long as the whole program isn't running. Since Common Lisp is dynamic it typically tolerates such inconsistency. Now we can implement a partial program to test if the implementation we have in mind can work. Rather than going through full compile cycles for the entire program, we can evaluate or compile just the bits we're working with, quickly verify if our approach might work, and either move forward or revert. We're changing the behavior of our program at runtime in order to evaluate alternate implementations.
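
Here's a minimal sketch of that interaction; the widget class and its mixins are hypothetical stand-ins, not taken from any real codebase:

;; Hypothetical classes standing in for the real codebase.
(defclass persistent-mixin () ())
(defclass displayable-mixin () ())
(defclass widget (persistent-mixin displayable-mixin) ())

(defvar *w* (make-instance 'widget))

;; Re-evaluate the definition at the REPL, dropping a supertype. CLOS
;; updates existing instances through the class redefinition protocol
;; (UPDATE-INSTANCE-FOR-REDEFINED-CLASS) the next time they're touched.
(defclass widget (displayable-mixin) ())

(typep *w* 'persistent-mixin)
;; => NIL -- the live instance now reflects the new definition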

Tool support is what made programming in Java palatable. Maybe one day we'll see comparable tool support for Scala, with automated refactoring of type hierarchies and type parameters. For now we must do this manually.

Sunday, July 7, 2013

To clabango or not to clabango

Pedestal's lack of documentation and confusing client/server development model caused me to switch to luminus, which comes bundled with clabango. It's been a while since I've done template based page generation, so maybe this is par for the course. So far clabango has left me a bit underwhelmed. It might prove to be sufficient for my needs, however, so I'm trying to find ways to work with the package.

The first thing that struck me is its utter un-lisp-ness. That seems to be a positive for some. More troubling to me is how clabango seems to encourage breaking data abstractions. I have a protocol Identifier that I would like to render in a page. I have two options when I do so: pull out the value from the record implementing the protocol ahead of time, or access the record's raw fields through the map API. There doesn't appear to be a reasonable way to render through the protocol methods on the page.

;; Define an identifier protocol and a widget-id record
user=> (defprotocol identifier
         (value [this]))
identifier
user=> (defrecord widget-id [id-value]
         identifier
         (value [_]
           id-value))
user.widget-id
user=> (require 'clabango.parser)
nil
user=> (alias 'parser 'clabango.parser)
nil
;; This is obviously wrong
user=> (parser/render "The ID is {{ id }}." {:id (->widget-id "abc")})
"The ID is user.widget-id@caada592."
;; Can't use the protocol abstraction
user=> (parser/render "The ID is {{ id.value }}." {:id (->widget-id "abc")})
"The ID is ."
;; Works if we don't use the abstraction,
;; which defeats the purpose of having one at all
user=> (parser/render "The ID is {{ id.id-value }}." {:id (->widget-id "abc")})
"The ID is abc."
;; Works if we pass in the raw value,
;; which doesn't propagate the abstraction through the whole program.
;; Once again it defeats the purpose of the abstraction.
user=> (parser/render "The ID is {{ id-value }}." {:id-value (value (->widget-id "abc"))})
"The ID is abc."

My initial reaction was that clabango should just allow arbitrary clojure code in its templates. This is a reasonable strategy if the templates are all part of the application. I can't think of why you'd want to have the templates be user defined anyway, and the code generation abilities of Lisp-like languages will make turning templates into compiled code fairly simple. But clabango is what it is, and such a radical departure would yield something other than clabango.
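
For a sense of what that departure might look like, here's a minimal sketch; compile-template is hypothetical, not a clabango API. It splits out the {{ }} holes at macro-expansion time and emits plain string concatenation:

(require '[clojure.string :as str])

;; Hypothetical: compile a template string into code at macro-expansion
;; time. After the split, odd-positioned fragments are the {{ }} holes,
;; which we read as expressions; the rest are literal text.
(defmacro compile-template [template]
  (let [parts (str/split template #"\{\{|\}\}")]
    `(str ~@(map-indexed (fn [i part]
                           (if (odd? i)
                             (read-string part)
                             part))
                         parts))))

;; ((fn [id] (compile-template "The ID is {{ (value id) }}."))
;;  (->widget-id "abc"))
;; => "The ID is abc."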

So instead I implemented a simple filter that executes a one argument function:

(require 'clabango.filters)
(alias 'filters 'clabango.filters)

(require 'clojure.tools.reader.edn)
(alias 'edn 'clojure.tools.reader.edn)

(defn dequote [quoted-string]
  (if (and (>= (count quoted-string) 2)
           (= (first quoted-string) \")
           (= (last quoted-string) \"))
    (.substring quoted-string 1 (- (count quoted-string) 1))
    (throw (IllegalArgumentException. "expected a double-quoted string"))))

(defn find-function [ns-function-str]
  (when (string? ns-function-str)
    (try
      @(resolve (edn/read-string (dequote ns-function-str)))
      (catch Exception e
        nil))))

(filters/deftemplatefilter "apply" [node body arg]
  (when body
    (when-let [function (find-function arg)]
      (function body))))

This allows me to render an ID thus:

user> (parser/render "The ID is {{ id|apply:\"user/value\" }}."
                     {:id (->widget-id "abc")})
"The ID is abc."

I am deeply disappointed with clabango. My initial hesitation was that moving from clabango to something else would effectively mean tracking changes in luminus on an ongoing basis, which is too high an overhead for a one-person hobby project. But luminus doesn't seem to have any libraries of its own, just some templates that make starting a project straightforward.

Whatever I choose to do, I will write up in a follow-on blog post.

Saturday, July 6, 2013

Clojure: Symbols, Reading, and Execution

I wanted to read a function name from a string and apply the corresponding function to an argument. This seemingly simple task is surprisingly tricky in Lisp-like languages. The basic steps are straightforward: read the contents of the string, parse them as a function name, and apply the corresponding function to the desired arguments. But Lisp-family languages are homoiconic and tend to support read-time evaluation, a combination that makes safe parsing of data both powerful and dangerous.

I tried to rely on my prior Common Lisp knowledge, but that turned out to be information I needed to unlearn, and there might still be errors in my understanding. While my writeup is accurate as of Clojure 1.5.1, the language is still evolving quickly. Take the information here with a grain of salt.

Of Symbols and Namespaces

While reading, one has to deal with symbols, as they are invariably part of the input stream. In Common Lisp (CL) a symbol always belongs to a package, except when it doesn't. Every package symbol is automatically interned, so two symbols with the same name in the same package are always identical. The exception is that you can create uninterned symbols that don't belong to any package, and these aren't equal to each other even when they have the same name.

;; By default intern into the current package
? (intern "FOO")
FOO
NIL
;; A quoted symbol is interned
? (eq (intern "FOO") 'foo)
T
;; Two quoted symbols are the same
? (eq 'foo 'foo)
T
;; This is an uninterned symbol
? '#:foo
#:FOO
;; Two uninterned symbols aren't the same
? (eq '#:foo '#:foo)
NIL
;; This is how you create a new uninterned symbol
? (make-symbol "FOO")
#:FOO
;; Two created uninterned symbols still aren't the same
? (eq (make-symbol "FOO") (make-symbol "FOO"))
NIL
;; But the same uninterned symbol is, as expected, identical to itself
? (let ((uninterned '#:foo)) (eq uninterned uninterned))
T

Clojure is different. Instead of packages we have namespaces, and quoted symbols aren't automatically interned into namespaces. You can ask ns-interns for a map of all the interned symbols. Quoting will give the uninterned symbol, even if there's an interned symbol with the same name. There's some more information and examples in this google groups post. The CL transcript above can be approximated in clojure thus:

;; By default intern into the current namespace
user> (intern 'user 'abc)
#'user/abc
;; Var-quote (#') returns the interned var, but only if the var already exists
user> #'foo
CompilerException java.lang.RuntimeException: Unable to resolve var: foo in this context, compiling:(NO_SOURCE_PATH:1:602) 
user> #'abc
#'user/abc
;; Var-quoting the same name twice yields the identical var
user> (identical? #'abc #'abc)
true
;; This is an uninterned symbol
user> 'abc
abc
;; Two uninterned symbols aren't the same
user> (identical? 'abc 'abc)
false
;; This is how you create a new uninterned symbol
user> (symbol "abc")
abc
;; Two created uninterned symbols still aren't the same
user> (identical? (symbol "abc") (symbol "abc"))
false
;; But the same uninterned symbol is, as expected, identical to itself
user> (let [uninterned (symbol "abc")] (identical? uninterned uninterned))
true

Keywords in clojure live in a namespace of their own, and have a distinct appearance. This is also the case in CL, and here clojure's behavior closely resembles CL's:

  1. All keyword symbols begin with a :.
  2. Keywords are automatically interned. One needn't explicitly create an interned keyword.
  3. A corollary is that two keywords with the same printed representation will always be identical.

Keywords have a special role in clojure, in their ability to access content in data structures. These properties are essential in enabling that role.
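
A quick illustration:

;; Keywords act as lookup functions on maps, which depends on their
;; guaranteed identity:
user> (:id {:id "abc" :name "widget"})
"abc"
user> (identical? :id :id)
true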

One of the read-time pitfalls in CL is interning a very large number of symbols into a package, which is effectively a memory leak: symbols are garbage collected only once they're uninterned. This automatic interning of symbols is a subtle bug that can creep into many programs. The concern doesn't apply in clojure, for two reasons:

  1. Symbols aren't automatically interned, except keywords.
  2. Interned symbols are based on interned strings, so can be garbage collected when there are no references to them.

The Pitfalls of read-string

Another type of problem occurs with read-time evaluation: the reader can execute code as it's reading an s-expression from a stream. Here's a far better writeup on the matter than I could have done.
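
The short version: with clojure.core/read-string under the default *read-eval* binding, the #= reader macro evaluates code while reading:

user> (read-string "#=(* 6 7)")
42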

The Solution?

Add clojure.tools.reader to your project.clj. Then the following works:

user=> (require 'clojure.tools.reader.edn)
nil
user=> (clojure.tools.reader.edn/read-string "clojure.core/list")
clojure.core/list
user=> (resolve (clojure.tools.reader.edn/read-string "clojure.core/list"))
#'clojure.core/list
user=> @(resolve (clojure.tools.reader.edn/read-string "clojure.core/list"))
#<clojure.lang.PersistentList$1@37f6b4df>
user=> (@(resolve (clojure.tools.reader.edn/read-string "clojure.core/list")) 1 2)
(1 2)

Sunday, June 30, 2013

Changing namespace aliases in Clojure

For unit testing my clojure code, I decided to try a little trick which I wasn't sure would work. I have a model namespace that defines the application state, which could be a rather large object that's computationally expensive to create and maintain. Rather than use with-redefs everywhere, I attempted to mock out the entire model namespace:

(ns myapp.routes.home-test
  (:require [clojure.test :refer :all]
            [myapp.routes.home :as routes]))

(ns myapp.routes.home-test.model-fake)

(def ^{:dynamic true :doc "Count of the number of ID generation calls"}
  *ids-generated* nil)

(defmacro with-test-context [& body]
  `(binding [*ids-generated* 0]
     ~@body))

(defn generate-id []
  (set! *ids-generated* (inc *ids-generated*)))

(ns myapp.routes.home-test)

(alias 'fake 'myapp.routes.home-test.model-fake)

(use-fixtures :once
  (fn [f]
    (ns myapp.routes.home)
    (ns-unalias 'myapp.routes.home 'model)
    (alias 'model 'myapp.routes.home-test.model-fake)
    (try (f)
         (finally
           (ns myapp.routes.home)
           (ns-unalias 'myapp.routes.home 'model)
           (alias 'model 'myapp.model)))))

The idea was pretty straightforward:

  1. The source file name dictates the primary namespace, so the first form declares the namespace of our test cases.
  2. Next, we define a namespace that will serve as the model fake. We define a macro that defines a standardized test context, and then a set of functions that will be used to mock out the real namespace functions.
  3. Now, back in the main namespace, we create a fixture that switches to the namespace being tested, renames the model alias to point to the fake, runs the tests, and then switches the alias back to the correct namespace.

Sounds great in theory, but (fortunately) this doesn't work. I have verified that the namespace is indeed switched. However, clojure resolves symbols to vars at compile time, and compiles the resolved var into the calling code. So while with-redefs works, switching the namespace alias after the fact isn't sufficient.
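
A sketch of why, using a hypothetical handler in the tested namespace:

;; Hypothetical function in myapp.routes.home:
(defn handle-new [] (model/generate-id))
;; At compile time, model/generate-id resolves to #'myapp.model/generate-id,
;; and the compiled function holds that var directly. Re-pointing the
;; `model` alias afterwards changes what the reader would resolve next
;; time, not what this function already references.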

Tuesday, June 11, 2013

Point Reyes, with the California Academy of Sciences

In late April I visited Point Reyes on a trip organized through the California Academy of Sciences. It was a photography focused trip, covering key aspects of conservation efforts in Marin and on the Point Reyes seashore. As such, I got a lot more out of it than I had expected. One of the big bonuses was Calumet helping organize the trip. They brought along cameras and glass. I had a 70-200 F2.8 LII IS lens with a 2x extender, without which I wouldn't have gotten very many decent photographs on this trip. I have been on a longer trip to the Eastern Sierras through the Cal Academy in the past; this trip was every bit as good. See the whole album here.

Even with a 35mm equivalent focal length of 640mm (200mm with a 2x extender on a sensor with 1.6 crop factor) some of the situations were extremely difficult to photograph. We visited an egret nesting site, and the viewing platform was, photographically speaking, far from ideal. The day was very bright, and the lighting harsh, necessitating quite a bit of post-processing to reduce the contrast. I didn't realize it at the time, but the 2x extender is as bad as all the reviews claim. It added a lot of grain to the photographs. I would probably have been better off working with a 1.4x extender.
It was spring. There were a lot of flowers. I thought we were doing well before we got to Point Reyes.
Point Reyes, well, was blooming. You couldn't point your camera many places without flowers getting in the frame. Which is great. I like photographing flowers.
One of the stops we made was at a farm in the Point Reyes area. We learned a lot about the history of Point Reyes, and the role ranchers played in its development. All the farms around Point Reyes are small family operations. Ranchers initially had no desire to have the NPS as a neighbor, and fought the establishment of Point Reyes. That no longer seems to be the case. The farms are all on the northern end, the southern end is now relatively untouched.
Some more flowers.
There are tule elk at Point Reyes. They were keeping their distance that day. This herd was all female. The males stay away except around mating season. The elk have no natural predators, and introducing wolves or grizzlies wasn't really an option. Population control is through intravenous contraception, administered through a dart gun.
One last flower. This was with my 100mm macro, rather than the 70-200 with extender. Especially when zoomed in, this looks a lot more pleasing. No grain, not quite as much contrast, and really nice and smooth bokeh. I could have spent the entire day there. But there wasn't much time left, and I didn't get any really good shots with the macro.

Sunday, June 9, 2013

Getting Started with Pedestal

I want to write a web app in clojure. I'll begin with picking a framework. I had played a little with Noir the last time I had picked up clojure. And now it has morphed into lib-noir. Other than that, there are also Luminus and Joodo. I picked Pedestal though, which caught my attention for its unique approach to clients. Pedestal seems to be a relatively immature framework, which also means I need to spend a lot more time and effort figuring out how to get going with it.

Implementing a service is easy enough, and well enough documented on the pedestal site.

mkdir ~/tmp
cd ~/tmp
lein new pedestal-service helloworld
cd helloworld

The documentation however doesn't cover how to get going with the client. I had to peek into clojars to get the equivalent:

mkdir ~/tmp
cd ~/tmp
lein new pedestal-app helloworld-app
cd helloworld-app

That this basic bit isn't documented should give you a sense of how far the documentation still has to go. The pedestal samples aren't well documented either, so it's rather hard to work out how they function. Here's what I have been able to figure out, in no particular order:
  • Like most web frameworks, the pedestal service receives a request, and produces a response to it, using an appropriate registered handler.
  • Unlike most web frameworks, a thread isn't tied to a request (and vice versa), which makes pedestal potentially much more flexible and efficient.
  • The client specification is written in the form of a dataflow, where an event is translated to an application state change, which in turn triggers a UI change.
  • The entire application state is captured by one (hidden) state object.
  • A client application created as above will give you a simple dev environment. You can easily interact with this dev environment. This is partially documented in the README for the helloworld sample. More documentation is available once you start the dev environment for a new project and go to http://localhost:3000/index.html
  • There doesn't seem to be a simple example on having a pedestal client communicate with a pedestal service. The chat client/server example is fairly complex, and will take a little time to boil down to the minimal messaging you can do between client and server.
  • Client and server applications are implemented and executed entirely independently of one another. For instance the chat sample requires you to link (or copy) the build result of the chat client into the server's resources.
  • I don't see why pedestal separates client and server lein templates. The client template after all includes some JVM-targeted clojure (as opposed to ClojureScript) to run a dev server. Why not have that dev server be a pedestal instance? This might help a great deal in getting started with building a coherent client/server application.
  • I don't know browser javascript APIs that well. And pedestal isn't going to help me with that lack of knowledge. This is entirely a personal problem that I didn't really expect pedestal to help with.
Given all these shortcomings, perhaps I shouldn't bother with pedestal. But I'm doing a hobby project to learn clojure better. And if the learning also brings along a different paradigm for thinking about web application programming, all the better. Pedestal to me is an experiment that I feel I would do well to learn from.

Thursday, June 6, 2013

On the misuse of dynamic extent

I've been away from the Lisp family of languages for a while, and am in the process of getting myself re-acquainted. I've started playing with Clojure again. And while there are lots of cool ideas I've found in the Clojure camp, today I found an outright error. The problem with this article is that it sets up a poor straw-man use of dynamic extent, and then trashes it. I have never seen a (modern) Common Lisp program written in this manner, and I hope this isn't common practice with Clojure.

To re-iterate, here's the bad form:

(defmacro with-resource [src & body]
  `(binding [*resource* (acquire ~src)]
     (try ~@body
       (finally
         (dispose *resource*)))))

There are two problems with this code:
  1. It fixes the variable that will be bound.
  2. It assumes that dynamic extent is the right manner for binding this variable.
In Common Lisp, this code would typically be written something like this, and makes neither assumption:

(defmacro with-resource ((var src-form) &body body)
  `(let ((,var (acquire ,src-form)))
     (unwind-protect (progn ,@body)
       (dispose ,var))))

The key difference here is in the argument list. Good programming practice dictates that a with- macro in Common Lisp should always take the name of the variable that it's going to bind. Also relevant is that Common Lisp allows binding both lexical and dynamic scope variables using let. So the same macro can be used for both effects. The following two uses of with-resource are both legal:

(defvar *dynamic*)

(with-resource (*dynamic* <some-src>)
  <do-something>)

(with-resource (lexical <some-src>)
  <do-something>)

Finally, Common Lisp also allows you to introduce a new dynamic binding using a plain let form, by declaring the variable special:

(let ((*new-dynamic* ...))
  (declare (special *new-dynamic*))
  ...)

If with-resource were a library function, you would almost certainly want to analyze the body argument for declarations, and re-arrange the macro-expansion accordingly. I'll leave that as an exercise for the reader.

But Common Lisp isn't Clojure. I don't know yet if this translates to Clojure well, as Clojure has distinct let and binding forms for lexical and dynamic variables. If Clojure has sufficient introspection to distinguish a lexical from a dynamic variable, a macro could choose the appropriate form, and produce an effect similar to Common Lisp. If not, I hope such introspection is added in due course.
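
For what it's worth, vars do carry :dynamic metadata, so a macro along these lines seems plausible; acquire and dispose are the same assumed functions as in the examples above, and I haven't battle-tested this sketch:

;; Sketch: pick `binding` or `let` based on whether the symbol resolves
;; to a dynamic var at macro-expansion time.
(defmacro with-resource [[var-name src] & body]
  (let [v (resolve var-name)]
    (if (and v (:dynamic (meta v)))
      `(binding [~var-name (acquire ~src)]
         (try ~@body (finally (dispose ~var-name))))
      `(let [~var-name (acquire ~src)]
         (try ~@body (finally (dispose ~var-name)))))))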

Now, this is no panacea for the other problems the author had pointed out. You would never want to use a stack bound (or thread local) value for communicating state between threads. I would argue however that each language tends to have gotchas that programmers in that language must learn. In C/C++ one has to learn a lot about safe use of pointers. In garbage collected languages programmers must know something about how the particular garbage collector works in order to write performant, and even safe, code. In languages that provide dynamic scoping, an incredibly powerful feature that is open to misuse, one has to understand how that feature is implemented. This is emphatically not a reason to take away a language feature, as the article's author has suggested.

Monday, May 20, 2013

Providing mocks through data providers

Over the weekend I played with TestNG data providers for the first time. This was in a scala program, but it doesn't really have anything to do with scala per se. I had something like the following in my test class. This is an example; it may not actually run, but it's representative of what I'd written. Key here is that the mock objects were being put together and set up in the data provider. The mocking framework here is Mockito, but I imagine this type of problem could crop up with other frameworks too.

class CarTest extends ShouldMatchers {

  var trackMock1: Track = _
  var positionMock1: Position = _
  var trackMock2: Track = _
  var positionMock2: Position = _

  @BeforeMethod
  def setupMocks() {
    trackMock1 = mock(classOf[Track], "track1")
    positionMock1 = mock(classOf[Position], "pos1")
    trackMock2 = mock(classOf[Track], "track2")
    positionMock2 = mock(classOf[Position], "pos2")
  }

  @DataProvider(name = "positionsProvider")
  def positionsProvider() = {
    def makeTestPosition(position: Position, track: Track) = {
      when(position.track).thenReturn(track)
      Array[Object](position, track)
    }
    Array(makeTestPosition(positionMock1, trackMock1),
          makeTestPosition(positionMock2, trackMock2))
  }

  @Test(dataProvider = "positionsProvider")
  def carPositionHasRightTrack(position: Position, track: Track) {
    val underTest = Car(position)
    underTest.track should be (track)
  }
}

This doesn't work, with a failure message saying that Track track2 does not match Track track2. The debugger showed me the objects being compared indeed had different identifiers, which led me to conclude that the test was being run as two independent method executions, while the data provider was executed only once. The mocks changed values on each execution of the test method, while the data provider kept the mock values from the first execution. The fix isn't entirely straightforward, and required a combination of changes:
  1. Create the mocks whose methods need specific stubbed behavior inside the data provider itself.
  2. Don't re-create the other mocks. Just reset them.
My tests involved many interacting objects, and several test cases to check those interactions. I didn't like duplicating the mock setup code from scratch for each test. However, reset on all mocks (including the Position mocks) would cause loss of behavior between test cases. So the creation of the Position mocks had to move into the data provider. This version of the test ran correctly:

class CarTest extends ShouldMatchers {

  val trackMock1: Track = mock(classOf[Track], "track1")
  val trackMock2: Track = mock(classOf[Track], "track2")

  @BeforeMethod
  def setupMocks() {
    reset(trackMock1)
    reset(trackMock2)
  }

  @DataProvider(name = "positionsProvider")
  def positionsProvider() = {
    def makeTestPosition(track: Track) = {
      val positionMock = mock(classOf[Position])
      when(positionMock.track).thenReturn(track)
      Array[Object](positionMock, track)
    }
    Array(makeTestPosition(trackMock1),
          makeTestPosition(trackMock2))
  }

  @Test(dataProvider = "positionsProvider")
  def carPositionHasRightTrack(position: Position, track: Track) {
    val underTest = Car(position)
    underTest.track should be (track)
  }
}

Interestingly, the Mockito developers seem to have been very concerned about the potential for reset to encourage poor coding practices, and had to be convinced that there were legitimate uses for it. My code doesn't fall into the situations they had foreseen, but the calls to reset are at least confined to the test setup.

Wednesday, May 15, 2013

REST-Over-Finagle

I spent a little time searching for REST implementations that work with Finagle. Here's a short list of approaches and implementations, with sources including Google, this post, and this post.

I'd love to get more recommendations.

Thursday, May 2, 2013

Scala: Specializing Maps

Scala has put a lot of work into implementing a nice type system, and a usable language on top of that type system. However, sometimes I find myself trying to do things that were easy in Java, and failing. For instance: creating a subclass of a map with the type arguments fixed, while retaining the nice factory methods and so on that Scala provides for the built-in Map. For the purpose of this post, let's say I want to implement a specialized IntToStrings map.

A few initial searches and experimentation led me to conclude that you don't want to do something like this:

class IntToStrings extends Map[Int, String] ...

This turns out to be quite a bit of work. And you don't get to use any of the Map factory methods. Instead, you're better off using the MapProxy as a starting point:

class IntToStrings extends MapProxy[Int, String] ...

In this scheme, we construct objects of type Map[Int, String], and wrap them in an IntToStrings. We have a choice to do this wrapping early or late. Late wrapping implies we wait until we need to perform an operation specific to IntToStrings. Early wrapping, by contrast, implies we wrap well in advance, so that we can pass an appropriately typed object around our program. I tend to think early wrapping is preferable, as it takes better advantage of the type information.

The transcript below executed on the scala REPL should demonstrate this idea more completely. We start with a raw map, and define the class IntToStrings. We then explore the conditions where operations defined on the proxy can be applied to the simple map.

scala> val m: Map[Int, String] = Map(1 -> "1", 2 -> "2")
m: Map[Int,String] = Map(1 -> 1, 2 -> 2)

scala> :paste
// Entering paste mode (ctrl-D to finish)

class IntToStrings(val self: Map[Int, String]) extends scala.collection.MapProxy[Int, String] {
  def getSum = self.foldLeft((0, "")) {
    (sum, entry) => (sum._1 + entry._1, sum._2 + entry._2)
  }
}

object IntToStrings {
  implicit def rawToIntToStrings(map: Map[Int, String]) = new IntToStrings(map)
  implicit def intStringMapToRaw(map: IntToStrings) = map.self
}

// Exiting paste mode, now interpreting.

defined class IntToStrings
defined module IntToStrings

Key to making IntToStrings usable is ensuring we have simple mechanisms for converting a Map[Int,String] to it. The scala compiler needs to know that a conversion is applicable and desired. Without the conversions in scope, the compiler won't guess that we want to convert the map to an IntToStrings just because we attempt an operation that only the proxy type provides. It doesn't have sufficient information.

scala> m.getSum
<console>:9: error: value getSum is not a member of Map[Int,String]
              m.getSum

There are two ways of telling the compiler you want a conversion. The first is to make clear that we want an IntToStrings as the result of some computation, as below. In this case the compiler knows the type we want, checks whether the companion object provides an applicable implicit conversion, and applies it.

scala> val rm: IntToStrings = m
rm: IntToStrings = Map(1 -> 1, 2 -> 2)

scala> rm.getSum
res17: (Int, java.lang.String) = (3,12)

The second approach is to import the converters into scope, after which the compiler can apply the implicit conversion needed to execute getSum.

scala> import IntToStrings._
import IntToStrings._

scala> m.getSum
res19: (Int, java.lang.String) = (3,12)

Rules for implicit conversion are well documented in many places, and worth knowing well, as they are critical to having proxy classes do what we want: recognize that we have a map on which we wish to execute specialized operations. Which of the two approaches is preferable depends a great deal on how you wish to structure the program: do you want to pass around a data structure with the specific name you've given the proxy class, or would you rather just operate on the map and convert as needed?

There is one other consideration here. It probably isn't a good idea to have the proxy class store additional state. It can be easily lost as you convert back and forth to the underlying map. In any case, adding state to the proxy is a bit of an abuse of the class, as we don't really have a pure proxy any more. You'll probably want to come up with a different way to deal with the object in that case.
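
A hypothetical variant of the proxy with an extra field, invented purely for illustration, shows how easily that state disappears:

// Hypothetical: IntToStrings plus a `note` field, for illustration only.
class NotedMap(val self: Map[Int, String], val note: String)
    extends scala.collection.MapProxy[Int, String]

val nm = new NotedMap(Map(1 -> "1"), "remember me")
val raw: Map[Int, String] = nm                 // widen to the plain Map: note is gone
val back = new NotedMap(raw + (2 -> "2"), "")  // re-wrapping can't recover it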

Tuesday, March 19, 2013

Scala as a Functional Language

I've been using Scala for about three months now, and during that time have not delved that deeply into the arcana of the language. I do however feel I have a sufficient grasp to comment on the style and coherence of the language.

Scala tries to be many things. It is object oriented, with a more sophisticated inheritance mechanism than Java's, one that allows a form of multiple inheritance. It is functional, in that function objects are relatively easy to access and manipulate. It is strongly typed, with type inference. This set of features makes it quite powerful, but at the same time lands it in corners where expressing the intent of the program becomes quite complex. Witness the awkward full signature of the map method on List:

def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That =
    {...}

There is an implicit builder factory (the CanBuildFrom) in the mix that determines how the results of the mapping function are assembled into the resulting collection. Nice, but it takes a relatively sophisticated understanding of the language to properly grasp the underlying behavior. On the other hand, given the right set of programmers, one can easily see how the language's power can be leveraged in quickly assembling and iterating over a program design. I'd call this an acceptable tradeoff.
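
The canonical illustration: mapping over a String yields a String or an IndexedSeq, depending on which CanBuildFrom the compiler resolves.

scala> "abc".map(_.toUpper)   // Char => Char, so the builder yields a String
res0: String = ABC

scala> "abc".map(_.toInt)     // Char => Int, so we fall back to an IndexedSeq
res1: scala.collection.immutable.IndexedSeq[Int] = Vector(97, 98, 99)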

There are however parts of the language that I find much more annoying.

(A Lack Of) Function Application in Scala

Tuples are great. Until you try to apply them to a function. I'll compare with a language whose functional capabilities are supposedly less sophisticated: Common Lisp. CL has two modes of function invocation:

(funcall #'max 1 2 3)
==> 3
(apply #'max (list 1 2 3))
==> 3

#' is syntactic sugar that translates the name to the corresponding function. There isn't a straightforward way to express this type of function invocation in Scala. The closest I've been able to get is something like this:

(List(1, 2, 3).foldLeft(Int.MinValue)) { _.max(_) }
==> 3

In all fairness, the scala version uses a reduction and does not rely on an n-ary max function. The reduction in CL would be:

(reduce #'max (list 1 2 3))
==> 3

In Scala you must create an anonymous function for the reduction, because methods aren't really stand-alone entities; they are tied to objects or classes, just as in Java. If max were available as a function value, however, we could pass it with nearly the same succinctness as the CL version, as the sketch below shows.
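
Here's that sketch: once max is bound as a function value, the call reads much like the CL reduce.

scala> val max2: (Int, Int) => Int = math.max _   // eta-expand the Int overload
max2: (Int, Int) => Int = <function2>

scala> List(1, 2, 3).reduceLeft(max2)
res0: Int = 3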

Now let's introduce another twist and see how far we can push Scala. In CL:

(reduce #'+ (mapcar #'* (list 1 2 3) (list 4 5 6)))
==> 32


(mapcar #'list (list 1 2 3) (list 4 5 6))
==> ((1 4) (2 5) (3 6))

The first one computes the "sum of products": mapcar consumes both lists simultaneously, multiplying pairs of arguments, and reduce sums the results. In Scala this requires something like the following:

(List(1, 2, 3).zip(List(4, 5, 6)).foldLeft(0)) {
    case (sum, (n1, n2)) => sum + (n1 * n2) }
==> 32

Not too bad. Note however that you have to introduce zipping and pattern matching into the mix of concepts required for this computation. The CL version has a much smaller cognitive footprint. Now let's consider the second one. It constructs a list of pairs, something like the zipping operation that scala natively provides. You can't really write zip as a mapping operation, as you can't map over more than one list. But zip gives you a list of tuples. If you wanted a list of lists, you'd do something like this:


scala> List(1, 2, 3).zip(List(4, 5, 6)).map { case (n1, n2) => List(n1, n2) }
res0: List[List[Int]] = List(List(1, 4), List(2, 5), List(3, 6))

Note that we can't directly use List.apply. There appears to be a deficiency in Scala's ability to apply a tuple of arguments to a function. You can't directly do that. Let's take a little detour to see what you can actually do with Scala. Let's define a simple function and try a few ways to use it:

scala> val f = { (_: Int) + (_: Int) }
f: (Int, Int) => Int = <function2> 


scala> f(1, 2)
res7: Int = 3 


scala> List(1, 2, 3).zip(List(4, 5, 6)).map(f)
<console>:9: error: type mismatch;
 found   : (Int, Int) => Int
 required: ((Int, Int)) => ?
              List(1, 2, 3).zip(List(4, 5, 6)).map(f)
                                                   ^ 


scala> List(1, 2, 3).zip(List(4, 5, 6)).map(f.tupled)
res12: List[Int] = List(5, 7, 9)

So, you can't directly apply a tuple to a function. You need to get a tupled version of the function. Now, let's see if we can use this in constructing our list of lists:

scala> List(1, 2, 3).zip(List(4, 5, 6)).map(List.apply.tupled)
<console>:8: error: missing arguments for method apply in object List;
follow this method with `_' if you want to treat it as a partially applied function
              List(1, 2, 3).zip(List(4, 5, 6)).map(List.apply.tupled)
                                                        ^

scala> List(1, 2, 3).zip(List(4, 5, 6)).map(List.apply _)
<console>:8: error: polymorphic expression cannot be instantiated to expected type;
 found   : [A]Seq[A] => List[A]
 required: ((Int, Int)) => ?
              List(1, 2, 3).zip(List(4, 5, 6)).map(List.apply _)
                                                        ^

Nope, can't do that. List.apply is polymorphic and variadic, so the compiler can't eta-expand it into the Function1 that map expects, let alone tuple it. So you have to resort to a much more pattern matching based expression to get the same outcome. One can imagine the language being extended so that tuple types interoperate directly with n-ary functions. That would perhaps simplify some of the expressions above. But I can't imagine that would be straightforward.

My expectations of what a functional language should be able to do are perhaps a bit high. But I feel there are serious deficiencies in Scala's use as a true functional language. Objects are still the primary mode of organizing code in Scala. So Scala is at best object oriented with easier access and application of functions. This is still a huge step up from Java. And I plan on enjoying the language for what it offers.

Being in the JVM isn't a sufficient reason for Scala to suffer these deficiencies. I know of at least two languages that are able to be more effectively functional within the JVM: Clojure and ABCL, which is an implementation of Common Lisp. Maybe future versions will find more effective ways of bridging this gap. Or perhaps not, the language can be sufficiently effective even with the behavior it currently possesses.

Edit (3/20/2013): Poking around the web I found this post. It captures some of my immediate reactions, except one: in Scala, some functions are different from others. n-ary functions behave differently, for example. There is a lot you can do with functions in Scala, but the language on the whole feels more OO than FP.

Tuesday, February 26, 2013

Scala IDE Revisited

In a prior post I had described how to set up an emacs based Scala development environment. I gave up on IntelliJ's editor after a while and have returned to emacs. Finger memories die hard. I now use emacs and IntelliJ based on the development phase I'm in. emacs wins when there's lots of editing and less exploration. Code completion, debugging, and navigation to sources for external libraries seems much simpler in IntelliJ.

I've made a few changes in how I use emacs for Scala.

  1. You don't need to use gradlew from the gradle-ensime plugin. I have been generating .ensime files with gradle 1.3.
  2. I have picked up minimap-mode, which gives an overview sidebar for navigating the buffer. The problem is that the minimap is tied to the buffer, so opening a different buffer doesn't update the minimap.
  3. I have installed some more themes. How did I ever do without solarized-dark?
Ensime has a ton of goodies, but is still a source of frustration:
  1. Autocomplete doesn't work reliably.
  2. It disconnects from the background server randomly.
This arrangement is so far working well for me. I hope eclipse catches up with the available gradle support for scala. The emacs+ keybinding scheme extension for eclipse is awesome; that extension is what made eclipse usable. But now that I've started using emacs again full time for development, I'm not sure I'd go back.

Monday, February 25, 2013

Effective com.twitter.util.Future

Writing scala software that effectively manages asynchronous execution through Futures is quite straightforward, once you understand the ins and outs of the particular framework. Twitter's implementation of Future in Scala is part of their util library. I wanted to understand how to best use them, and after lots of reading, source code examination, and experimentation, I have results I feel are worth sharing. There are other Future implementations out there, including one in Scala 2.10. I hope what I have to share here will transfer to other frameworks with some tweaks.

The Setup

Running the Scala REPL via sbt made the exploration substantially easier. I used the following build.sbt:
name := "Future Test"

version := "0.0"

organization := "org.sfmishras"

scalaVersion := "2.9.2"

resolvers += "twitter" at "http://maven.twttr.com/"

libraryDependencies +=  "com.twitter" % "util-core" % "6.0.5"

initialCommands in console := """
        import com.twitter.util._
        import java.lang.Thread
        import java.util.concurrent.{Executors, ExecutorService, TimeUnit}
"""

Run sbt console, and you will have everything you need to follow along.

Running Asynchronously

Perhaps a bit confusingly, not all futures run asynchronously. Hopefully by the time you get to the end of this post you will realize why this is so. The following calls all execute the function provided to the future immediately, in the current thread. So where the function calls Thread.sleep, execution will pause.
scala> var f = Future[Int] { 10 }
f: com.twitter.util.Future[Int] = com.twitter.util.ConstFuture@435fd27a

scala> f.get()
res0: Int = 10

scala> f = Future[Int] { Thread.sleep(5000); 10 }
f: com.twitter.util.Future[Int] = com.twitter.util.ConstFuture@5ea96ab7

Now we demonstrate asynchronous execution. Unlike Future, FutureTask instantiates and returns immediately. It has to be explicitly executed in an executor, and calling get() on that task waits for the sleep to complete. (You have to execute the get within five seconds of executing the task to see this in action!)
scala> val pool: ExecutorService = Executors.newFixedThreadPool(2)
pool: java.util.concurrent.ExecutorService = java.util.concurrent.ThreadPoolExecutor@22add54d[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]

scala> var ft = new FutureTask[Int]( { Thread.sleep(5000); 10 } )
ft: com.twitter.util.FutureTask[Int] = Promise@1343838719(state=Waiting(null,List()))

scala> pool.execute(ft); ft.get()
res1: Int = 10

Parallel Futures

The method select pairs this future with its argument future. The result of select is a new future that holds the value of whichever of the two completes first. Let's see how this works when we combine Future and FutureTask in different ways.
scala> ft = new FutureTask[Int]({ Thread.sleep(5000); 10 })
ft: com.twitter.util.FutureTask[Int] = Promise@1875826460(state=Waiting(null,List()))

scala> f = Future[Int] { Thread.sleep(5000); 20 }
f: com.twitter.util.Future[Int] = com.twitter.util.ConstFuture@2294eb7c

scala> ft.isDefined
res2: Boolean = false

scala> f.isDefined
res3: Boolean = true

scala> var sf = ft.select(f)
sf: com.twitter.util.Future[Int] = Promise@1353874654(state=Done(Return(20)))

scala> sf.isDefined
res4: Boolean = true

scala> sf.get()
res5: Int = 20

f.select(ft) produces the same outcome as ft.select(f), so I'm leaving that out of this writeup. It isn't useful to combine synchronous and asynchronous execution in this manner; we really see the benefit with two asynchronous executions. The result of select isn't defined until one of the futures completes execution, and the first to complete gives sf its value. Also note that sf can't itself be executed. Its value can only be observed.
scala> ft = new FutureTask[Int]({ Thread.sleep(5000); 10 })
ft: com.twitter.util.FutureTask[Int] = Promise@698352755(state=Waiting(null,List()))

scala> var ft2 = FutureTask[Int]({ Thread.sleep(3000); 20 })
ft2: com.twitter.util.FutureTask[Int] = Promise@1215135919(state=Waiting(null,List()))

scala> sf = ft2.select(ft)
sf: com.twitter.util.Future[Int] = Promise@621087996(state=Interruptible(List(),<function1>))

scala> sf.isDefined
res8: Boolean = false

scala> pool.execute(sf)
<console>:17: error: type mismatch;
 found   : com.twitter.util.Future[Int]
 required: java.lang.Runnable
              pool.execute(sf)
                           ^

scala> pool.execute(ft); pool.execute(ft2)

scala> sf.isDefined
res11: Boolean = true

scala> sf.get()
res12: Int = 20

Another way to set up the above experiment is through a FuturePool. A FuturePool wraps an executor; it takes an ordinary block of code and turns it into a future running asynchronously on that executor.
scala> val fp = FuturePool(pool)
fp: com.twitter.util.ExecutorServiceFuturePool = com.twitter.util.ExecutorServiceFuturePool@3ea32dba

scala> sf = (fp { Thread.sleep(5000); 10 }).select(fp { Thread.sleep(3000); 20 })
sf: com.twitter.util.Future[Int] = Promise@1324160278(state=Interruptible(List(),<function1>))

// More than 3 seconds later...

scala> sf.isDefined
res13: Boolean = true

scala> sf.get()
res14: Int = 20

Serial Futures


We now consider two forms of serial execution: passing the result of a future into another computation, and taking over from the execution of a future in case it fails. At this point we have a reasonable understanding of the behavior of Future vs FutureTask, so we will not explore possible combinations of these.

The result of a future can be passed into a new future using flatMap. This is equivalent to simple function composition. We have the new future continue execution in the same thread as f1. If f1 throws, the function argument of flatMap isn't executed.
scala> var f1 = Future { 10 }
f1: com.twitter.util.Future[Int] = com.twitter.util.ConstFuture@2a21d0b1

scala> (f1 flatMap { x: Int => Future { x + 5 } }).get()
res15: Int = 15

Now suppose we want our computation to be time-bound; ft can then fail through timeout as well as through an exception. The within method creates a time-bound future that throws once the timer expires, so the example below also covers the ordinary exception case.
scala> ft = new FutureTask[Int]({ Thread.sleep(5000); 10 })
ft: com.twitter.util.FutureTask[Int] = Promise@1003224409(state=Waiting(null,List()))

scala> f = ft.within(new JavaTimer(false), Duration(1, TimeUnit.SECONDS))
f: com.twitter.util.Future[Int] = Promise@759077407(state=Transforming(List(),Promise@1003224409(state=Waiting(<function1>,List()))))

scala> f.rescue({ case _: Throwable => Future { 20 } })
res16: com.twitter.util.Future[Int] = Promise@694580545(state=Done(Return(20)))

scala> pool.execute(ft)

scala> f.isDefined
res18: Boolean = true

scala> f.get()
com.twitter.util.TimeoutException: 1.seconds
    at com.twitter.util.Future$$anonfun$2.apply$mcV$sp(Future.scala:568)
    at com.twitter.util.JavaTimer$$anon$1.run(Timer.scala:160)
    at java.util.TimerThread.mainLoop(Timer.java:555)
    at java.util.TimerThread.run(Timer.java:505)


scala> res16.get()
res20: Int = 20

Putting It Together

We've explored setting up parallel and serial combinations of futures in Scala using Twitter's implementation. And we have used get() at various points to examine results. Note that in practice, you never really have to call get all that often. Doing so will put your application in a synchronous wait, precisely what you were trying to avoid. Most of the time, if you do this right, you should be able to accumulate results from futures into concurrent data structures directly. For instance, if you were trying to fill entries of a map, you could have futures write directly into a concurrent hash map, and pick up the result when they all completed. There is a lot more to this that I don't understand, for instance, where one might want to use a Promise directly.
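
Here's a minimal sketch of that pattern, reusing fp and pool from the setup above; treat the details as unverified:

// Sketch: each future writes its own entry into a concurrent map, and
// Future.join yields a future that completes once every writer has.
import java.util.concurrent.ConcurrentHashMap

val results = new ConcurrentHashMap[Int, Int]()
val writers = (1 to 4).map { i => fp { results.put(i, i * i); () } }
Future.join(writers) onSuccess { _ =>
  println(results)   // all four entries are present here
}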

Friday, January 18, 2013

Goodbye Scala + Gradle + emacs!

Well, that was short lived... I understand that ensime works better with sbt, but I'm restricted in my tool choices. And I can't figure out how to get ScalaTest to run within ensime directly. I must for now set aside ensime and get some work done.

Thursday, January 17, 2013

Scala + Gradle + emacs

It's been a while since I've been able to use emacs as a primary development environment. Since the eclipse development environment isn't all that great for scala, I thought I'd see how well emacs served in that role. First we'll set up emacs and scala. These directions are based on this blog post.
  1. Install scala-mode2. The remainder of the post assumes you have used the preferred installation method, the emacs package manager.
  2. Download ensime and install it in ~/.emacs.d
  3. Add the following to ~/.emacs.d/init.el (or ~/.emacs if you prefer):
    (add-to-list 'load-path "~/.emacs.d/ensime/elisp")
    (require 'scala-mode2)
    (require 'ensime)
    (add-hook 'scala-mode-hook 'ensime-scala-mode-hook)
  4. Once you have a .ensime file for your scala project, you can run M-x ensime, and you're off!
Ideally, you'd be generating your .ensime from a build.gradle (or build.sbt, or whichever tool you prefer) rather than writing it from scratch. From gradle, you would follow the directions for the gradle-ensime plugin. The directions in the project aren't particularly good, and even though the plugin's configuration doesn't quite match your general project specification, things still seem to work. Here's the step-by-step:
  1. Download gradle-ensime.
  2. In gradle-ensime, run ./gradlew install
  3. In your build.gradle, add the following towards the top of the file:
    buildscript {
      repositories { mavenLocal() }
    
      dependencies {
        classpath group: 'net.coacoas', name: 'gradle-ensime', version: '0.1.0-SNAPSHOT'
      }
    }
    apply plugin: 'ensime'
  4. In your project directory, now run <gradle-ensime-dir>/gradlew ensime
  5. You should now have a .ensime file that you can use when you M-x ensime, in emacs.

[Edit: ensime prefers scala-mode2 over scala-mode, so the instructions now reflect that.]

Wednesday, January 2, 2013

Types in Scala

Writing a toy program in Scala is turning out to be surprisingly difficult. I think I'm running into something that makes Scala fundamentally different from many other languages I've programmed in: Scala types are surprisingly complex. I don't necessarily mean this in a negative way.

I've so far tried manipulating two types in my program: maps and streams. Maps can be mutable or immutable. The latter seems to be the default behavior. When I tried creating an immutable map from a mutable one, the compiler complained with an obscure message:

scala.collection.mutable.Map[String, String]().toMap()

type mismatch; found: scala.collection.immutable.Map[String,String]; required: <:<[(String, String),(String, String)]

The argument doesn't seem to be used in the implementation of toMap(), though. Instead of chasing down this mystery, I used Map.newBuilder in my program, just as toMap() itself does.

But now that I'm looking at this again, I notice that:
  1. I should have called scala.collection.mutable.Map[String, String]().toMap, without the parentheses (see the sketch after this list).
  2. The required argument is a generic transformer from the map elements (if the map is viewed as a collection of pairs) to map entries. This is a vacuous transformation. There's more information in this StackOverflow thread.
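Here's the fix from point 1 as a quick sketch:

// The fix: call toMap with no argument list, letting the compiler
// supply the implicit <:< evidence.
val m = scala.collection.mutable.Map("k" -> "v")
val im: Map[String, String] = m.toMap   // an immutable copy
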
Scala's Stream seems a fundamentally different data structure from Java's streams. Looking through the API, I don't see any operations on scala streams analogous to the low-level streams, such as file streams, that Java provides. I might have missed something. Otherwise, it would seem that Scala isn't really a "full" language, in that there are basic operations it expects you to reach into Java for. Perhaps this isn't that different from clojure, which explicitly aims to be a hosted language.