The Holy Java

Building the right thing, building it right, fast

Clojure: How To Prevent “Expected Map, Got Vector” And Similar Errors

Posted by Jakub Holý on April 30, 2014

What my Clojure code does most of the time is transform data. Yet I cannot see the shape of the data being transformed; I have to know what the data looks like on the input and hold a mental model of how it changes at each step. But I make mistakes. I make mistakes in my code, so that the data no longer corresponds to the model it should follow. And I make mistakes in my mental model of what the data currently looks like, which likely leads to a code error later on. The end result is the same: an unhelpful exception at some later step about the wrong shape of the data.

There are two problems here: the error typically provides too little useful information, and it usually manifests later than where the code/model mistake actually is. I therefore easily spend an hour or more troubleshooting these mistakes. In addition, such code is hard to read, because the reader lacks the writer’s mental model of the data and has to derive it herself, which is quite hard, especially if the shape of the input data is not clear in the first place.

I should mention that I of course write tests and experiment in the REPL, but I still hit these problems, so that is not enough for me. Tests cannot protect me from having a wrong model of the input data (since I write the [unit] tests based on the same assumptions as the code and only discover the error when I integrate all the bits), and even when they help to discover an error, it is still time-consuming to find the root cause.

Can I do better? I believe I can.

The hard-to-troubleshoot errors with delayed manifestation, and the hard-to-understand code that communicates only half of the story (the transformations but not the shape of the data being transformed), are the price we pay for the power of dynamic typing. But there are strategies to lower this price. I want to present three of them: small, focused functions with good names; destructuring as documentation; and judicious use of pre- and post-conditions.

The content of this post is based on what I learned from Ulises Cerviño Beresi during one of the free Clojure Office Hours he generously offers, similar to those Leif offers in the US.

So we need to make the shape of data more obvious and to fail fast, preferably with a helpful error message.

The main idea is:

  1. Break transformations into small, simple functions with clear names
  2. Use destructuring in function arguments to document what data is expected
  3. Use pre- and post-conditions (and/or asserts) both as checks and documentation
  4. (All the testing and interactive exploration in the REPL that you already do.)
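To make points 1 to 3 concrete, here is a minimal sketch of one small transformation step. The function name keywords->set and the assumption that a raw row stores its search keywords as a comma-separated string are hypothetical, not taken from the post's gist. The destructuring in the argument vector documents which key the step uses, and the :pre/:post conditions both check and document the shape of the data before and after the step.

```clojure
(require '[clojure.string :as str])

;; Hypothetical step: the raw row stores search keywords as one
;; comma-separated string; downstream code wants a set of keywords.
(defn keywords->set
  "Replaces the :keywords string of a car map with a set of trimmed keywords."
  [{:keys [keywords] :as car}]              ; destructuring documents the input
  {:pre  [(map? car)                        ; fail fast on e.g. a vector of cars
          (or (nil? keywords) (string? keywords))]
   :post [(set? (:keywords %))]}            ; documents the output shape
  (assoc car :keywords
         (if (str/blank? keywords)          ; str/blank? is true for nil too
           #{}
           (set (map str/trim (str/split keywords #","))))))
```

Passing a vector of cars by mistake now fails immediately with an AssertionError mentioning (map? car), instead of surfacing as a confusing error several steps later.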

A simplified example

We have a webshop that sells discounted cars. We also have occasional campaigns with increased discounts for selected cars. For each car we also have a number of keywords people can use to find it, and the categories it belongs to. Below is code that processes raw car + campaigns + search keywords data from a DB query, first the original and then the refactored one with checks:

A real-world example

We have a webshop that sells discounted cars. Each car we sell has a base discount (either an absolute amount or a percentage), and we also have occasional campaigns for selected cars. For each car we also have a number of keywords people can use to find it.

Original code

Below is code that processes raw car + campaigns + search keywords data from a DB query, selecting the best applicable campaign and computing the final discount:
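Here is a hypothetical sketch of code in this original style (not the post's actual gist): one function groups joined rows per car, picks the best campaign and computes the discount, with no checks at all. The keys :discount-abs, :discount-pct and :campaign-discount-pct and the names sample-rows and car-discounts are invented for illustration.

```clojure
;; Hypothetical input: one row per (car, campaign) pair from a SQL join.
;; The base discount is either absolute (:discount-abs) or a percentage (:discount-pct).
(def sample-rows
  [{:id 1 :price 20000 :discount-abs 1000 :campaign-discount-pct nil}
   {:id 1 :price 20000 :discount-abs 1000 :campaign-discount-pct 10}
   {:id 2 :price 15000 :discount-pct 5    :campaign-discount-pct nil}])

(defn car-discounts
  "Original style: one function, no checks; the data shape lives only in the author's head."
  [rows]
  (for [[_ car-rows] (group-by :id rows)]
    (let [car      (first car-rows)
          base     (or (:discount-abs car)
                       (/ (* (:price car) (:discount-pct car)) 100))
          best-pct (apply max 0 (keep :campaign-discount-pct car-rows))
          campaign (/ (* (:price car) best-pct) 100)]
      (-> car
          (dissoc :campaign-discount-pct)
          (assoc :discount (max base campaign))))))
```

It works on well-formed rows, but a row with neither discount key, such as {:id 3 :price 9000}, only fails when the lazy result is realized, with a NullPointerException from inside the arithmetic that says nothing about which car or key was wrong.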

Defects and me

Originally I had two [discovered] errors in the code, and both took me quite a while to fix. First I forgot to convert JSON from a string into a map (a wrong assumption about the input data), and then I ran merge-campaigns directly on the list of car+campaign lists instead of mapping it over them (the sequential? precondition did not help to detect this error). So the transformations are clearly too error-prone.
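The second defect is easy to reproduce. In this hypothetical sketch, merge-campaigns expects the rows of one car; calling it on the list of all groups instead of mapping it over them passes a (sequential? ...) check, because a list of lists is still sequential. Adding (every? map? ...) makes the precondition describe the expected shape of the elements too, so the mistake fails fast with an AssertionError instead of a ClassCastException from dissoc.

```clojure
;; Hypothetical: merge-campaigns takes the rows of ONE car, each row a map
;; with an optional :campaign, and folds them into a single car map.
(defn merge-campaigns
  [car-rows]
  {:pre [(sequential? car-rows)     ; also true for a list of lists...
         (every? map? car-rows)]}   ; ...so check the elements as well
  (-> (first car-rows)
      (dissoc :campaign)
      (assoc :campaigns (vec (keep :campaign car-rows)))))

;; Rows already grouped per car: a vector of vectors of maps.
(def grouped
  [[{:id 1 :campaign {:pct 10}} {:id 1 :campaign nil}]
   [{:id 2 :campaign nil}]])
```

(map merge-campaigns grouped) is the intended call; (merge-campaigns grouped) now fails with "Assert failed: (every? map? car-rows)".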

The stack traces did not contain enough helpful context (though a more experienced Clojurist would certainly have found and fixed the root causes much faster):

## Forgotten ->json:
java.lang.NullPointerException:
 clojure.lang.Numbers.ops Numbers.java:  961
  clojure.lang.Numbers.gt Numbers.java:  227
  clojure.lang.Numbers.gt Numbers.java: 3787
       core/discount-size     cars.clj:   13
    core/compute-discount     cars.clj:   36
-------------
## Forgotten (map ..):
java.lang.ClassCastException: clojure.lang.PersistentVector cannot be cast to clojure.lang.IPersistentMap
              RT.java:758 clojure.lang.RT.dissoc
            core.clj:1434 clojure.core/dissoc
            core.clj:1436 clojure.core/dissoc
          RestFn.java:142 clojure.lang.RestFn.applyTo
             core.clj:626 clojure.core/apply
cars.clj:36 merge-campaigns
...

 

Refactored

This is the code refactored into smaller functions with checks (and it certainly can be improved much more):
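A hypothetical sketch in this refactored style (not the post's actual gist), assuming each car map carries :price, one of :discount-abs or :discount-pct, and a :campaigns vector of maps with a :pct key. Each step is a small, named function whose destructured arguments and :pre/:post conditions document the shape it expects and produces.

```clojure
(defn base-discount
  "The car's base discount: an absolute amount or a percentage of the price."
  [{:keys [price discount-abs discount-pct]}]
  {:pre  [(number? price)
          (or (number? discount-abs) (number? discount-pct))]
   :post [(number? %) (<= 0 % price)]}
  (or discount-abs (/ (* price discount-pct) 100)))

(defn campaign-discount
  "Discount from the best campaign (highest :pct), or 0 if there is none."
  [{:keys [price campaigns]}]
  {:pre  [(number? price)
          (every? map? campaigns)]
   :post [(number? %)]}
  (/ (* price (apply max 0 (keep :pct campaigns))) 100))

(defn with-final-discount
  "Adds :discount, the larger of the base and the campaign discount."
  [car]
  {:pre  [(map? car)]
   :post [(number? (:discount %))]}
  (assoc car :discount (max (base-discount car) (campaign-discount car))))
```

A car without any base discount now fails in base-discount with an assertion naming the missing key check, rather than with a NullPointerException further down.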

Downsides

The main problem with pre- and post-conditions is that their error messages do not provide any useful context, and there is no way to add a custom message. An error like

Assert failed: (let [a (key m)] (or (nil? a) (instance? java.sql.Array a))) cars.clj:18 user/jdbc-array-to-set

is better than not failing fast, but it does not tell us what the invalid value was or which of the thousands of cars had it.

Also, the checks are performed at runtime, so they have a performance cost. This might not be a problem with checks such as (map? x), but it could be with checks such as (every? ...) that walk a whole collection.
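One way around the missing context, sketched here for a hypothetical checked-rate step: :pre and :post cannot carry a message, but a plain assert accepts one as its second argument, so the message can include the offending value and the id of the car that had it. Regarding the runtime cost, :pre, :post and assert are only compiled in when *assert* is true at compile time, so a build can turn them off.

```clojure
;; Hypothetical step: validate a car's :rate with an informative message.
(defn checked-rate
  [{:keys [id rate] :as car}]
  ;; assert's second argument is included in the AssertionError message
  (assert (map? car) (str "Expected a car map, got " (pr-str car)))
  (assert (and (number? rate) (<= 0 rate 1))
          (str "Invalid :rate " (pr-str rate) " for car id " id))
  rate)
```

A call like (checked-rate {:id 42 :rate "0.1"}) now fails with a message that names both the bad value and car id 42.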

What about duplication?

Do you repeat the same checks again and again? Then you could either copy them using with-meta (they end up in metadata anyway) or reuse them explicitly:

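(defn with-valid-car [f] (fn [car] {:pre [(:make car) (:model car) (:year car)]} (f car)))
;; :pre needs expressions over the argument; bare keywords such as :make are always truthy

(def count-price (with-valid-car (fn [car] (do-something car))))
;; or make & use a macro to make it nicer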
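The macro route mentioned in the comment above could look like this hypothetical sketch. The names valid-car?, defcarfn and car-label are invented for illustration; the macro expands into an ordinary defn whose :pre calls one shared predicate, so the check is written once.

```clojure
(defn valid-car?
  "Hypothetical shared check: the keys every car map must have."
  [car]
  (and (map? car)
       (every? #(contains? car %) [:make :model :year])))

(defmacro defcarfn
  "Hypothetical macro: like defn for a one-argument fn, plus a shared car :pre check."
  [fname [arg] & body]
  `(defn ~fname [~arg]
     {:pre [(valid-car? ~arg)]}
     ~@body))

;; Usage: the precondition is added automatically.
(defcarfn car-label [car]
  (str (:make car) " " (:model car) " (" (:year car) ")"))
```

A plain predicate in :pre has the same effect without a macro; the macro only saves repeating the condition map.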

What about static types?

This arguably looks like a good case for static types. And yes, I come from Java and miss them. On the other hand, even though static typing would solve the main category of problems, it creates new ones and has its limits.

A) I actually have quite a number of “types” here, so modeling them fully would require a lot of classes:

  1. Raw data from the DB – car with campaign fields and keywords, category_ref as java.sql.Array
  2. Car with keywords as a sequence
  3. Car with category_ref as a sequence
  4. Car with a nested :campaign “object”
  5. Car with a nested :best-campaign object and with :rate (you could have :rate there from start, set initially to nil, but then you’d still need to ensure that the final function sets it to a value)

B) A key strength of Clojure is the use of generic data structures (maps, vectors, lazy sequences) and powerful, easy-to-combine generic functions operating on them. It makes integrating libraries very easy, since everything is just a map (and not a custom type that needs to be converted), and you can always transform these with your good old friends, the generic functions, whether it is a Korma SQL query definition, a result set, or an HTTP request. Static types take this away.

C) Types permit only a subset of checks that you might need (that is unless you use Haskell :)) – they can check that a thing is a car but not that a return value is in the range 7 … 42.

D) Some functions do not care about the type, only its small part – f.ex. jdbc-array-to-set only cares about the argument being a map, having the key, and if set, the value being a java.sql.Array.
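Points C and D can be sketched together. The first function is hypothetical and only shows a :post range check that a type could not express; the second is a guess at what jdbc-array-to-set might look like, reusing the precondition from the error message quoted earlier. It only requires a map whose value under the given key, if present, is a java.sql.Array, so it works on any row shape.

```clojure
;; Hypothetical function whose result must lie in the range 7..42 (point C).
(defn double-in-range
  [n]
  {:post [(<= 7 % 42)]}
  (* 2 n))

;; Sketch of jdbc-array-to-set (point D): it only cares that m is a map
;; and that the value under k, if set, is a java.sql.Array.
(defn jdbc-array-to-set
  [m k]
  {:pre  [(map? m)
          (let [a (get m k)] (or (nil? a) (instance? java.sql.Array a)))]
   :post [(map? %)]}
  (if-let [^java.sql.Array a (get m k)]
    (assoc m k (set (.getArray a)))   ; getArray returns a Java array
    m))
```

For a quick REPL test, a java.sql.Array can be faked with (reify java.sql.Array (getArray [_] (object-array ["suv" "family"]))).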

What else is out there?

Conclusion

Using smaller functions and pre- and post-conditions, I can discover errors much earlier and also document the expected shape of the data better, even more so with destructuring in fn signatures. There is some duplication in the pre/post conditions and the error messages are not very helpful, but it is much better than before. I guess that more complex cases may warrant the use of core.contracts or even core.typed / schema.

What strategies do you use? What would you improve? Other comments?

I encourage you to fork and improve the gist and share your take on it.

Updates

  1. Lawrence Krubner recommends using dire to capture the arguments and return value to provide a useful error message
  2. Alf Kristian recommends adding more tests and integration tests and, if that is not enough, using core.typed rather than :pre and :post (example)

8 Responses to “Clojure: How To Prevent “Expected Map, Got Vector” And Similar Errors”

  1. This is maybe as good a solution as you can hope for, but frankly when I was reading it, I thought, “This is a poster-child use case for a good static type system.” Unless I’m reading it wrong, you’re basically manually implementing static typing here. The errors you’re checking for here are never going to be due to malformed user input, always programmer mistakes that could be more usefully caught at compile time. I am not saying static typing is superior in general, but it sure seems like a better tool for this particular job.

    • Thank you Grimm. Yes, I indeed miss the safety net of static types. And it is not just “this particular job”; I have the problem with all the Clojure code I write. On the other hand, with static typing I lose some of the power that dynamic typing provides (though I am not using much of that here). BTW, core.typed enables adding progressive typing to Clojure, so perhaps that is a good solution.

      The question is how much I need to check. The more I need, the more likely it is that I should use a more powerful solution such as core.typed or schema. But perhaps I can find a lighter-weight compromise that provides good enough safety, in some cases even more than static typing since I can do arbitrary checks (well, unless we speak about Haskell :)), without requiring the complexity of a full-blown type checker.

      • Also, the fact that everything is a map and not a special type makes it super easy to combine libraries and generic functions, thus making me really productive.

    • matvore said

      It looks like the code was written by someone who wishes they had a static type system, but static typing isn’t the only way to solve the data format problem. Unit tests and Clojure protocols are also a good way to handle this. Protocols let you treat one type as some other type after you realize you are using a new type in an existing function, so it’s ad hoc and doesn’t require any up-front effort.
      For instance, you could make a ParseAsJSON protocol with a single “parse” function. For String implementation of the protocol, actually parse the JSON. For the Map implementation, just return the Map unchanged. This makes the original function more flexible.
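matvore's protocol idea can be sketched as follows. To keep the sketch dependency-free, the String implementation parses EDN with clojure.edn/read-string as a stand-in for a real JSON parser; with the clojure.data.json library you would call clojure.data.json/read-str (e.g. with :key-fn keyword) instead. Clojure maps implement java.util.Map, so extending the protocol to that interface covers them.

```clojure
(require '[clojure.edn :as edn])

(defprotocol ParseAsJSON
  (parse [x] "Returns x as a Clojure map, parsing it first if necessary."))

(extend-protocol ParseAsJSON
  String
  (parse [s] (edn/read-string s))   ; stand-in for a real JSON parser
  java.util.Map
  (parse [m] m))                    ; already a map: return unchanged
```

Code that calls parse on its input then accepts both the raw string and the already-converted map, which would have sidestepped the forgotten JSON conversion described in the post.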

  2. I have gotten in the habit of doing 2 things:
    1.) I use :pre and :post conditions as you are doing here
    2.) I also use dire so when the :pre or :post conditions fail and an Assert exception is thrown, I can capture the arguments and the return value and write a meaningful error message:
    https://github.com/MichaelDrogalis/dire
    I do a lot of this:
    :post [(:discount %)]
    I also test for value ranges:
    :post [(> (:totals %) 100) (< (:totals %) 1000)]
    I am thinking I might use prismatic/schema in the future.
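The idea in this comment can be approximated without dire. The following plain-Clojure sketch (not dire's API; with-arg-context and add-totals are hypothetical names) wraps a function, catches the AssertionError raised by a failing :pre or :post, and rethrows it as ex-info carrying the function name and the arguments, so the error tells you which input was bad.

```clojure
(defn with-arg-context
  "Hypothetical wrapper: rethrows an AssertionError from f as ex-info
  carrying the function name and the call's arguments."
  [fname f]
  (fn [& args]
    (try
      (apply f args)
      (catch AssertionError e
        (throw (ex-info (str fname " failed: " (.getMessage e))
                        {:fn fname :args (vec args)}
                        e))))))

(defn add-totals
  "Hypothetical: sums :items into :totals, which must lie between 100 and 1000."
  [{:keys [items] :as order}]
  {:post [(> (:totals %) 100) (< (:totals %) 1000)]}
  (assoc order :totals (reduce + items)))

(def checked-add-totals (with-arg-context 'add-totals add-totals))
```

(ex-data e) on the rethrown exception then contains the offending order, e.g. {:fn add-totals, :args [{:items [1 2]}]}.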

  3. Alf Kristian said

    I do try to embrace Clojure’s dynamic typing, but I have to say, sometimes I miss what is possible to do in e.g. Scala. So, I try to be even more conscious about writing unit tests when writing Clojure. I never regret having tests, but often regret being too lazy and not having them. I usually use Midje (with mocking of external stuff) to make “integration” tests. Testing the public API of a namespace like that is really handy, and it often catches type errors like the ones you describe here. When a relational database is involved, I usually do integration tests with H2.
    I haven’t used pre- or post-conditions at all, and I think I would rather go for core.typed if I wanted better (type) checking. Why not try core.typed? No runtime penalty and pretty powerful. Found a good example in the new Clojure Cookbook:
    https://github.com/clojure-cookbook/clojure-cookbook/blob/d0b080d6a702ffcf630a9091ba6f75bb0989f0e2/10_testing/10-09_hof-typed.asciidoc
    “While Clojure’s built-in pre/post conditions are useful for defining anonymous functions that fail fast, these checks only provide feedback at runtime. Why not type-check our higher-order functions as well? core.typed’s type-checking abilities aren’t limited to only data types—it can also type-check functions as types themselves.”

    • Thank you! I guess I could test more and add a few integration tests to the mix …
      I should really try out core.typed, thanks for the pointer.
