The Holy Java

Building the right thing, building it right, fast

Archive for April, 2011

Most interesting links of April (renewed)

Posted by Jakub Holý on April 30, 2011

Only two articles this month:

Computerworld: 22 free tools for data visualization and analysis

– a great review of different categories of data analysis and visualization tools. The tools I hadn’t known before (i.e. excluding R, Google Charts etc.) and found especially interesting:
Data web apps: Google Refine (data cleansing in a spreadsheet-like UI: clustering, data distribution overview, …), Google Fusion Tables (data => map etc., beta), Impure (rich & interactive data visualization via a drag-and-drop UI reminiscent of Yahoo Pipes; cons: lacking documentation, steep learning curve, check the teaser video).
JS libraries: Exhibit (JavaScript library by MIT for creating interactive visualizations, e.g. for articles – incl. maps, timeplots, calendars etc., supporting filtering, sorting, searching), InfoVis Toolkit (JS lib for interactive data visualizations; pros: beautiful, cons: choice of visualization types is somewhat limited), Protovis (by Stanford University’s Visualization Group; one of the more popular JS libraries for turning data into visuals, great docs, robust); OpenLayers (example; customize & display a map, e.g. OpenStreetMap or Google), Polymaps (interactive maps with overlays)
GIS: OpenHeatMap (webapp, “astonishingly easy to create a color-coded map from many types of location data”)
Other: Timelines with TimeFlow (interesting desktop /java/ app, but alpha) or SIMILE Timeline widget (JS); Word clouds: IBM Word-Cloud Generator (free, desktop /java/); Gephi (graph/network visualization & exploration; desktop)

The Evolution of Test Driven Developers

– an entertaining and enlightening article with valuable links to resources that can help you get to the next evolutionary step. One of its benefits is that it helps you understand the true value of the different types of tests (some -> TDD -> Behaviour Driven Development (‘what’ rather than ‘how’) -> Acceptance Test Driven Development) and the shift from a technical to a business perspective along the way.


Posted in General, Testing, Tools, Top links of month | Comments Off on Most interesting links of April (renewed)

How stateless can you go?

Posted by Jakub Holý on April 29, 2011

I’ve attended an Oslo Coding Dojo named “How stateless can you go?” led by Thomas K. Nilsson. The goal was to write a toString() method for a tree structure, printing nodes with proper indentation w.r.t. their depth, and then to make it as stateless as possible without any other regard (such as performance or cleanliness of the code).

It was very interesting to compare the original, stateful version and the resulting stateless one and to see the solution in various languages (Haskell, Clojure, Groovy, C#, Java, Scala) – it actually looked pretty similar in all of them.

What I’ve learned is that stateless (i.e. functional-style) code looks much cleaner, for you get rid of a lot of noise such as local variables and loops. In practice it is important to use a language with an efficient implementation of recursion (especially tail recursion) and with data structures that lend themselves easily to recursive processing, i.e. that make it easy and efficient to process the first element of a collection and do the same recursively for the rest without modifying the collection (and that provide utility methods like each). It is of course best to have a language that supports map/reduce.
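To make the contrast concrete, here is a minimal sketch of both styles in Java, assuming Java 11 or newer. It is not the dojo's actual solution (that one is on GitHub); the `Node` class, its fields and the two-space indent are assumptions made for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical tree node; the dojo's real classes are not shown in this post.
final class Node {
    private final String name;
    private final List<Node> children;

    Node(String name, Node... children) {
        this.name = name;
        this.children = List.of(children);
    }

    // Stateful style: a mutable StringBuilder, a loop counter and explicit loops.
    String toStringStateful() {
        StringBuilder sb = new StringBuilder();
        append(sb, 0);
        return sb.toString();
    }

    private void append(StringBuilder sb, int depth) {
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        sb.append(name).append('\n');
        for (Node child : children) {
            child.append(sb, depth + 1);
        }
    }

    // Stateless style: no local variables, no loops, nothing is mutated.
    @Override
    public String toString() {
        return render(0);
    }

    private String render(int depth) {
        return "  ".repeat(depth) + name + "\n"
                + children.stream()
                          .map(child -> child.render(depth + 1))
                          .collect(Collectors.joining());
    }
}
```

Both methods produce the same text; the stateless one simply maps every child to its rendered string and joins the results, which is the map/reduce shape mentioned above. Note that Java does not optimize tail calls, so a very deep tree could still overflow the stack.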

You can check the slides and various solutions at GitHub and see our primitive and stateless implementations below. (We did it in a nearly TDD-manner, but I won’t include the test here as it isn’t essential.)

Update: There are more solutions linked to from the meetup’s comments – search for “github” – and there is also a link to an article series for deeper discussion of challenges in writing pure and stateless code.

Read the rest of this entry »

Posted in Languages | 2 Comments »

What I’ve Learned from (Nearly) Failing to Refactor Hudson

Posted by Jakub Holý on April 28, 2011

We’ve tried to refactor Hudson’s main class but without success; only later have I been able to refactor it successfully, thanks to the experience from the first attempt and more time. In any case it was a great learning opportunity.

Lessons Learned

The two most important things we’ve learned are:

  • Never underestimate legacy code. It’s far more complex and intertwined than you expect and it has more nasty surprises up its sleeves than you can imagine.
  • Never underestimate legacy code.

And another important one: when you’re tired and depressed, have some fun reading the “best comments ever” at StackOverflow :-). Seeing somebody else’s suffering makes one’s own seem smaller.

I’ve also started to think that the refactoring process must be more rigorous to protect you from wandering too far from your original goal and from getting lost in the eternal cycle of fixing something <-> discovering new problems. People tend to do depth-first refactoring changes that can easily lead them astray, far from where they actually need to go. It is important to stop periodically and look at where we are, where we are trying to get, and whether we are getting lost and should instead prune the current “branch” of refactorings, return to some earlier point and perhaps try a completely different solution. I guess that one of the key benefits of the Mikado method is that it provides you with this global overview – which gets easily lost when it is only in your head – and with points to roll back to.

Evils of Legacy Code

Use a dependency injection framework, for God’s sake! Singletons and their manual retrieval really complicate testing and affect the flexibility of the code.
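The difference can be shown even without a framework. Below is a small sketch using plain constructor injection; a DI framework would merely do the wiring for you. All names (`BuildQueue`, `GlobalQueue`, `StatusPage`) are hypothetical and do not come from the Hudson code base.

```java
// Hypothetical names throughout; none of these exist in Hudson.
interface BuildQueue {
    int pendingBuilds();
}

// The global singleton that everybody reaches for.
final class GlobalQueue implements BuildQueue {
    private static final GlobalQueue INSTANCE = new GlobalQueue();

    private GlobalQueue() {
    }

    static GlobalQueue getInstance() {
        return INSTANCE;
    }

    @Override
    public int pendingBuilds() {
        return 0; // imagine a real queue lookup here
    }
}

// Hard to test: the dependency is fetched manually from the singleton,
// so a test cannot substitute a different queue.
final class StatusPageWithSingleton {
    String render() {
        return "Pending: " + GlobalQueue.getInstance().pendingBuilds();
    }
}

// Easier to test: the dependency is handed in, so any BuildQueue will do.
final class StatusPage {
    private final BuildQueue queue;

    StatusPage(BuildQueue queue) {
        this.queue = queue;
    }

    String render() {
        return "Pending: " + queue.pendingBuilds();
    }
}
```

In a test, `new StatusPage(() -> 3)` is enough to supply a fake queue, while `StatusPageWithSingleton` can only ever talk to the one global instance.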

Don’t use public fields. They make it really hard to replace a class with an interface.

Reflection and multithreading make it pretty difficult, if not impossible, to find out the dependencies of a particular piece of code and thus the impacts of changing it. I had a hard time finding all the places where Hudson.getInstance is invoked while its constructor is still running.

Our Way to Failure and Success

There is a lot of refactoring that could be done with the Hudson class, for it is a typical God Class which additionally spreads its tentacles through the whole code base via its evil singleton instance being used by just about anyone for many different purposes. Gojko describes some of the problems worth removing.

The Failure

We’ve tried to start small and “normalize” the singleton initialization, which isn’t done in a factory method but in the constructor itself. I haven’t chosen the goal very well as it doesn’t bring much value. The idea was to make it possible to potentially have other implementations of Hudson as well – e.g. a MockHudson – but given the state of the code it wasn’t really feasible, and even if it was, a simple Hudson.setInstance would perhaps suffice. Anyway, we’ve tried to create a factory method and move the initialization of the singleton instance there, but in the end we got lost in concurrency issues: there were either multiple instances of Hudson or the application deadlocked itself. We tried to move pieces of code around, but the dependencies wouldn’t let us do that.

The Success

While reflecting on our failure I’ve come to the realization that the problem was that Hudson.getInstance() is called (many times) already during the execution of Hudson’s constructor, by the objects used there and by threads started from there. It is of course a hideous practice to access a half-baked instance before it is fully initialized. The solution is then simple: to be able to initialize the singleton field outside of the constructor, we must remove all calls to getInstance from its context.

Mikado Graph: Hudson Refactoring

The steps can be seen very well from the corresponding GitHub commits. Summary:

  1. I used the “introduce factory” refactoring on the constructor
  2. I modified ProxyConfiguration not to use getInstance but to expect that the root directory will be set before its first use
  3. I moved the code that didn’t need to be run from the constructor out, to the new factory method – this resulted in some, hopefully insignificant, reordering of the code
  4. Finally, I also moved the instance initialization to the factory method
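The shape of the result can be sketched as follows. This is a heavily simplified illustration, not the actual commits: `HudsonSketch` and `ProxyConfigSketch` are stand-ins invented for this example, and the `proxy.xml` file name is likewise an assumption.

```java
// Stand-in for step 2: no getInstance() call; the root directory must be
// set before first use. (Hypothetical class, not the real ProxyConfiguration.)
final class ProxyConfigSketch {
    private static String rootDir;

    static synchronized void setRootDir(String dir) {
        rootDir = dir;
    }

    static synchronized String configFile() {
        if (rootDir == null) {
            throw new IllegalStateException("root directory not set yet");
        }
        return rootDir + "/proxy.xml";
    }
}

// Stand-in for the singleton itself (hypothetical, heavily simplified).
final class HudsonSketch {
    private static HudsonSketch theInstance;
    private final String rootDir;

    // The constructor no longer assigns theInstance, so nothing can observe
    // a half-initialized object through getInstance().
    private HudsonSketch(String rootDir) {
        this.rootDir = rootDir;
    }

    // Steps 1, 3 and 4: the factory prepares collaborators, builds the
    // object completely and only then publishes it as the singleton.
    static synchronized HudsonSketch create(String rootDir) {
        if (theInstance != null) {
            throw new IllegalStateException("already created");
        }
        ProxyConfigSketch.setRootDir(rootDir);
        HudsonSketch hudson = new HudsonSketch(rootDir);
        // Code that did not need to run inside the constructor would go here.
        theInstance = hudson;
        return hudson;
    }

    static synchronized HudsonSketch getInstance() {
        return theInstance;
    }

    String getRootDir() {
        return rootDir;
    }
}
```

The point of the sketch is the ordering: `getInstance()` returns null until `create` has finished constructing the object, instead of handing out an instance whose constructor is still running.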

I can’t be 100% sure that the resulting code has the same semantics as far as it matters, for I had to make a few changes outside of the safe automated refactorings and there are no useful tests except for trying to run the application (and, as is common with legacy applications, it wasn’t feasible to create them beforehand).

The refactored code doesn’t provide much added value yet but it is a good start for further refactorings (which I won’t have the time to try 😦 ). It got rid of the offending use of an instance while it is being created, and the constructor code is simpler and better. The exercise took me about four pomodoros, i.e. a little less than two hours.

If I had the time, I’d continue with extracting an interface from Hudson and moving its unrelated responsibilities to classes of their own (perhaps keeping the methods in Hudson for backwards compatibility and delegating to those objects), and I might even use some AOP magic to get cleaner code while preserving binary compatibility (as Hudson/Jenkins actually already does).

Try it for Yourself!


Get the code

Get the code as .zip or via git: # 50MB => takes a while
cd coding-dojo
git checkout -b mybranch INITIAL

Compile the Code

as described in the dojo’s README.

Run Jenkins/Hudson

cd coding-dojo/2011-04-26-refactoring_hudson/
cd maven-plugin; mvn install; cd ..       # a necessary dependency
cd hudson/war; mvn hudson-dev:run

and browse to http://localhost:8080/ (Jetty should pick up changes to class files automatically).

Further Refactorings

If you’re the adventurous type, you can try to improve the code more by splitting out the individual responsibilities of the god class. I’d proceed like this:

  1. Extract an interface from Hudson and use it wherever possible
  2. Move related methods and fields into (nested) classes of their own, so that the original methods of Hudson just delegate to them (the Move Method refactoring should be useful); for example:
    • Management of extensions and descriptors
    • Authentication & authorization
    • Cluster management
    • Application-level functionality (control methods such as restart, updates of configurations, management of socket listeners)
    • UI controller (factoring this out would require re-configuration of Stapler)
  3. Convert the nested classes into top-level ones
  4. Provide a way to get instances of the classes without Hudson, e.g. as singletons
  5. Use the individual classes instead of Hudson wherever possible so that other classes depend only on the functionality they actually need instead of on the whole of Hudson
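Steps 1, 2 and 5 above could look roughly like this for a single responsibility. It is only a sketch, assuming Java 11 or newer; `ExtensionRegistry`, `DefaultExtensionRegistry`, `GodClass` and `PluginLoader` are hypothetical names, not classes from Hudson.

```java
import java.util.ArrayList;
import java.util.List;

// Step 1: a narrow interface extracted from the god class (hypothetical).
interface ExtensionRegistry {
    void register(String extension);

    List<String> extensions();
}

// Steps 2 and 3: the responsibility lives in a class of its own.
final class DefaultExtensionRegistry implements ExtensionRegistry {
    private final List<String> registered = new ArrayList<>();

    @Override
    public void register(String extension) {
        registered.add(extension);
    }

    @Override
    public List<String> extensions() {
        return List.copyOf(registered); // callers get an immutable snapshot
    }
}

// The former god class keeps its old methods for backwards compatibility
// but merely delegates to the extracted class.
final class GodClass implements ExtensionRegistry {
    private final ExtensionRegistry extensions = new DefaultExtensionRegistry();

    @Override
    public void register(String extension) {
        extensions.register(extension);
    }

    @Override
    public List<String> extensions() {
        return extensions.extensions();
    }
}

// Step 5: a client depends only on the functionality it actually needs.
final class PluginLoader {
    private final ExtensionRegistry registry;

    PluginLoader(ExtensionRegistry registry) {
        this.registry = registry;
    }

    void load(String plugin) {
        registry.register(plugin);
    }
}
```

`PluginLoader` works with either the old `GodClass` or a bare `DefaultExtensionRegistry`, so clients can be migrated one at a time while existing callers keep compiling.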

Learning about Jenkins/Hudson

If you want to understand more about what Hudson does and how it works, you may check:

Sidenote: Hudson vs. Jenkins

Once upon a time there was a continuous integration server called Hudson, but after its patron Sun died, it ended up in the hands of a man called Oracle. He wasn’t very good at communication and nobody really knew what he was up to, so when he started to behave a little weird – or at least so the friends of Hudson perceived it – those worried about Hudson’s future (including most people originally working on the project) made a clone of it and named it Jenkins, which is another popular name for butlers. So now we have Hudson, backed by Oracle and the Maven guys from Sonatype, and Jenkins, supported by a vibrant community. This exercise is based on the source code of Jenkins, but to keep the confusion level low I often refer to it as Hudson, for that is what the package and main class are called.


Refactoring legacy code always turns out to be more complicated and time-consuming than you expect. It’s important to follow some method – e.g. the Mikado method – that helps you keep a global overview of where you want to go and where you are, and to regularly consider what you’re doing and why, so that you don’t get lost in a series of “fix a problem, discover new problems” steps. It’s important to realize when to give up and try a different approach. It’s also very hard or impossible to write tests for the changes, so you must be very careful (using safe, automated refactorings as much as possible and proceeding in small steps), but fear shouldn’t stop you from trying to save the code from decay.

Posted in General, Languages | 2 Comments »

What Do I Mean by a Legacy Code?

Posted by Jakub Holý on April 18, 2011

I’m using the term “legacy code” quite a lot, so what do I mean by it? I like best R. C. Martin’s description in his foreword to Michael Feathers’ book Working Effectively with Legacy Code:

It conjures images of slogging through a murky swamp of tangled undergrowth with leaches beneath and stinging flies above. It conjures odors of murk, slime, stagnancy, and offal.

Read the rest of this entry »

Posted in Languages | 3 Comments »

Refactoring the “Legacy” with the Mikado Method as a Coding Dojo

Posted by Jakub Holý on April 16, 2011

I’m preparing a coding dojo for my colleagues at Iterate where we will try to collectively refactor the “legacy” Hudson/Jenkins, especially its main Hudson class, into something more testable, using the Mikado Method. I got the idea after reading Gojko Adzic’s blog on how terrible the code is and after discovering the Mikado Method by chance. I’ve long been interested in code quality and, more recently, especially in improving the quality of legacy applications, where “legacy” means a terrible code base and likely insufficient tests. As consultants we often have to deal with such applications and with improving their state into something easier and cheaper to maintain and evolve. Therefore such a collective practice is a good thing.

The Mikado Method

The Mikado Method, which the authors describe as “a tool for large-scale refactorings”, serves two purposes: Read the rest of this entry »

Posted in General, Languages | 1 Comment »

Real-world data prove that Agile, BDD & co. work – lecture by G. Adzic

Posted by Jakub Holý on April 14, 2011

I’ve attended a very inspirational lecture by Gojko Adzic, organized by the Oslo XP Meetup. Many people including some respectable persons claim that Lean, Agile, and high-level testing based on specifications (whether you call it Agile acceptance testing, Acceptance-test driven development, Example-driven development, Story-testing, Behavior-driven development, or otherwise – let’s call them all Specification by example) do not work.

To prove the contrary, Gojko has collected over 50 case studies of projects that were very successful thanks to using these methods. In his soon-to-be-published book, Specification by Example (download ch1, a review), he investigates what these projects and teams had in common that was missing in the failed ones. So it’s great for two reasons: it documents how much success you can achieve with Specification by Example, and it shows you how to implement it successfully.

Read the rest of this entry »

Posted in General, Testing | 2 Comments »

How to customize CKEditor with your own plugins, skins, configurations

Posted by Jakub Holý on April 4, 2011

This post summarizes what I’ve learned about customizing the open-source WYSIWYG rich-text editor CKEditor 3.5.2 with one’s own plugins, skins, and configurations. There are already a lot of good resources, so wherever possible I will link to them and just summarize and/or supplement them. However, I’ve found no overall guide for customizing CKEditor and thus intend to fill this gap.
Read the rest of this entry »

Posted in General | 4 Comments »

CKEditor: Hide some toolbar buttons on a per page basis

Posted by Jakub Holý on April 4, 2011

In my project we had CKEditor with a common toolbar used on many pages and we needed to be able to hide some of the buttons on some pages (e.g. the email editor didn’t support some functionality/content). It took me a long time to figure out a way to do it, for CKEditor has no methods for simply removing/hiding buttons from a toolbar. My solution uses the fact that the configuration file can see variables defined in the including page and that it can contain functions: there is a function which takes the default toolbar definition and removes from it all the buttons mentioned in a variable that is expected to be defined in the page.

Read the rest of this entry »

Posted in General | 1 Comment »

Code quality matters to the customers. A lot.

Posted by Jakub Holý on April 2, 2011

Some people argue that the main task of a developer is to deliver working, value-bringing software to the customer and that idealistic concepts such as code quality should not hinder that primary task. They acknowledge that it is good to strive for good code quality but say that sometimes code quality must give way to the quick delivery of outcomes to the customer. After having worked on a code base so rotten that it drove less resistant programmers mad, I have to strongly disagree. Code quality is not an abstract concept that has value only in the developers’ world; it is a very real thing that translates directly into money: if you are missing it, it translates into great financial losses over time.

Read the rest of this entry »

Posted in General | Comments Off on Code quality matters to the customers. A lot.

CKEditor: Collapsing only 2nd+ toolbar rows – howto

Posted by Jakub Holý on April 1, 2011

Normally CKEditor (v3.5.2) hides/shows all the toolbar buttons when you press the collapse/expand button but I needed to always show the first row with “basic tools” and only collapse the second and following rows with advanced functionality tool buttons. CKEditor doesn’t have proper support for that but there is a simple workaround.

Update: Example solution (CKEditor 3.6.1) published, see the changes done or download the full source and open _samples/replacebyclass.html.

Read the rest of this entry »

Posted in General | 5 Comments »