The Holy Java

Building the right thing, building it right, fast

Archive for January, 2013

Most interesting links of January ’13

Posted by Jakub Holý on January 31, 2013

Recommended Readings

Various

  • Dustin Marx: Significant Software Development Developments of 2012 – Groovy 2.0 with static typing, rise of Git[Hub], NoSQL, mobile development (iOS etc.), Scala and Typesafe stack 2.0, big data, HTML5, security (Java issues etc.), cloud, DevOps.
  • 20 Kick-ass programming quotes – including Bill Gates’ “Measuring programming progress by lines of code is like measuring aircraft building progress by weight.”,  B.W. Kernighan’s “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.”, Martin Golding’s “Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.” (my favorite)
  • How to Have a Year that Matters (via @gbrindusa) – do you want to just survive and collect possessions or do you want to make a difference? Some questions everybody should pose to him/herself.
  • Expression Language Injection – a security defect in applications using JSP EL that can sometimes lead to double evaluation of expressions, which makes it possible to execute data supplied by the user (in request parameters etc.) as expressions; affects e.g. unpatched Spring 2.x and 3.

Languages etc.

  • HN discussion about Scala 2.10 – compilation speed and whether it matters, comparison of the speed and type system with Haskell and OCaml, problems with incremental compilation (dependency cycles, fragile base class), some speed up tips such as factoring out subprojects, the pros and cons of implicits etc.
  • Blog Mechanical Sympathy – interesting posts and performance tests regarding “writing software which works in harmony with the underlying hardware to gain great performance” such as Memory Access Patterns Are Important and Compact Off-Heap Structures/Tuples In Java.
  • Neal Ford: Functional thinking: Why functional programming is on the rise – Why you should care about functional programming, even if you don’t plan to change languages any time soon – N. Ford explains the advantages of FP and why FP concepts are spreading into other languages (higher abstractions enabling focus on the results over steps and ceding control to the language, more reusability on a finer level (higher-order functions etc.), few generic data structures with many operations -> better composability, “new” and different tools such as lazy collections, shaping the language towards the problem instead of vice versa, aligning with trends such as immutability).
  • Neal Ford: Java.next: The Java.next languages Leveraging Groovy, Scala, and Clojure in an increasingly polyglot world – a comparison of these languages with focus on what they are [not] suitable for, exploration of their paradigms (static vs. dynamic typing, imperative vs. functional)

SW development

  • How to Completely Fail at BDD – a story of an enthusiastic developer who tried to make everyone’s life better by introducing automated BDD tests and failed due to differences in culture (and inability to change thinking from the traditional testing), a surprising lack of interest in the tool and in learning how to write good tests: “Culturally, my current team just isn’t ready or interested in something like this.” Moral: It is hard to change people, good ideas are not enough.
  • M. Feathers: Refactoring is Sloppy – refactoring is often prioritized out of regular development and refactoring sprints/stories aren’t popular due to past failures etc. A counter-intuitive way to get refactoring in is to imagine, during planning, what the code would need to be like to make it easy to implement a story. Then create a task for making it so before the story itself and assign it to somebody other than the person who gets the story (to force a degree of scrutiny and communication). “Like anything else in process, this is medicine. It’s not meant to be ‘the way that people do things for all time’ [..]” – i.e. intended for use when you can’t fit refactoring in otherwise. It may also make the cost of the current bad code more visible. Read also the commits (f.ex. the mikado method case).
  • Cyber-dojo: A great way to practice TDD together. Compare your red-green cycle and development over time with other teams. Purposefully minimalistic editor, a number of prepared TDD tasks.
  • On the Dark Side of “Craftsmanship” – an interesting and thought-provoking article. Some developers, the software labourers, want to get work done and go home; they lack the motivation and energy to continually spend time improving themselves. There is nothing wrong with that and we shouldn’t disparage them because of that. We shouldn’t divide people into craftsmen and the bad ones. A summary of and response to the varied reactions follows up in More on “Craftsmanship”. The author is right that we can’t expect everybody to spend nights improving her/his programming skills. Still they should not produce code of poor quality (with few exceptions) since maintaining such code costs a lot. There should be time for enough quality in a 9-5 day and people should be provided with enough guidance and education to be able to write decent code. (Though I’m not sure how feasible it is, how much effort it takes to become an acceptable developer.) Does the increased cost of writing (and learning to write) good code outweigh the cost of working with bad code? That is an eternal discussion.

Cloud, web, big data etc.

  • Whom the Gods Would Destroy, They First Give Real-time Analytics (via Leon) – a very reasonable argument against real-time analytics: yes, we want real-time operational metrics but “analytics” only makes sense on a sensible amount of data (for the sake of statistical significance etc.). Real-time analytics could easily provide misleading results.
  • CAP Twelve Years Later: How the “Rules” Have Changed (tl;dr, via @_dagi) – an in-depth discussion of the CAP theorem and the simplification (2 out of 3) that it makes; there are many more nuances. By Eric Brewer, a professor of computer science at the University of California, Berkeley, and vice president of infrastructure at Google.
  • ROCA: Resource-oriented Client Architecture – “A collection of simple recommendations for decent Web application frontends.” Server-side: true REST, no session state, working back/refresh etc. Client: semantic HTML independent of layout, progressive enhancement (usable with older browsers), usable without JS (all logic on the server) etc. Certainly not suitable for all types of apps but worthwhile to consider the principles and compare them with your needs.

Clojure Corner

Tools

  • Vaurien, the Chaos TCP Proxy (via @bsvingen) – an extensible proxy that you can control from your tests to simulate network failure or problems such as delays on 20% of the requests; great for testing how an application behaves when facing failures or difficulties with its dependencies. It supports the protocols tcp, http, redis, memcache.
  • Wvanbergen’s request-log-analyzer for Apache, MySQL, PostgreSQL, Rails and more (via Zarko) – generates a performance report from a supported access log to point out requests that might need optimizing
  • Working Effectively With iTerm2 (Mac) – good tips in the body and comments

Favorite Quotes

A very good (though not very scientific) definition of project success applicable for distinguishing truly agile from process-driven projects:

[..] a project is successful if:

  • Something was delivered and put to use
  • The project members, sponsors and users are basically happy with the outcome of the project

– Johannes Brodwall in “How do we become Agile?” and why it doesn’t matter, inspired by Alistair Cockburn

(Notice there isn’t a single word about being “on time and budget”.)

Posted in General, Languages, Testing, Tools, Top links of month

The Sprinting Centipede Strategy: How to Improve Software Without Breaking It

Posted by Jakub Holý on January 14, 2013

Re-published from blog.iterate.no.

Our code has been broken for weeks. Compiler errors, failing tests, incorrect behavior plagued our team. Why? Because we have been struck by a Blind Frog Leap. By doing multiple concurrent changes to a key component in the hope of improving it, we have leaped far away from its ugly but stable and working state into the marshes of brokenness. Our best intentions have brought havoc upon us: something expected to be a few man-days’ work paralyzed us for over a month, until the changes were finally reverted (for the time being).

Lessons learned: Avoid Frog Leaps. Follow instead Kent Beck’s strategy of Sprinting Centipede – proceed in small, safe steps that don’t break the code. Deploy it to production often, preferably daily, to force yourself to make really small and really safe changes. Do not change multiple unrelated things at the same time. Don’t assume that you know how the code works. Don’t assume that your intended change is a simple one. Test thoroughly (and don’t trust your test suite overly). Let the computer give you feedback and hard facts about your changes – by running tests, by executing the code, by running the code in production.

Read the rest of this entry »

Posted in General, SW development

Bash: Parse Options And Non-Options With Getopts

Posted by Jakub Holý on January 9, 2013

Parsing script or function options and non-option arguments is easy in Bash with getopts, but there are some catches, such as the need to reset OPTIND. We will see how to do it using getopts, shift, and case.

The code below will parse the function's options and remove them so that $1 will refer to the first non-option argument (i.e. one not starting with -). You would invoke it f.ex. as latest_recur -x Hello -a '*.txt'.

# Find the latest files under the current dir, recursively; options:
# -a list all, not only 30 latest
# <pattern> - pattern passed to find's -name; ex.: "*.log.processed"
function latest_recur {
   local show_all= opt
   OPTIND=1
   while getopts "ax:" opt; do
      case $opt in
         a) show_all=yes ;;
         x) echo "You said: $OPTARG" ;;
         \?) return 1 ;;  # getopts has already printed an error message; OPTARG is unset here
      esac
   done
   shift $((OPTIND-1))

   local name_arg=(); if [ -n "$1" ]; then name_arg=(-name "$1"); fi  # an array keeps the pattern quoted
   find . -type f "${name_arg[@]}" -print0 | xargs -0 --no-run-if-empty stat --format '%Y :%y %n' | sort -nr | if [ -z "$show_all" ]; then head -n 30; else cat; fi
}
Notes (#N refers to the N-th line of the listing above):
  • #5, #9 the variable used to store the flag (show_all) must be defined/reset first
  • #6 OPTIND is a global variable holding the index of the next argument that getopts should parse; you must reset it to 1 manually (otherwise the next call to the function in the same shell will ignore its arguments)
  • #7 getopts parses the supported options (a, x here) one by one and stores the current option letter into $opt
  • #10 the value passed to an option (Hello for -x) is stored into the variable OPTARG
  • #11 an unknown option sets $opt to ?; unless the optstring starts with ':', getopts prints its own error message and leaves OPTARG unset
  • #14 we must manually shift away the processed option arguments so that the first non-option argument (‘*.txt’) will become argument number 1, as you can see at #16; OPTIND is set by getopts
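To see why the OPTIND reset matters, here is a minimal sketch with two hypothetical helper functions (parse_no_reset and parse_with_reset, and the variable last_flag, are made up for this illustration). Both parse a single -a flag; only the second resets OPTIND. It assumes a fresh shell in which getopts has not been used yet.

```shell
# Hypothetical helpers for illustration only; each parses a single -a flag
# and stores the result in the global variable last_flag.

# Does NOT reset OPTIND: only the first call in a shell sees its options.
parse_no_reset() {
   local flag=no opt
   while getopts "a" opt; do
      case $opt in
         a) flag=yes ;;
      esac
   done
   last_flag=$flag
}

# Resets OPTIND, so every call starts parsing at argument number 1.
parse_with_reset() {
   local flag=no opt
   OPTIND=1
   while getopts "a" opt; do
      case $opt in
         a) flag=yes ;;
      esac
   done
   last_flag=$flag
}

# Call the functions directly: inside $(...) they would run in a subshell
# and the change to OPTIND would not be visible to the next call.
parse_no_reset -a;   first=$last_flag
parse_no_reset -a;   second=$last_flag   # OPTIND is still 2, so -a is skipped
parse_with_reset -a; third=$last_flag
parse_with_reset -a; fourth=$last_flag

echo "without reset: $first $second"
echo "with reset:    $third $fourth"
```

The second call to parse_no_reset never looks at its -a because OPTIND still points past the first argument from the previous call.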

Getopts can do quite a lot. It supports short options with or without arguments such as “-lht”, “-l -h -t”, “-l -I ‘*.swp’”. It can also report or ignore unknown options etc.; see its brief documentation and this small getopts tutorial. Briefly, getopts takes optstring and varname: optstring is a list of option letters, each optionally followed by ‘:’ to indicate that the option requires a value; varname is the name of the variable to store the option letter into. If you put : in front of the optstring (“:ax:”), getopts switches to silent mode: it no longer prints its own messages about unknown options or missing option arguments. Instead it sets varname to ? (unknown option) or : (missing argument) and stores the offending option letter in OPTARG, so that you can report the problem yourself.
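The silent mode could be used roughly as follows. This is only a sketch: the function parse_silent, its messages and its return codes are made up for the example; the optstring “:ax:” is the one mentioned above.

```shell
# Hypothetical function for illustration; the leading ':' in the optstring
# switches getopts to silent mode, so we report the errors ourselves.
parse_silent() {
   local opt
   OPTIND=1
   while getopts ":ax:" opt; do
      case $opt in
         a) echo "flag: a" ;;
         x) echo "x=$OPTARG" ;;
         :) echo "missing argument for -$OPTARG" >&2; return 2 ;;
         \?) echo "unknown option -$OPTARG" >&2; return 1 ;;
      esac
   done
   shift $((OPTIND-1))
   echo "rest: $*"
}

# Capture the output (including stderr) and the return code of three calls.
out_ok=$(parse_silent -a -x Hello '*.txt' 2>&1)
status_unknown=0; out_unknown=$(parse_silent -z 2>&1) || status_unknown=$?
status_missing=0; out_missing=$(parse_silent -x 2>&1) || status_missing=$?

printf '%s\n' "$out_ok"
echo "$out_unknown (return code $status_unknown)"
echo "$out_missing (return code $status_missing)"
```

In silent mode OPTARG holds the offending option letter in both error branches, which is what makes the custom messages possible.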

Posted in General

Bash Magic: List Hive Table Sizes in GB

Posted by Jakub Holý on January 8, 2013

To list the sizes of Hive tables in Hadoop in GBs:

sudo -u hdfs hadoop fs -du /user/hive/warehouse/ | awk '/^[0-9]+/ { print int($1/(1024**3)) " [GB]\t" $2 }'

Result:

448 [GB]	hdfs://aewb-analytics-staging-name.example.com:8020/user/hive/warehouse/mybigtable
8 [GB]	hdfs://aewb-analytics-staging-name.example.com:8020/user/hive/warehouse/anotherone
0 [GB]	hdfs://aewb-analytics-staging-name.example.com:8020/user/hive/warehouse/tinyone
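If you want to try the awk part without a cluster, a variant of it can be fed sample data. This is only a sketch: the sizes, the host name and the helper sizes_gb are made up, and it assumes the two-column "size path" format shown above (some Hadoop versions print an additional column, in which case the path is no longer $2). The variant uses awk's standard ^ operator instead of **, prints one decimal place and sorts the biggest tables first.

```shell
# Sample input standing in for the output of `hadoop fs -du` (sizes and paths are made up).
sample_du_output='Found 3 items
481036337152 hdfs://namenode.example.com:8020/user/hive/warehouse/mybigtable
8589934592 hdfs://namenode.example.com:8020/user/hive/warehouse/anotherone
1048576 hdfs://namenode.example.com:8020/user/hive/warehouse/tinyone'

# Hypothetical helper: converts "bytes path" lines on stdin into "GB path" lines,
# skipping lines that do not start with a number, biggest first.
sizes_gb() {
   awk '/^[0-9]+/ { printf "%.1f [GB]\t%s\n", $1 / (1024 ^ 3), $2 }' | sort -rn
}

report=$(printf '%s\n' "$sample_du_output" | sizes_gb)
printf '%s\n' "$report"
```

On the real cluster you would pipe the output of the hadoop command above into sizes_gb instead of the sample text.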

Posted in Tools

Fast Code To Production Cycle Matters: For Pleasure, Productivity, Profit

Posted by Jakub Holý on January 5, 2013

I spent one afternoon adding a much needed feature to our application. Now I have been waiting for several days for various people to review and approve it. And I have just realized how tiring it is and how much energy it takes from me.

To create something and get it out into production at once (more or less) is fun and really motivates me to do and try stuff; it is a great feeling to see it immediately affecting the users and processes. And quick feedback (from the users and the behavior of the application) – while I am still engaged with the thing and have all the context in my mind – makes it easy and fun to fix and improve it, leading to a better result faster. Waiting, on the other hand, is exhausting and depressing. I usually start working on something else, forget the context, lose interest (it is not easy to be fired up about two things at the same time) etc.

Therefore, if you care about happy developers and better products, make sure that your time from code to production is as short as possible. Look at your process, eliminate all delays, make it smooth.

PS: I am not saying that code reviews are bad. They are great. They just shouldn’t be a source of delay. For example if you can pair-program then you get an instant code review.

You might also enjoy other posts on effective development.

Posted in General

My 2012 in Review

Posted by Jakub Holý on January 4, 2013

With 2012 over, it is perhaps time to look back and see what interesting things have happened, what I have done, written, and learned, which articles I have enjoyed most, etc.

This has been my second year in Norway and I am still very much enjoying it; there is a very active developer community organizing great conferences such as JavaZone (followed by an amazing trip into nature a.k.a. SurvivalZone), Smidig (Agile), where I have also presented, and the Scala-focused flatMap, as well as many meetups etc.

Events & side jobs

Thanks to my company I had the opportunity to do some real consulting work, namely helping with a technical audit of an R&D department and co-organizing a TDD and refactoring workshop for the customer, and I have learned a lot from both of these.

The most exciting event was a week-long educational stay with Kent Beck that has resulted in the most popular blog post ever of my company and myself, Programming Like Kent Beck. Another exciting event was the workshop BDD – Specification by Example by Gojko Adzic, simply the best workshop/course I have ever attended, with plenty of valuable content about how to build the right software. I also very much enjoyed preparing and presenting an introductory Clojure workshop with my friends and colleagues Lars and Ivar.

Read the rest of this entry »

Posted in General, Top links of month

Blogging Stats of 2012

Posted by Jakub Holý on January 1, 2013

The WordPress.com stats helper monkeys prepared a 2012 annual report for this blog.

A summary:

19,000 people fit into the new Barclays Center to see Jay-Z perform. This blog was viewed about 130 000 times in 2012. If it were a concert at the Barclays Center, it would take about 7 sold-out performances for that many people to see it.

Click here to see the complete report.

Posted in General