The Holy Java

Building the right thing, building it right, fast

Archive for March, 2012

Most interesting links of Mars ’12

Posted by Jakub Holý on March 31, 2012

Recommended Readings

  • ThoughtWorks Technology Radar 3/2012 – including apps with embedded servlet containers (assess), health check pages for webapp monitoring, testing at the appropriate level (adopt), JavaScript micro-framewors (trial, see Microjs.com), Gradle over Maven (e.g. thanks to flexibility), OpenSocial for data & content sharing between (enterprise) apps (assess), Clojure (before in asses) and CoffeeScript on trial (Scala very close to adopt), JavaScript as a 1st class language (adopt), single-threaded servers with aync I/O (Node.js, Webbit for Java [http/websocket], …; assess).
  • Jez Humble: Four Principles of Low-Risk Software Releases – how to make your releases safer by making them incremental (versioned artifacts instead of overwritting, expand & contract DB scripts, versioned APIs, releasing to a subset of customers first), separating software deployment from releasing it so that end-users can use it (=> you can do smoke tests, canary releasing, dark launching [feature in place but not visible to users, already doing something]; includes feature toggles [toggle on only for somebody, switch off new buggy feature, …]), delivering features in smaller batches (=> more frequently, smaller risk of any individual release thanks to less stuff and easier roll-back/forward), and optimizing for resiliance (=> ability to provision a running production system to a known good state in predictable time – crucial when stuff fails).
  • The Game of Distributed Systems Programming. Which Level Are You? (via Kent Beck) – we start with a naive approach to distributed systems, treating them as just a little different local systems, then (painfully) come to understand the fallacies of distributed programming and start to program explicitely for the distributed environment leveraging asynchronous messaging and (often functional) languages with good support for concurrency and distribution. We suffer by random, subtle, non-deterministic defects and try to separate and restrict non-determinism by becoming purely functional … . Much recommended to anybody dealing with distributed systems (i.e. everybody, nowadays). The discussion is worth reading as well.
  • Shapes Don’t Draw – thought-provoking criticism of inappropriate use of OOP, which leads to bad and inflexible code. Simplification is OK as long as the domain is equally simple – but in the real world shapes do not draw themselves. (And Trades don’t decide their price and certainly shouldn’t reference services and a database.)
  • Capability Im-Maturity Model (via Markus Krüger) – everybody knows CMMI, but it’s useful to know also the negative directions an organization can develop in. Defined by Capt. Tom Schorsch in 1996, building on Anthony Finkelstein’s paper A Software Process Immaturity Model.
  • Cynefin: A Leader’s Framework for Decision Making – an introduction into the Cynefin cognitive framework – the key point is that we encounter 5 types of contexts differing by the predictability of effects and each of them requires a different management style, using the wrong one is a recipe for a disaster. Quote:

    The framework sorts the issues facing leaders into five contexts defined by the nature of the relationship between cause and effect. Four of these—simple, complicated, complex, and chaotic—require leaders to diagnose situations and to act in contextually appropriate ways. The fifth—disorder—applies when it is unclear which of the other four contexts is predominant.

  • Et spørsmål om kompleksitet (Norwegian). Key ideas mixed with my own: Command & control management in the traditional Ford way works very well – but only in stable domains with clear cause-and-effect relationships (i.e. the Simple context of Cynefin). But many tasks today have lot of uncertanity and complexity and deal with creating new, never before seen things. We try to lead projects as if they were automobile factories while often they are more like research – and researchers cannot plan when they will make a breakthrough. Most of the new development of IT systems falls into the Complex context of Cynefin – there is lot of uncertanity, no clear answers, we cannot forsee problems, and have to base our progress on empirical experience and leverage emergence (emergent design, ..).
  • The Economics of Developer Testing – a very interesting reflection on the cost and value of testing and what is enough tests. Tests cost to develop and maintain (and different tests cost differently, the more complex the more expensive). Not having tests costs too – usually quite a lot. To find the right ballance between tests and code and different types of tests we must be aware of their cost and benefits, both short & long term. Worth reading, good links. (Note: We often tend to underestimate the cost of not having good tests. Much more then you might think.)

Links to Keep

Quotes

Kent Beck answering a question about how much testing to do (highlighted by me):

I get paid for code that works, not for tests, so my philosophy is to test as little as possible to reach a given level of confidence (I suspect this level of confidence is high compared to industry standards, but that could just be hubris). If I don’t typically make a kind of mistake (like setting the wrong variables in a constructor), I don’t test for it. I do tend to make sense of test errors, so I’m extra careful when I have logic with complicated conditionals. When coding on a team, I modify my strategy to carefully test code that we, collectively, tend to get wrong.

Different people will have different testing strategies based on this philosophy, but that seems reasonable to me given the immature state of understanding of how tests can best fit into the inner loop of coding. Ten or twenty years from now we’ll likely have a more universal theory of which tests to write, which tests not to write, and how to tell the difference. In the meantime, experimentation seems in order.

Posted in General, Testing, Top links of month | Tagged: , , , , , | Comments Off

Note To Self: What to Do When a Vagrant Machine Stops Working (Destroy or Up Failing)

Posted by Jakub Holý on March 24, 2012

Sometimes “vagrant destroy” fails with an exception from the depths of the virtualbox Ruby gem or vagrant up freezes for a long time only to fail with SSH connection failure message. Here are some tips how to solve such problems.

See also my Vagrant Notes.

Read the rest of this entry »

Posted in Tools | Tagged: , , , | Comments Off

Kent Beck: Best Practices for Software Design with Low Feature Latency and High Throughput

Posted by Jakub Holý on March 12, 2012

I was fortunate to attend Kent Beck’s lecture summarizing his experiences and thoughts regarding efficient software design. Traditionally there have been two schools of thought about design: Predictive design, trying to design everything upfront (and making lot of wrong decisions) and reactive design, where any design is only done if it is absolutely necessary for implementing a feature (thus developing often on top of an insufficient design). Kent tried hard to discover such a design method that really delivers on the promises of both while avoiding their failures. This method is based on evolving design frequently in small, safe steps and focusing on learning while following some key best practices. It doesn’t really matter what scope of design we are are speaking about, the method and principles are the same whether you’re redesigning a class or a complex system.

Read the rest of this entry »

Posted in General | Tagged: , , | 9 Comments »

Link: Benchmark and Scaling of Amazon RDS (MySQL)

Posted by Jakub Holý on March 1, 2012

Performance and scaling of the Amazon-managed MySQL, Relational Data Store (RDS):

Scaling options:

  • Horizontal scaling
    • Sharding (distribute data [tables or rows] among multiple RDS instances; Tumblr uses sharded MySQL and it worked well for them) – there is no explicit support so the applications have to handle it themselves, i.e. know which table/rows to read from which instance
    • Read-replicas: RDS supports set up of read-only replicas using MySQL’s own replication; the replicas are evidently only usable for reading and may contain little stale data
  • Vertical scaling (stronger EC2 instances) – there are interesting results from a benchmark of RDS with various instances/DB sizes (6/2011, complete report); key observations:
    • “With hardly any dependency on the database size, MySQL reaches its optimal throughput at around 64 concurrent users. Anything above that causes throughput degradation.”
    • “Throughput is improving as machines get stronger. However, there is a sweet-spot, a point where adding hardware doesn’t help performance. The sweet spot is around the XL machine, which reaches a [max] throughput of around 7000 tpm.” (transactions per minute => ~ 110 tx/sec)

Disclaimer: No banchmark proves anything generally applicable, it’s always necessary to run one’s own production load and measure that to see how in reality a DB performs for one’s actual needs.

Notes

  • The number of concurrent connections is by default derived from the memory, namely 150 for a small 1.5GB instance and 650 for a large 7.5GB instance. According to one expert it’s completely OK to set it to 1000 connections without regard to memory; MySQL should handle it.

Posted in General | Tagged: , , | Comments Off