Posted by Jakub Holý on May 12, 2012
Have you ever worked with an application where you had to copy data from one object to another and another and so on before you actually could do something with it? Have you ever written code to convert data from XML to a DTO to a Business Object to a JDBC Statement? Again and again for each of the different data types being processed? Then you have encountered an all too common antipattern of many “enterprise” (read “overdesigned”) applications, which we could call The Endless Mapping Death March. Let’s look at an application suffering from this antipattern and how to rewrite it in a much nicer, leaner and easier to maintain form.
The application, The World of Thrilling Fashion (or WTF for short), collects and stores information about newly designed dresses and makes it available via a REST API. Every poor dress has to go through the following conversions before reaching a devoted fashion fan:
- Parsing from XML into an XML-specific XDress object
- Processing and conversion to an application-specific Dress object
- Conversion to MongoDB’s DBObject so that it can be stored in the DB (as JSON)
- Conversion from the DBObject back to the Dress object
- Conversion from Dress to a JSON string
Uff, that’s a lot of work! Each of the conversions is coded manually, and if we want to extend WTF to also provide information about trendy shoes, we will need to code all of them again. (Plus a couple of methods in our MongoDAO, such as getAllShoes and storeShoes.) But we can do much better than that!
Posted in General, Java | Tagged: badcode, CleanCode, design, opinion | 1 Comment »
Posted by Jakub Holý on May 9, 2012
In Simple Made Easy, Rich Hickey argues that mixing orthogonal concerns introduces unnecessary complexity and that we should keep them separate. This mixing sometimes occurs on such a basic level that we believe there is no other way to do it, an example being the interleaving of polymorphism and hierarchical namespacing represented by OO class hierarchies. Taking those “complected” concerns apart and dealing with them separately yields cleaner, simpler solutions and sometimes also more powerful ones, because you are free to combine them as you need and not as the author decided.
Posted in General | Tagged: clojure, design, opinion | Leave a Comment »
Posted by Jakub Holý on May 5, 2012
For a recent project I needed to be able to start on-demand clusters of machines in Amazon EC2. We needed each instance in a cluster to allow SSH and sudo access for all team members and to install and configure the software appropriate for that cluster (“database” node or “testclient” node).
You can see the results of my effort using EC2 command-line tools, Puppet etc. at the project’s Puppet GitHub repository; the setup is described in detail in its README.
(Tips for improvements are welcome. And no, Star Cluster isn’t what we needed.)
Posted in General, Tools | Tagged: aws, cloud, DevOps, ec2 | Leave a Comment »
Posted by Jakub Holý on April 30, 2012
Recommended Readings
- V. Duarte: Story Points Considered Harmful – Or why the future of estimation is really in our past… (also as 1h video) – a thoughtful and data-backed claim that there is a much cheaper way of estimating work throughput than estimating each story in story points (SP), namely simply counting the stories. Even though their sizes differ, over somewhat longer periods, where it really matters, these differences will even out. The author argues that estimating in number of stories provides the same reliability and benefits as SP and is much easier. (Keep in mind that estimation is just an attempt at predicting the future and humans are proven to be terrible at doing that; why pretend that we can do it?) I’d recommend this to anybody doing Scrum and similar.
- M. Fowler: Test Coverage – it’s obvious that increasing test coverage for the sake of test coverage is nonsense, but some people still need to be reminded of it. Fowler explains what the real benefit of test coverage measurements is and how to use it for good instead of for evil.
- Brian Marick: How to Misuse Code Coverage (pdf) – cited a lot by Fowler in his article, this is a really good paper. Marick has participated in the development of several code coverage tools and understands their limitations well. One of the key points is that code coverage tools can discover only one class of test weakness (not testing some paths through your code) but cannot discover that you are missing some code you should have (e.g. when you check only for two of three possible return values). Thus the code coverage metric tells you “this code isn’t well tested, are you sure you don’t want to look more into it?” It’s crucial not to write tests just to increase the code coverage; look at the code and improve the tests without any regard for coverage. You may thus decrease the likelihood of both classes of problems.
- A Year with MongoDB – Kiip has found out that Mongo isn’t the best choice for them (having 240GB, 500+ operations/s, 85M docs and their specific usage of the store) and migrated to a combination of Riak (key-value store) and PostgreSQL. Some of the issues they hit are slow counts and limit/offset queries due to using non-counting B-trees for indexing, memory management that could be more intelligent and tuned for the use to make sure the data needed is indeed in RAM, no built-in support for compressing key names (their size adds up as they’re repeated in each document; you have to compress them [user -> u etc.] in the client if you want to), limited concurrency due to a process-wide write lock (which becomes a problem if the writes aren’t short enough w.r.t. the number of ops/s, e.g. because data isn’t in RAM and/or the query is complicated), safe settings (waiting for a write to finish, …) off by default, and offline-only table compaction (w/o it the disk usage grows unbounded). The lesson learnt for me: know your storage, its weaknesses and intended way of usage, and make sure it matches your needs.
- Rudolf Winestock: The Lisp Curse – Lisp’s expressive power is actually a cause of its lack of momentum because it’s so easy to implement anything that people have no need to join forces and thus there are many half-baked (“works-for-me”) solutions for anything – but no complete, generally accepted one. An interesting essay. “Lisp is so powerful that problems which are technical issues in other programming languages are social issues in Lisp.”
- Understanding JDBC Internals & Timeout Configuration – the article itself could have been written better but it conveys the important information that configuring timeouts for JDBC isn’t trivial because they need to be set correctly at different levels and without a socket timeout set in a driver-specific way it can hang forever if the DB cannot be reached due to network/system failure
- Circos: An Amazing Tool for Visualizing Big Data – this article is interesting primarily for its combination of Google Analytics API, Neo4J and an unusual data visualization with circular graphs
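To make the point of the JDBC timeout item above more concrete, here is a minimal Java sketch of setting timeouts at two levels: driver-specific network timeouts passed as connection properties, and a statement-level query timeout. It assumes MySQL Connector/J, whose `connectTimeout` and `socketTimeout` properties take milliseconds; other drivers use different names. The class and method names are made up for illustration, and the linked article may configure things differently.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

// Sketch only: illustrates the two timeout levels, not a complete configuration.
final class JdbcTimeouts {

    // Driver-specific network timeouts. "connectTimeout" and "socketTimeout" are
    // MySQL Connector/J property names (values in milliseconds); this driver
    // choice is an assumption, other drivers name these settings differently.
    static Properties mysqlNetworkTimeouts(String user, String password,
                                           int connectTimeoutMs, int socketTimeoutMs) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("connectTimeout", Integer.toString(connectTimeoutMs));
        props.setProperty("socketTimeout", Integer.toString(socketTimeoutMs));
        return props;
    }

    // Runs a trivial query with a statement-level timeout (in seconds) on top of
    // the network-level timeouts carried by the connection properties.
    static void ping(String url, Properties props, int queryTimeoutSeconds) throws SQLException {
        try (Connection connection = DriverManager.getConnection(url, props);
             Statement statement = connection.createStatement()) {
            statement.setQueryTimeout(queryTimeoutSeconds);
            statement.execute("SELECT 1");
        }
    }
}
```

A call might look like `JdbcTimeouts.ping("jdbc:mysql://db.example.com/wtf", JdbcTimeouts.mysqlNetworkTimeouts("app", "secret", 3000, 10000), 5)`, where the URL and credentials are placeholders.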
Tools
- CRaSH: Extensible shell for the JVM (docs) – a shell that you can embed into a web server as a WAR, run standalone or attach to a running JVM, connect to via SSH or Telnet, and use to execute commands against the JVM. Some commands: configure loggers, control threads, monitor the system (mem, threads, ..), connect/issue queries via JDBC. More commands can be written in Groovy. There is a whole set of commands for working with JCR. Pluggable authentication.
Clojure Corner
Posted in Databases/DB2, General, Testing, Tools, Top links of month | Tagged: agile, bigdata, nosql, Testing | Leave a Comment »
Posted by Jakub Holý on April 4, 2012
I needed a quick and simple way to enable some users to query a table and figured out that the easiest solution was to use an embedded, lightweight HTTP server so that the users could type a URL in their browser and get the results. The question was, of course, which server is best for it. I’d like to summarize here the options I’ve discovered – including Gretty, Jetty, Restlet, Jersey and others – and their pros & cons together with complete examples for most of them. I have deliberately avoided frameworks that might support this easily, such as Grails, because they didn’t feel really lightweight and I needed only a very simple, temporary application.
I used Groovy for its high productivity, especially regarding JDBC – with GSQL I needed only two lines to get the data from a DB in a user-friendly format.
My ideal solution would make it possible to start the server with support for HTTPS and authorization and declare handlers for URLs programmatically, in a single file (Groovy script), in just a few lines of code. (Very similar to the Gretty solution below + the security stuff.)
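As a rough illustration of the kind of solution described above, here is a minimal sketch in Java (rather than Groovy) using the JDK’s built-in `com.sun.net.httpserver` package, which is not necessarily one of the options compared in the post. The `/query` path and the `runQuery` helper are hypothetical stand-ins for the real table query, and HTTPS and authorization are left out.

```java
import com.sun.net.httpserver.HttpServer;

import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Sketch only: an embedded HTTP server with one programmatically declared handler.
final class TinyQueryServer {

    // Hypothetical stand-in for the real database query.
    static String runQuery(String rawQuery) {
        return "rows for: " + (rawQuery == null ? "(no filter)" : rawQuery) + "\n";
    }

    // Starts the server on the given port (0 picks any free port) and returns it.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/query", exchange -> {
            // Pass the URL's query string to the (hypothetical) query function.
            byte[] body = runQuery(exchange.getRequestURI().getQuery())
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(8080);
        System.out.println("Listening on port " + server.getAddress().getPort());
    }
}
```

With the server running, a user would open a URL such as `http://localhost:8080/query?name=dress` in the browser to see the result.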
Posted in j2ee, Java | Tagged: groovy, webapp, REST | 4 Comments »
Posted by Jakub Holý on April 2, 2012
If you use Grape’s @Grab annotation to get dependencies for your Groovy scripts at runtime and their retrieval fails with the exception “General error during conversion: Error grabbing Grapes -- [unresolved dependency: ...not found]” and a useless stack trace, then you might want to know that you can configure Ivy to log all the details of what is going on (what it is trying to download, where from, …), for example in the interactive groovysh shell:
groovy:000> org.apache.ivy.util.Message.setDefaultLogger(new org.apache.ivy.util.DefaultMessageLogger(org.apache.ivy.util.Message.MSG_DEBUG))
groovy:000> groovy.grape.Grape.grab(autoDownload: true, group: 'org.eclipse.jetty.orbit', module: 'javax.servlet', version: '3.0.0.v201112011016')
...
WARN: ==== ibiblio: tried
WARN: http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.orbi
...
You can likely also increase the log level by setting the system property ivy.message.logger.level to 4 (debug; see the Ivy Message class).
(For the list of arguments that grab supports see GrapeIvy, namely the method createGrabRecord [btw, ext and type are ignored unless you also set classifier])
Posted in General, Java | Tagged: groovy | Leave a Comment »
Posted by Jakub Holý on March 31, 2012
Recommended Readings
- ThoughtWorks Technology Radar 3/2012 – including apps with embedded servlet containers (assess), health check pages for webapp monitoring, testing at the appropriate level (adopt), JavaScript micro-frameworks (trial, see Microjs.com), Gradle over Maven (e.g. thanks to flexibility), OpenSocial for data & content sharing between (enterprise) apps (assess), Clojure (previously in assess) and CoffeeScript on trial (Scala very close to adopt), JavaScript as a 1st class language (adopt), single-threaded servers with async I/O (Node.js, Webbit for Java [http/websocket], …; assess).
- Jez Humble: Four Principles of Low-Risk Software Releases – how to make your releases safer by making them incremental (versioned artifacts instead of overwriting, expand & contract DB scripts, versioned APIs, releasing to a subset of customers first), separating software deployment from releasing it so that end-users can use it (=> you can do smoke tests, canary releasing, dark launching [feature in place but not visible to users, already doing something]; includes feature toggles [toggle on only for somebody, switch off new buggy feature, ...]), delivering features in smaller batches (=> more frequently, smaller risk of any individual release thanks to less stuff and easier roll-back/forward), and optimizing for resilience (=> ability to provision a running production system to a known good state in predictable time – crucial when stuff fails).
- The Game of Distributed Systems Programming. Which Level Are You? (via Kent Beck) – we start with a naive approach to distributed systems, treating them as just slightly different local systems, then (painfully) come to understand the fallacies of distributed programming and start to program explicitly for the distributed environment, leveraging asynchronous messaging and (often functional) languages with good support for concurrency and distribution. We suffer from random, subtle, non-deterministic defects and try to separate and restrict non-determinism by becoming purely functional … . Much recommended to anybody dealing with distributed systems (i.e. everybody, nowadays). The discussion is worth reading as well.
- Shapes Don’t Draw – thought-provoking criticism of inappropriate use of OOP, which leads to bad and inflexible code. Simplification is OK as long as the domain is equally simple – but in the real world shapes do not draw themselves. (And Trades don’t decide their price and certainly shouldn’t reference services and a database.)
- Capability Im-Maturity Model (via Markus Krüger) – everybody knows CMMI, but it’s useful to know also the negative directions an organization can develop in. Defined by Capt. Tom Schorsch in 1996, building on Anthony Finkelstein’s paper A Software Process Immaturity Model.
- Cynefin: A Leader’s Framework for Decision Making – an introduction to the Cynefin cognitive framework – the key point is that we encounter 5 types of contexts differing by the predictability of effects and each of them requires a different management style; using the wrong one is a recipe for disaster. Quote:
The framework sorts the issues facing leaders into five contexts defined by the nature of the relationship between cause and effect. Four of these—simple, complicated, complex, and chaotic—require leaders to diagnose situations and to act in contextually appropriate ways. The fifth—disorder—applies when it is unclear which of the other four contexts is predominant.
- Et spørsmål om kompleksitet (Norwegian). Key ideas mixed with my own: Command & control management in the traditional Ford way works very well – but only in stable domains with clear cause-and-effect relationships (i.e. the Simple context of Cynefin). But many tasks today involve a lot of uncertainty and complexity and deal with creating new, never before seen things. We try to lead projects as if they were automobile factories while often they are more like research – and researchers cannot plan when they will make a breakthrough. Most of the new development of IT systems falls into the Complex context of Cynefin – there is a lot of uncertainty, no clear answers, we cannot foresee problems, and have to base our progress on empirical experience and leverage emergence (emergent design, ..).
- The Economics of Developer Testing – a very interesting reflection on the cost and value of testing and on how many tests are enough. Tests are costly to develop and maintain (and different tests cost differently; the more complex, the more expensive). Not having tests costs too – usually quite a lot. To find the right balance between tests and code and between different types of tests we must be aware of their costs and benefits, both short & long term. Worth reading, good links. (Note: We often tend to underestimate the cost of not having good tests. It is much higher than you might think.)
Links to Keep
Quotes
Kent Beck answering a question about how much testing to do (highlighted by me):
I get paid for code that works, not for tests, so my philosophy is to test as little as possible to reach a given level of confidence (I suspect this level of confidence is high compared to industry standards, but that could just be hubris). If I don’t typically make a kind of mistake (like setting the wrong variables in a constructor), I don’t test for it. I do tend to make sense of test errors, so I’m extra careful when I have logic with complicated conditionals. When coding on a team, I modify my strategy to carefully test code that we, collectively, tend to get wrong.
Different people will have different testing strategies based on this philosophy, but that seems reasonable to me given the immature state of understanding of how tests can best fit into the inner loop of coding. Ten or twenty years from now we’ll likely have a more universal theory of which tests to write, which tests not to write, and how to tell the difference. In the meantime, experimentation seems in order.
Posted in General, Testing, Top links of month | Tagged: agile, cloud, design, management, Testing, trends | Leave a Comment »
Posted by Jakub Holý on March 24, 2012
Sometimes “vagrant destroy” fails with an exception from the depths of the virtualbox Ruby gem, or “vagrant up” freezes for a long time only to fail with an SSH connection failure message. Here are some tips on how to solve such problems.
Posted in Tools | Tagged: DevOps, error, troubleshooting, vagrant | Leave a Comment »
Posted by Jakub Holý on March 12, 2012
I was fortunate to attend Kent Beck’s lecture summarizing his experiences and thoughts regarding efficient software design. Traditionally there have been two schools of thought about design: predictive design, trying to design everything upfront (and making a lot of wrong decisions), and reactive design, where any design is only done if it is absolutely necessary for implementing a feature (thus often developing on top of an insufficient design). Kent tried hard to discover a design method that really delivers on the promises of both while avoiding their failures. This method is based on evolving the design frequently in small, safe steps and focusing on learning while following some key best practices. It doesn’t really matter what scope of design we are speaking about; the method and principles are the same whether you’re redesigning a class or a complex system.
Posted in General | Tagged: agile, design, software | 1 Comment »
Posted by Jakub Holý on March 1, 2012
Performance and scaling of the Amazon-managed MySQL, Relational Database Service (RDS):
Scaling options:
- Horizontal scaling
- Sharding (distribute data [tables or rows] among multiple RDS instances; Tumblr uses sharded MySQL and it worked well for them) – there is no explicit support so the applications have to handle it themselves, i.e. know which table/rows to read from which instance
- Read-replicas: RDS supports setting up read-only replicas using MySQL’s own replication; the replicas are, of course, only usable for reading and may contain slightly stale data
- Vertical scaling (stronger EC2 instances) – there are interesting results from a benchmark of RDS with various instances/DB sizes (6/2011, complete report); key observations:
- “With hardly any dependency on the database size, MySQL reaches its optimal throughput at around 64 concurrent users. Anything above that causes throughput degradation.”
- “Throughput is improving as machines get stronger. However, there is a sweet-spot, a point where adding hardware doesn’t help performance. The sweet spot is around the XL machine, which reaches a [max] throughput of around 7000 tpm.” (transactions per minute => 7000 / 60 ≈ 117 tx/sec)
Disclaimer: No benchmark proves anything generally applicable; it’s always necessary to run one’s own production load and measure that to see how a DB really performs for one’s actual needs.
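Since RDS has no explicit sharding support, the application has to decide which instance holds which rows, as noted above. A minimal Java sketch of one common approach, routing by the row key modulo the number of shards, might look like this. The class name, the choice of a numeric user id as the key and the JDBC URLs are all hypothetical; this is not how Tumblr or any particular application does it.

```java
import java.util.List;

// Sketch only: application-side routing of rows to one of several RDS instances.
final class ShardRouter {
    private final List<String> shardUrls;

    ShardRouter(List<String> shardUrls) {
        if (shardUrls.isEmpty()) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        this.shardUrls = List.copyOf(shardUrls);
    }

    // Maps a row key (here a user id) to the index of the instance owning it.
    // floorMod keeps the result non-negative even for negative ids.
    int shardIndexFor(long userId) {
        return (int) Math.floorMod(userId, (long) shardUrls.size());
    }

    // Returns the JDBC URL of the instance to read from or write to for this user.
    String jdbcUrlFor(long userId) {
        return shardUrls.get(shardIndexFor(userId));
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(List.of(
                "jdbc:mysql://shard0.example.com/app",
                "jdbc:mysql://shard1.example.com/app",
                "jdbc:mysql://shard2.example.com/app"));
        System.out.println(router.jdbcUrlFor(7L));
    }
}
```

Plain modulo routing is simple but remaps most keys whenever the number of shards changes; a lookup table or consistent hashing are the usual alternatives when instances are added over time.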
Notes
- The number of concurrent connections is by default derived from the memory, namely 150 for a small 1.5GB instance and 650 for a large 7.5GB instance. According to one expert it’s completely OK to set it to 1000 connections without regard to memory; MySQL should handle it.
Posted in General | Tagged: aws, performance, rds | Leave a Comment »