Posted by Jakub Holý on April 30, 2015
The first Continuous Delivery and DevOps Conference in Oslo is over. It was nice to see so many people interested in the topic. I would have preferred more practical talks of the “how we did it” type over the “why” type, but it was OK; still, next year I would perhaps prefer flatMap. Here are my highlights:
- Atmel is using a physical robot to plug and connect a particular configuration of circuit boards to test; your automated testing challenges cannot be greater than theirs!
- Continuous Delivery decreases the risk of outages and the time to recovery while enabling faster innovation, and it correlates with higher profits; no efficiency improvement will outperform cycle time reduction
- Estimation pathologies; focus on value rather than costs
- Stop talking about requirements, they are fake; they are just beliefs about what may add value to customers. Use hypotheses instead!
- Cisco: Most of the tools increasing productivity (and some innovation) were produced by engineers in their “spare” time; slack time is thus crucial
- How Cisco grows professionalism: optimise for the 10% best developers, not the 10% weakest; slack time; make everything visible; encourage code reviews but avoid making them mandatory; see the slide
- CALMS: Culture, Automation, Lean, Measurement, Sharing – the pillars of DevOps
- Cisco invested a lot in crafting their build system, tailored test frameworks, and emulators to be able to get quick and quality feedback – because it pays off
- “Make your own build system” says @olvemaudal at @CoDeOSL. IME this is inevitable for non-trivial projects, and a good investment.
- Unleash: Feature Toggles server and Java/Node client by FINN.no
- “They asked for a report while they actually needed just a list of data, the result of a simple SQL query; had we listened to them, we would have wasted hours creating a report in the report framework, with logos and all the crap.”
Posted in General | Tagged: continuous_deployment, DevOps | Comments Off on My Highlights from Continuous Delivery and DevOps Conference 2015
Posted by Jakub Holý on April 30, 2015
Posted in Tools | Tagged: nodejs, productivity | Comments Off on iTerm coprocess reporting result of (Mocha) tests run via nodemon
Posted by Jakub Holý on April 3, 2015
How to back up your precious files stored on the WD My Cloud NAS into S3 with the slow but low-cost storage class “Glacier”.
How does the backup work: duplicity does its job and uploads the files to S3. The large data archives are recognized by the S3 Lifecycle rules we set up based on their prefix and are moved to the Glacier storage class soon after upload. (It takes hours to restore something from Glacier, but its cost is orders of magnitude lower than that of S3 itself.) We leave the metadata files in plain S3 so that duplicity can read them.
90% of this is based on http://www.x2q.net/2013/02/24/howto-backup-wd-mybook-live-to-amazon-s3-and-glacier/ and the WD build guide (http://community.wd.com/t5/WD-My-Cloud/GUIDE-Building-packages-for-the-new-firmware-someone-tried-it/m-p/770653#M18650 and the update at http://community.wd.com/t5/WD-My-Cloud/GUIDE-Building-packages-for-the-new-firmware-someone-tried-it/m-p/841385#M27799). Kudos to the authors!
You will need to:
- Build duplicity and its dependencies (since WD's Debian-based firmware v04 switched to a page size of 64 kB, all pre-built binaries are unusable)
- Configure S3 to move the data files to Glacier after 0 days
- Create your backup script (see below)
- Schedule to run incremental backups regularly via Cron
- Preferably test restore manually
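The backup script itself is behind the “Read the rest” link; as a hedged sketch of what its duplicity invocation can look like (the bucket name, share path, and monthly full-backup cadence are assumptions, while the flags themselves are standard duplicity options):

```shell
#!/bin/sh
# backup.sh (sketch; bucket, paths, and retention are placeholder values).
# Credentials would normally be sourced from a root-only file, not inlined.
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export PASSPHRASE="..."   # used by duplicity to encrypt the archives

# Incremental backup, forcing a fresh full backup once a month.
# --file-prefix-archive puts the bulky archive volumes under the
# "archive_" prefix, which the S3 Lifecycle rule (transition after 0 days)
# matches and moves to Glacier; duplicity's small manifest and signature
# files keep their normal names and stay in S3 proper, readable quickly.
duplicity --full-if-older-than 1M \
          --file-prefix-archive archive_ \
          /shares/Public \
          s3://s3.amazonaws.com/my-nas-backup/backup
```

Running this from cron gives the regular incremental backups mentioned above; only the archive volumes, the expensive part, end up in Glacier.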
Read the rest of this entry »
Posted in Tools | Tagged: backup | 4 Comments »
Posted by Jakub Holý on March 31, 2015
I want to know when our app starts getting slower so I set up an alarm on the Latency metric of our ELB. According to the AWS Console, “This alarm will trigger when the blue line [average latency over the period of 15 min] goes above the red line [2 sec] for a duration of 45 minutes.” (I.e. it triggers if Latency > 2 for 3 consecutive period(s).) This is exactly what I need – except that it is a lie.
Last night I got 8 alarm/OK notifications even though the average latency was never above 2 sec for 45 minutes. The problem is that CloudWatch ignores null/missing data. So if you have a slow request at 3 am and no other request comes until 4 am, it will look at [slow, null, null, null] and trigger the alarm.
So I want to configure it to treat null as 0 and preferably to ignore latency if it only affected a single user. But there is no way to do this in CloudWatch.
Solution: I will likely need to run my own job that reads the metrics and produces a normalized, reasonable metric, replacing null/missing data with 0 and weighting the average latency by the number of users in the period.
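That job does not exist yet; here is a sketch of the normalization step it would perform. The input format and the one-hour window are made up for illustration, only the idea of “missing period becomes 0, weight by request count” is from the text:

```shell
# Hypothetical input: one line per 15-min period that HAD traffic:
#   <period-start-epoch> <avg-latency-sec> <request-count>
# Periods 1800 and 2700 had no requests at all, so they are absent.
cat > /tmp/latency.txt <<'EOF'
0 4.0 1
900 0.1 100
EOF

# Re-insert the missing periods as latency 0 and compute a
# request-weighted average, so one lonely slow request cannot dominate.
awk -v start=0 -v end=3600 -v period=900 '
  { lat[$1] = $2; cnt[$1] = $3 }
  END {
    for (t = start; t < end; t += period) {
      l = (t in lat) ? lat[t] : 0   # missing data -> 0, not ignored
      c = (t in cnt) ? cnt[t] : 0
      sum += l * c; total += c
    }
    printf "%.3f\n", (total > 0 ? sum / total : 0)
  }' /tmp/latency.txt
# prints 0.139  (14.0 request-seconds of latency spread over 101 requests)
```

The single 4-second request barely moves the weighted average, whereas CloudWatch's plain average of non-missing periods would have seen it as a sustained breach.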
Posted in General, Tools | Tagged: aws, monitoring, ops | Comments Off on AWS CloudWatch Alarms Too Noisy Due To Ignoring Missing Data in Averages
Posted by Jakub Holý on March 27, 2015
One of the annoying things with Jest is that while it enables you to run only a single test by using it.only, it does not report this in any noticeable way. Thus you can end up in the same situation as we did, not running many tests without knowing it. (Oh yeah, if only we had reviewed the code properly…)
This git pre-commit hook will fail when you introduce it.only into the code:
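A minimal sketch of such a hook (my own, not the one from the post; POSIX shell, checking only the lines actually being added in the staged diff):

```shell
#!/bin/sh
# .git/hooks/pre-commit (sketch) - abort the commit when the staged diff
# adds a line containing it.only. Lines beginning with "+" in the output
# of `git diff --cached` are exactly the additions being committed, so
# occurrences that were committed long ago do not block you.
if git diff --cached | grep -q '^\+.*it\.only'; then
  echo "Commit rejected: staged changes introduce it.only" >&2
  exit 1
fi
```

Drop it into .git/hooks/pre-commit and make it executable; a failing hook aborts the commit with the message above.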
Posted by Jakub Holý on March 17, 2015
Our systems always depend on other systems and services and thus may and will be subject to failures – network glitches, dropped connections, load spikes, deadlocks, slow or crashed subsystems. We will explore how to create robust systems that can sustain blows from their users, interconnecting networks, and supposedly allied systems, yet carry on as well as possible, recovering quickly – instead of aggravating these difficulties and turning them into an extended outage and potentially substantial financial loss. In systems not designed for robustness, even a minor and transient failure tends to cause a chain reaction of failures, spreading destruction far and wide. Here you will learn how to avoid that with a few crucial yet simple stability patterns and the main antipatterns to be aware of. Based primarily on the book Release It! and on Hystrix. (Presented at the Iterate winter conference 2015; re-posted from blog.iterate.no.)
Read the rest of this entry »
Posted in SW development | Tagged: failure, ops, patterns | Comments Off on There will be failures – On systems that live through difficulties instead of turning them into a catastrophe
Posted by Jakub Holý on March 11, 2015
Being used to the excellent REPL in Clojure(Script), I was surprised to find out that the Node.js REPL is somewhat weak and that its support in Emacs is not actively maintained. Nevertheless, I managed to get a usable REPL with these three components:
- The Emacs nodejs-repl package (nearly 2 years old)
- J. David Smith’s nodejs-repl-eval.el to be able to send code to the REPL (binding C-x C-e so that I can execute the current sexp/region)
- My own extension of nodejs-repl-eval.el that takes care of escaping JS constructs that the REPL interprets in a special way
Regarding #3: The problem with the Node.js REPL is that valid JS code does not always behave correctly in the REPL. This is because: 1) _ is a special variable (the last result), while in code it is often used for the underscore/lodash library; 2) the REPL also interprets lines somewhat separately and tries to execute <dot><name> as a REPL command, breaking chained calls that start on a new line. My solution uses some RegExp magic to turn

var _ = require("lodash"); // #1a conflicting use of _
_.chain([1,2]) // #1b conflicting use of _
.first() // #2 interpreted as non-existing REPL command '.first'

into

var __ = require("lodash"); // #1a Notice the doubled _
__.chain([1,2]). // #1b Notice the doubled _
first(). // #2 Notice the dot has moved to the previous line

when the code is sent to the REPL.
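The real implementation is Emacs Lisp; as a rough shell illustration of the same two rewrites (GNU sed, where -z treats the whole input as one block so the newline-dot pattern can match across lines):

```shell
# Double every standalone `_` and pull a leading `.` up to the end of the
# previous line - a sketch of the transformation, not the Emacs Lisp code.
printf 'var _ = require("lodash");\n_.chain([1,2])\n  .first()\n' |
  sed -z -e 's/\b_\b/__/g' -e 's/\n[[:space:]]*\./.\n/g'
# prints:
#   var __ = require("lodash");
#   __.chain([1,2]).
#   first()
```

The word-boundary match leaves the `_` inside identifiers like `lodash` alone, which is exactly why plain string replacement would not do here.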
Posted in Languages, Tools | Tagged: emacs, nodejs, productivity | Comments Off on A Usable Node.js REPL for Emacs
Posted by Jakub Holý on February 18, 2015
Kent Beck in his Patterns Enhance Craft Step 3: A Few Good Solutions highlights an important fact about software development:
We encounter repeating configurations of forces/constraints that have only a handful of “solution families” and the optimal solution(s) depend on the relative weights of these constraints.
For example, when deciding which error handling style to choose when calling an unreliable routine:
Depending on whether readability, reliability, automated analysis, performance, or future maintenance are most important you could reasonably choose any one of:
- Return value plus errno
- Exceptional value (e.g. Haskell’s Maybe)
- Success and failure callbacks
So there is no single perfect error handling style to rule them all.
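As a tiny illustration of the first style from the list, here it is in its closest shell form (the function name and file path are hypothetical): the unreliable routine reports failure through its exit status, and every caller decides explicitly what to do about it.

```shell
# "Return value plus errno", shell-style: failure is signalled via the
# exit status; the caller must check it and choose a reaction.
# load_config and the config path are made-up names for illustration.
load_config() {
  cat /no/such/path/app.conf 2>/dev/null || return 1
}

if config=$(load_config); then
  echo "config loaded"
else
  echo "falling back to defaults"   # the caller chooses the reaction
fi
# prints: falling back to defaults
```

The explicit check is verbose but cheap and easy to analyse automatically; an exception-style design would instead centralise the handling, trading readability at the call site for it, which is exactly the kind of force-weighing described above.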
Kent further explains that the forces shaping most design decisions are generated internally by the process of design, not by external constraints: whether we are building a barn or an airport, the list of forces influencing the roofing decision is the same (snow, wind, etc.), but their relative strengths may differ. Internal forces in SW development include the repeated use of the same bits of logic, code being made for and by people, etc. For example, the forces influencing the naming of a variable do not depend on what SW we are building but on the variable's purpose, lifetime, etc. We encounter some configurations of these constraints again and again, and a catalogue of design patterns representing the “solution families” mentioned above can guide us towards the most suitable solution for the given weights.
When designing a solution, it is helpful to think in terms of these forces and their relative strengths. There is no single superior solution (a.k.a. silver bullet), as different configurations of forces and their weights might be best served by radically different solutions. Keeping this in mind might prevent design discussions from degenerating into arguments.
Posted in SW development | Tagged: design, opinion | Comments Off on There Are No Silver Bullets: Which Error Handling Style to Pick For a Given Configuration of Constraints?
Posted by Jakub Holý on February 17, 2015
There is an important difference between running a script manually (ssh machine; machine$ ./script.sh) and running it via ssh (ssh machine < script.sh): in the latter case the connection will not close when the script finishes but will stay open until stdout/stderr are closed or a timeout occurs. In Jenkins it will therefore seem as if the script hangs.
So if your shell script starts any background job, make sure to redirect all its output somewhere:
nohup some-background-task &> /dev/null # no space between & and >; &> is a bashism, in plain sh use > /dev/null 2>&1
This has bitten me when trying to deploy an application from the Jenkins CI using SSH and a shell script.
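In the Jenkins job this becomes a one-liner (the host and script names are made up for illustration):

```shell
# Jenkins "Execute shell" build step (sketch): the remote script starts
# the app in the background with its output redirected, so ssh can close
# the session as soon as the script itself finishes.
ssh deploy@app-server 'nohup ./start-app.sh > /dev/null 2>&1 &'
```

Without the redirection, the backgrounded app would keep the session's stdout open and the build would appear to hang exactly as described above.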
Posted in Uncategorized | Comments Off on Fix Shell Script Run via SSH Hanging (Jenkins)
Posted by Jakub Holý on January 26, 2015
James O. Coplien wrote the thought-provoking 2014 essay Why Most Unit Testing is Waste and further elaborates on the topic in his Segue. I love testing, but I also value challenging my views to expand my understanding, so it was a valuable read. When encountering something so controversial, it is crucial to set aside one's emotions and opinions and ask: “Provided that it is true, what in my world view might need questioning and updating?” Judge for yourself how well I have managed it. (Note: this post is not intended as a full and impartial summary of his writing but rather an overview of what I may learn from it.)
Perhaps the most important lesson is this: Don’t blindly accept fads, myths, authorities and “established truths.” Question everything, collect experience, judge for yourself. As J. Coplien himself writes:
Be skeptical of yourself: measure, prove, retry. Be skeptical of me for heaven’s sake.
I am currently fond of unit testing so my mission is now to critically confront Coplien’s ideas and my own preconceptions with practical experience on my next projects.
I would suggest that the main thing you take away isn’t “minimize unit testing” but rather “value thinking, focus on system testing, employ code reviews and other QA measures.”
I’ll list my main take-aways first and go into detail later on:
Read the rest of this entry »
Posted in Testing | Tagged: opinion | 1 Comment »