The Holy Java

Building the right thing, building it right, fast

Java: Simulating various connection problems with Toxiproxy

Posted by Jakub Holý on November 26, 2018


Posted in Languages | Tagged: | Leave a Comment »

Clojure – comparison of gnuplot, Incanter, oz/vega-lite for plotting usage data

Posted by Jakub Holý on November 4, 2018

What is the best way to plot memory and CPU usage data (mainly) in Clojure? I will compare gnuplot, Incanter with JFreeChart, and vega-lite (via Oz). (Spoiler: I like Oz/vega-lite most but still use Incanter to prepare the data.)

The data looks like this:

;; sec.ns | memory | CPU %
1541052937.882172509 59m 0.0
1541052981.122419892 78m 58.0
1541052981.625876498 199m 85.9
1541053011.489811184 1.2g 101.8
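
Before plotting, each line has to be turned into numbers. Here is a minimal Java sketch of that step (it is not code from the original post). The class name `UsageSample` is hypothetical, and it assumes that the memory suffixes follow top's conventions: `m` for MiB, `g` for GiB, and no suffix for KiB.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of the original post.
class UsageSample {
    final double epochSeconds; // a double cannot hold full nanosecond precision
    final double memoryMiB;
    final double cpuPercent;

    UsageSample(double epochSeconds, double memoryMiB, double cpuPercent) {
        this.epochSeconds = epochSeconds;
        this.memoryMiB = memoryMiB;
        this.cpuPercent = cpuPercent;
    }

    // Assumption: "m" = MiB, "g" = GiB, no suffix = KiB.
    static double parseMemoryMiB(String raw) {
        String value = raw.toLowerCase(Locale.ROOT);
        String number = value.substring(0, value.length() - 1);
        char unit = value.charAt(value.length() - 1);
        if (unit == 'g') return Double.parseDouble(number) * 1024.0;
        if (unit == 'm') return Double.parseDouble(number);
        return Double.parseDouble(value) / 1024.0;
    }

    // Expects "sec.ns memory cpu"; callers should skip ";;" comment lines.
    static UsageSample parse(String line) {
        String[] fields = line.trim().split("\\s+");
        return new UsageSample(
                Double.parseDouble(fields[0]),
                parseMemoryMiB(fields[1]),
                Double.parseDouble(fields[2]));
    }

    public static void main(String[] args) {
        UsageSample s = parse("1541053011.489811184 1.2g 101.8");
        System.out.println(s.memoryMiB + " MiB, " + s.cpuPercent + " % CPU");
    }
}
```

Normalizing memory to a single unit up front means the plotting tool only ever sees plain numbers.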

The data has been produced by

The tools

Gnuplot 5

Gnuplot is the simplest, with a lot available out of the box. But it is also somewhat archaic and not very flexible.

Read the rest of this entry »

Posted in Languages, [Dev]Ops | Tagged: , , | Comments Off on Clojure – comparison of gnuplot, Incanter, oz/vega-lite for plotting usage data

How I got fired and learned the importance of communication and play time

Posted by Jakub Holý on November 4, 2018

When I came to the office one late autumn morning in 2005, I was shocked to find out that – without any warning signs whatsoever – I had been fired. That day I learned the importance of communication. Their criticism was justified but the thing is, nobody bothered to tell me anything during my 11 months in the company. I received exactly 0 feedback about my behaviour or work. The company ended up in court with its client – which both explains why they were stressed and was also caused by bad communication. So communication – even, or especially, under stress – is really important. It must be open, transparent, and broad.

The funny thing is that I still do the things they fired me for.

Read the rest of this entry »

Posted in General | Tagged: , | 3 Comments »

How good monitoring saved our ass … again

Posted by Jakub Holý on November 1, 2018

You know how it goes – suddenly people complain your app does not work, you are getting plenty of timeouts or other errors in your error tracking tool, you find the backend app that is misbehaving and finally “fix” the problem by restarting it. Phew!

But why? What caused the downtime? A glitch in an upstream system? Sudden overload due to a spike in concurrent users? Trolls?

You know that it helps sometimes to zoom out, to get the right perspective. Here the perspective was 7 days:

It was enough to look at this chart with the right zoom to see at once that something happened on October 23rd that caused a significant change in the behavior of the application. A quick search and indeed, the change in CPU usage corresponded with a deployment. A quick revert to the previous version soon confirmed the culprit. (It would have been even easier if we showed deployments on these charts.)

This is not the first time good monitoring saved us. A while ago we struggled with the application regularly becoming sluggish and had to restart it again and again. A graph of the Node.js event loop lag showed it increasing over time. Once it was on the same dashboard as Node’s heap usage, we could see at once that it correlated with increasing memory usage – indicating a memory leak. A few hours of experimenting and heap dump analysis later, the problem was fixed.

So good monitoring is paramount.

Of course the trick is to know what to monitor and to display all relevant metrics in such a way that you can spot important relations. I am still working on improving that…

Posted in [Dev]Ops | Tagged: | Comments Off on How good monitoring saved our ass … again

Beware the performance cost of async_hooks (Node 8)

Posted by Jakub Holý on November 1, 2018

I was excited about async_hooks having finally landed in Node.js 8, as it would enable me to share important troubleshooting information with all code involved in handling a particular request. However it turned out to have a terrible impact on our CPU usage (YMMV):

This was quite extreme and is likely related to the way our application works and uses Promises. Do your own testing to measure the actual impact in your app.

However I am not the only one who has seen some performance hit from async_hooks – see, in particular:

Here are the results of running the Promise micro benchmarks with and without async_hooks enabled:

Benchmark                        Node 8.9.4   Node 9.4.0
Bluebird-doxbee (regular)        226 ms       189 ms
Bluebird-doxbee (init hook)      383 ms       341 ms
Bluebird-doxbee (all hooks)      440 ms       411 ms
Bluebird-parallel (regular)      924 ms       696 ms
Bluebird-parallel (init hook)    1380 ms      1050 ms
Bluebird-parallel (all hooks)    1488 ms      1220 ms
Wikipedia (regular)              993 ms       804 ms
Wikipedia (init hook)            2025 ms      1893 ms
Wikipedia (all hooks)            2109 ms      2124 ms
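
To put these numbers into perspective, it helps to express them as a slowdown relative to the "regular" run. The following Java sketch does that arithmetic for the Node 8.9.4 column; the class and method names are made up for this illustration.

```java
// Illustrative arithmetic only; the timings are copied from the table above.
class HookOverhead {
    // Slowdown factor: time with hooks divided by the baseline time.
    static double slowdown(double baselineMs, double withHooksMs) {
        return withHooksMs / baselineMs;
    }

    public static void main(String[] args) {
        // Node 8.9.4, "regular" vs. "all hooks"
        System.out.printf("Bluebird-doxbee:   %.2fx%n", slowdown(226, 440));
        System.out.printf("Bluebird-parallel: %.2fx%n", slowdown(924, 1488));
        System.out.printf("Wikipedia:         %.2fx%n", slowdown(993, 2109));
    }
}
```

In other words, in these micro benchmarks enabling all hooks costs somewhere between roughly one and a half and a bit over two times the baseline run time.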

To confirm the impact of async_hooks on our app, I performed 3 performance tests:

CPU usage without async_hooks (Node 8)

It is difficult to see but the mean CPU usage is perhaps around 60% here.

CPU usage with “no-op” async_hooks (Node 8)

Here the CPU jumped to 100%.

CPU usage with “no-op” async_hooks (Node 11)

The same as above, but using Node 11 for comparison. I recorded it for just a few minutes but the CPU usage is still around 100%:

The code

This is the relevant code:

Posted in Languages | Tagged: , | Comments Off on Beware the performance cost of async_hooks (Node 8)

Monitoring process memory/CPU usage with top and plotting it with gnuplot

Posted by Jakub Holý on October 17, 2018


If you want to monitor the memory and CPU usage of a particular Linux process for a few minutes, perhaps during a performance test, you can capture the data with top and plot them with gnuplot. Here is how:

Read the rest of this entry »

Posted in [Dev]Ops | Tagged: , | Comments Off on Monitoring process memory/CPU usage with top and plotting it with gnuplot

Troubleshooting Received fatal alert: handshake_failure

Posted by Jakub Holý on October 5, 2018

Re-published from the Telia Tech Blog.

The infamous Java exception Received fatal alert: handshake_failure is hardly understandable to a mere mortal. What it wants to say is, most likely, something like this:

Sorry, none of the cryptographic protocols/versions and cipher suites is accepted both by the JVM and the server.

For instance, the server requires a higher version of TLS than the (old) JVM supports, or it requires stronger cipher suites than the JVM knows. You will now learn how to find out which is the case.

We will first find out what both the server and the JVM support and compare the two to see where they disagree. Feel free to just skim through the outputs and return to them later, after they have been explained.

What does the server support?

We will use nmap for that (brew install nmap on OSX):

nmap --script ssl-enum-ciphers -p 443
Starting Nmap 7.70 ( ) at 2018-10-05 00:54 CEST
Nmap scan report for (
Host is up (0.031s latency).

443/tcp open https
| ssl-enum-ciphers:
| TLSv1.2:
| ciphers:
| TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (secp256r1) - A
| TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (secp256r1) - A
| TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384 (secp256r1) - A
| TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA (secp256r1) - A
| TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 (secp256r1) - A
| TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA (secp256r1) - A
| compressors:
| cipher preference: server
|_ least strength: A

Here we see that the server only supports TLS version 1.2 (ssl-enum-ciphers: TLSv1.2:) and the listed ciphers, such as TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA.

What does the JVM have on offer?

Now we will find out what the JVM supports (I did that through Clojure but you could have just as well used Java directly; notice the property):

sh $ env -i java java -jar clojure-1.8.0.jar
Clojure 1.8.0
user=> (.connect (.openConnection ( "")))
;; ...
done seeding SecureRandom
Ignoring unavailable cipher suite: TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
Ignoring unavailable cipher suite: TLS_DHE_RSA_WITH_AES_256_CBC_SHA
Ignoring unavailable cipher suite: TLS_ECDH_RSA_WITH_AES_256_CBC_SHA
Ignoring unsupported cipher suite: TLS_DHE_DSS_WITH_AES_128_CBC_SHA256
Ignoring unsupported cipher suite: TLS_DHE_DSS_WITH_AES_256_CBC_SHA256
Ignoring unsupported cipher suite: TLS_DHE_RSA_WITH_AES_128_CBC_SHA256
Ignoring unsupported cipher suite: TLS_ECDH_RSA_WITH_AES_128_CBC_SHA256
Ignoring unsupported cipher suite: TLS_DHE_RSA_WITH_AES_256_CBC_SHA256
Ignoring unsupported cipher suite: TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384
Ignoring unsupported cipher suite: TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA384
Ignoring unsupported cipher suite: TLS_RSA_WITH_AES_256_CBC_SHA256
Ignoring unavailable cipher suite: TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA
Ignoring unsupported cipher suite: TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
Ignoring unsupported cipher suite: TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384
Ignoring unavailable cipher suite: TLS_DHE_DSS_WITH_AES_256_CBC_SHA
Ignoring unsupported cipher suite: TLS_ECDH_RSA_WITH_AES_256_CBC_SHA384
Ignoring unsupported cipher suite: TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256
Ignoring unsupported cipher suite: TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA256
Ignoring unavailable cipher suite: TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA
Ignoring unavailable cipher suite: TLS_RSA_WITH_AES_256_CBC_SHA
Ignoring unsupported cipher suite: TLS_RSA_WITH_AES_128_CBC_SHA256
Allow unsafe renegotiation: false
Allow legacy hello messages: true
Is initial handshake: true
Is secure renegotiation: false
main, setSoTimeout(0) called
%% No cached client session
*** ClientHello, TLSv1
RandomCookie: GMT: 1521850374 bytes = { 121, 217, 101, 186, 111, 183, 47, 46, 159, 230, 139, 103, 7, 181, 250, 172, 113, 121, 4, 55, 122, 148, 111, 82, 87, 170, 70, 10 }
Session ID: {}
Compression Methods: { 0 }
Extension elliptic_curves, curve names: {secp256r1, sect163k1, sect163r2, secp192r1, secp224r1, sect233k1, sect233r1, sect283k1, sect283r1, secp384r1, sect409k1, sect409r1, secp521r1, sect571k1, sect571r1, secp160k1, secp160r1, secp160r2, sect163r1, secp192k1, sect193r1, sect193r2, secp224k1, sect239k1, secp256k1}
Extension ec_point_formats, formats: [uncompressed]
Extension server_name, server_name: [host_name:]
main, WRITE: TLSv1 Handshake, length = 175
main, READ: TLSv1 Alert, length = 2
main, RECV TLSv1 ALERT: fatal, handshake_failure
main, called closeSocket()
main, handling exception: Received fatal alert: handshake_failure
SSLHandshakeException Received fatal alert: handshake_failure (

Here we see that the JVM uses TLS version 1 (see *** ClientHello, TLSv1) and supports the listed Cipher Suites, including TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA.
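
If you prefer not to read through the debug output, you can also ask the JVM directly. The following is a minimal sketch using the standard `javax.net.ssl` API; the class name `JvmTlsSupport` is made up for this example, and the output will of course differ between Java versions and security configurations.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLParameters;

// Illustrative helper; prints what the running JVM offers for TLS.
class JvmTlsSupport {
    private static SSLContext defaultContext() {
        try {
            return SSLContext.getDefault();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }

    // Protocols and cipher suites that are enabled by default.
    static SSLParameters enabledByDefault() {
        return defaultContext().getDefaultSSLParameters();
    }

    // Everything the JVM could use, including what is disabled by default.
    static SSLParameters supported() {
        return defaultContext().getSupportedSSLParameters();
    }

    public static void main(String[] args) {
        System.out.println("Enabled protocols:   "
                + Arrays.toString(enabledByDefault().getProtocols()));
        System.out.println("Supported protocols: "
                + Arrays.toString(supported().getProtocols()));
        for (String suite : enabledByDefault().getCipherSuites()) {
            System.out.println("Enabled cipher suite: " + suite);
        }
    }
}
```

A protocol that is supported but not enabled can usually be switched on through configuration, while an unsupported one requires a newer JVM or security provider.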

What’s wrong?

Here we see that the server and JVM share exactly one cipher suite, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA. But they fail to agree on the TLS version, since the server requires v1.2 while the JVM only offers v1.
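
The comparison itself is just a set intersection, done once for the protocol versions and once for the cipher suites. Here is a small Java sketch with the values from the outputs above. The JVM's cipher suite list is abbreviated to the one suite named in this article, and the class name `HandshakeDiff` is hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: compares what the client offers with what the server accepts.
class HandshakeDiff {
    static Set<String> shared(Collection<String> client, Collection<String> server) {
        Set<String> result = new TreeSet<>(client);
        result.retainAll(server); // keep only the entries both sides know
        return result;
    }

    public static void main(String[] args) {
        List<String> serverProtocols = Arrays.asList("TLSv1.2");
        List<String> jvmProtocols = Arrays.asList("TLSv1");

        List<String> serverSuites = Arrays.asList(
                "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
                "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
                "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384",
                "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA",
                "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256",
                "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA");
        // Abbreviated: only the suite named in the article is listed here.
        List<String> jvmSuites = Arrays.asList("TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA");

        System.out.println("Shared protocols:     " + shared(jvmProtocols, serverProtocols));
        System.out.println("Shared cipher suites: " + shared(jvmSuites, serverSuites));
    }
}
```

An empty result for either of the two sets is enough to make the handshake fail.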

The solution

You can either configure the server to support a cipher suite and protocol version that the JVM has or teach the JVM to use what the server wants. In my case that was resolved by running java with -Dhttps.protocols=TLSv1.2 (alternatively, you could add all of SSLv3,TLSv1,TLSv1.1,TLSv1.2) as recommended by π at StackOverflow.
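
If you cannot change the JVM's command line, a similar effect can probably be achieved in code. The sketch below is one possible approach and not what I actually did: it asks for an `SSLContext` for TLSv1.2 and uses its socket factory for a single connection. The class and method names are hypothetical, and which other protocol versions stay enabled in such a context depends on the JDK.

```java
import java.io.IOException;
import java.net.URL;
import java.security.GeneralSecurityException;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;

// Illustrative sketch; names are made up for this example.
class Tls12Connection {
    // Creates a context that is able to speak TLS 1.2.
    static SSLContext tls12Context() {
        try {
            SSLContext context = SSLContext.getInstance("TLSv1.2");
            // null = default key managers, trust managers and SecureRandom
            context.init(null, null, null);
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("TLSv1.2 is not available in this JVM", e);
        }
    }

    // Opens (but does not yet connect) an HTTPS connection using that context.
    static HttpsURLConnection open(URL httpsUrl) throws IOException {
        HttpsURLConnection connection = (HttpsURLConnection) httpsUrl.openConnection();
        connection.setSSLSocketFactory(tls12Context().getSocketFactory());
        return connection;
    }

    public static void main(String[] args) {
        System.out.println("Context protocol: " + tls12Context().getProtocol());
    }
}
```

The command-line property remains the simpler option since it needs no code change; the programmatic variant is mainly useful when only some connections should be affected.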


The troubleshooting technique comes from the article “SSLHandshakeException: Received fatal alert: handshake_failure due to no overlap in cipher suite” by Atlassian. The observation that the server and JVM disagreed on the TLS version comes from my good colleague Neil.

Posted in General, Languages | Tagged: , , | Comments Off on Troubleshooting Received fatal alert: handshake_failure

Why we love AWS Beanstalk but are leaving it anyway

Posted by Jakub Holý on March 14, 2018

Cross-posted from Telia’s Tech Blog.

We have had our mission-critical webapp running on AWS Elastic Beanstalk for three years and have been extremely happy with it. However we have now outgrown it and are moving to a manually managed infrastructure and CodeDeploy.

AWS Beanstalk provides you with a lot of bang for the buck and enables you to get up and running in no time:

  • Simple, no-downtime deployment and automatic roll-back based on user-provided health-check (either one subset of nodes at a time or blue-green deployment)
  • Autoscaling
  • Managed updates – security fixes and other improvements installed automatically
  • Built-in HTTP Proxy with caching in front of your application
  • Monitoring dashboard with alerting and access to logs without the need for SSH
  • A list of past versions & ability to roll-back
  • Support for many runtimes (Java, Node.js, Docker to name just a few)

So if you need a solid, state-of-the-art infrastructure for a web-scale application and you don’t have a lot of time and/or skill to build one on AWS on your own, I absolutely recommend Beanstalk.

Read the rest of this entry »

Posted in [Dev]Ops | Tagged: , | 2 Comments »

Pains with Terraform (perhaps use Sceptre next time?)

Posted by Jakub Holý on March 14, 2018

Cross-posted from Telia’s Tech Blog

We use Amazon Web Services (AWS) heavily and are in the process of migrating towards infrastructure-as-code, i.e. creating a textual description of the desired infrastructure in a Domain-Specific Language and letting the tool create and update the infrastructure.

We are lucky enough to have some of the leading Terraform experts in our organisation so they lay out the path and we follow it. We are at an initial stage and everything is thus “work in progress” and far from perfect, therefore it is important to judge leniently. Yet I think I have gained enough experience trying to apply Terraform both now and in the past to speak about some of the (current?) limitations and disadvantages and to consider alternatives.

Read the rest of this entry »

Posted in [Dev]Ops | Tagged: , | Comments Off on Pains with Terraform (perhaps use Sceptre next time?)

How to patch Travis CI’s deployment tool for your needs

Posted by Jakub Holý on January 9, 2018

Travis CI is a pretty good software-as-a-service Continuous Integration server. It can deploy to many targets, including AWS BeanStalk, S3, and CodeDeploy.

However it might happen that the deploy tool (dpl) has a missing feature or doesn’t do exactly what you need. Fortunately it is easy to fix and run a modified version of the tool, and I will show you how to do that.

Read the rest of this entry »

Posted in Tools | Comments Off on How to patch Travis CI’s deployment tool for your needs