The Holy Java

Building the right thing, building it right, fast

Posts Tagged ‘ops’

Book Review & Digest: Release It! Design and Deploy Production-Ready Software

Posted by Jakub Holý on July 22, 2015

By Michael T. Nygard, 2007, ISBN: 978-0-9787-3921-8

My digest and review of the book.

Review

Of the books I have read, Release It! is the one I would require all “senior” developers to read (together with something like Architecting Enterprise Solutions: Patterns for High-Capability Internet-based Systems). The first part on stability, with its patterns and anti-patterns, is a must-read. Without knowing and applying them, we create systems that react to problems like a dry savannah to a burning match. I also found the next-to-last chapter, #17 Transparency, very valuable, especially the metrics, the design of the OpsDB, and the observation practices.

One thing I have left out of the digest that is really worth reading is the war stories that introduce each section. They are interesting, inspiring, and educational.

Extra Links

Stability

Stability x longevity bugs

Stability antipatterns

Integration points

Integration point = call to a DB, WS, … . Stability risk #1.

Read the rest of this entry »

Posted in [Dev]Ops

AWS API: Proper syntax for filtering by tag name and value (e.g. describeInstances)

Posted by Jakub Holý on June 11, 2015

It took me quite a while to figure out the right syntax for filtering instances by tag name and value in the AWS EC2 API’s describeInstances.

The documentation is not exactly crystal-clear to me:

  • tag:key=value – The key/value combination of a tag assigned to the resource, where tag:key is the tag’s key.

Anyway, here is the proper syntax, provided we are interested in the tag elasticbeanstalk:environment-name:

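    // Assumes the AWS SDK for JavaScript (v2) with region and credentials configured
    var AWS = require('aws-sdk');
    var ec2 = new AWS.EC2();

    var params = {
        Filters: [
            {
                // The tag key is part of the filter name: 'tag:<key>'
                Name: 'tag:elasticbeanstalk:environment-name',
                Values: ['mySuperApp']
            }
        ]
    };
    // Pass a callback; without one (or .send()/.promise()) the request is not sent
    ec2.describeInstances(params, function (err, data) {
        if (err) { console.error(err); } else { console.log(data.Reservations); }
    });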

So the name of the tag is embedded in the Name part and not, as I initially understood,
{ Name: 'tag', Values: ['elasticbeanstalk:environment-name=mySuperApp'] }

Credit: garnaat.
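To avoid repeating the mistake, the naming rule can be wrapped in a tiny helper. This is only a sketch: `tagFilter` is a hypothetical function of my own, not part of the AWS SDK, and the `role` tag is made up for illustration.

```javascript
// Hypothetical helper (not part of the AWS SDK): builds one filter entry
// in the shape describeInstances expects for a tag key/value match.
function tagFilter(key, values) {
    return {
        Name: 'tag:' + key,          // the tag key goes into the filter name
        Values: [].concat(values)    // accept a single value or an array
    };
}

var params = {
    Filters: [
        tagFilter('elasticbeanstalk:environment-name', 'mySuperApp'),
        tagFilter('role', ['web', 'worker'])   // 'role' is a made-up tag
    ]
};

console.log(JSON.stringify(params, null, 2));
```

Several entries in Filters are combined with AND, while several Values within one entry are combined with OR.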

Posted in [Dev]Ops

Mounting an EBS volume to Docker on AWS Elastic Beanstalk

Posted by Jakub Holý on June 2, 2015

Mounting an EBS volume to a Docker instance running on Amazon Elastic Beanstalk (EB) is surprisingly tricky. The good news is that it is possible.

I will describe how to automatically create and mount a new EBS volume (optionally based on a snapshot). If you would prefer to mount a specific, existing EBS volume, check out leg100’s docker-ebs-attach (which uses the AWS API to mount the volume). You can either use it in a multi-container setup or just include the relevant parts in your own Dockerfile.

The problem with EBS volumes is that, if I am correct, a volume can only be mounted to a single EC2 instance – and thus doesn’t play well with EB’s autoscaling. That is why EB supports only creating and mounting a fresh volume for each instance.

Read the rest of this entry »

Posted in General

All-in-one Docker with Grafana, InfluxDB, and cloudwatch-to-graphite for AWS/Beanstalk monitoring

Posted by Jakub Holý on May 7, 2015

I have derived the Docker container docker-grafana-influxdb-cloudwatch, which bundles Grafana dashboards and InfluxDB for metrics storage, and runs cloudwatch-to-graphite as a cron job to fetch selected metrics from AWS CloudWatch and feed them into InfluxDB using its Graphite input plugin. It is configured so that you can run it in AWS Elastic Beanstalk (the main problem being that only a single port can be exposed – I therefore use Nginx to expose the InfluxDB API needed by Grafana at :80/db/).

Read the rest of this entry »

Posted in General

AWS CloudWatch Alarms Too Noisy Due To Ignoring Missing Data in Averages

Posted by Jakub Holý on March 31, 2015

I want to know when our app starts getting slower, so I set up an alarm on the Latency metric of our ELB. According to the AWS Console, “This alarm will trigger when the blue line [average latency over the period of 15 min] goes above the red line [2 sec] for a duration of 45 minutes.” (I.e. it triggers if Latency > 2 for 3 consecutive period(s).) This is exactly what I need – except that it is a lie.

Last night I got 8 alarm/OK notifications even though the average latency was never over 2 sec for 45 minutes. The problem is that CloudWatch ignores null/missing data. So if you have a slow request at 3am and no other request comes until 4am, it will look at [slow, null, null, null] and trigger the alarm.

So I want to configure it to treat null as 0 and preferably to ignore latency if it only affected a single user. But there is no way to do this in CloudWatch.

Solution: I will likely need to run my own job that reads the metrics and produces a normalized, reasonable metric – replacing null / missing data with 0 and weighting the average latency by the number of users in the period.
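A minimal sketch of what such a job could compute, under assumptions of my own: the input is one entry per 15-minute period with a hypothetical shape `{ avgLatency, requests }`, where `avgLatency` is null when nothing arrived. Instead of true weighting it uses a simpler stand-in, a minimum request count below which a period is treated as 0. Fetching the data from CloudWatch is left out.

```javascript
// Hypothetical input: one entry per period, e.g.
// { avgLatency: 3.2, requests: 40 } or { avgLatency: null, requests: 0 }

// Missing data and low-traffic periods both count as latency 0.
function normalizePeriod(period, minRequests) {
    if (period.avgLatency === null || period.requests < minRequests) {
        return 0;
    }
    return period.avgLatency;
}

// True if the normalized latency exceeds the threshold
// for at least `consecutive` periods in a row.
function shouldAlarm(periods, thresholdSec, consecutive, minRequests) {
    var run = 0;
    for (var i = 0; i < periods.length; i++) {
        var latency = normalizePeriod(periods[i], minRequests);
        run = latency > thresholdSec ? run + 1 : 0;
        if (run >= consecutive) {
            return true;
        }
    }
    return false;
}

// One slow request at 3am, then silence: no alarm.
var quietNight = [
    { avgLatency: 5, requests: 1 },
    { avgLatency: null, requests: 0 },
    { avgLatency: null, requests: 0 },
    { avgLatency: null, requests: 0 }
];
console.log(shouldAlarm(quietNight, 2, 3, 5)); // → false

// Three busy, slow periods in a row: alarm.
var slowAfternoon = [
    { avgLatency: 3, requests: 50 },
    { avgLatency: 4, requests: 60 },
    { avgLatency: 3, requests: 55 }
];
console.log(shouldAlarm(slowAfternoon, 2, 3, 5)); // → true
```

The cutoff of 5 requests is arbitrary; the point is only that a lone slow request followed by silence no longer looks like 45 minutes of slowness.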

Posted in General, Tools

There will be failures – On systems that live through difficulties instead of turning them into a catastrophe

Posted by Jakub Holý on March 17, 2015

Our systems always depend on other systems and services and thus may and will be subject to failures – network glitches, dropped connections, load spikes, deadlocks, slow or crashed subsystems. We will explore how to create robust systems that can sustain blows from their users, interconnecting networks, and supposedly allied systems, yet carry on as well as possible and recover quickly – instead of aggravating these difficulties and turning them into an extended outage and potentially substantial financial loss. In systems not designed for robustness, even a minor and transient failure tends to cause a chain reaction of failures, spreading destruction far and wide. Here you will learn how to avoid that with a few crucial yet simple stability patterns and the main antipatterns to be aware of. Based primarily on the book Release It! and Hystrix. (Presented at Iterate winter conference 2015; re-posted from blog.iterate.no.)

Read the rest of this entry »

Posted in SW development

Escaping the Zabbix UI pain: How to create a combined graph for a number of hosts using the Zabbix API

Posted by Jakub Holý on March 21, 2013

This post will answer two questions:

  • How to display the same item, e.g. Processor load, for a number of hosts on the same graph
  • How to avoid going crazy from all the slow clicking in the Zabbix UI by using its API

I will indicate how it could be done with plain HTTP POST and then show a solution using the Python library for accessing the Zabbix API.

The problem we want to solve is to create a graph that plots the same item for a number of hosts that all belong to the same host group, without necessarily including every host in the group.
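To give an idea of the plain HTTP POST variant, here is a rough sketch of the JSON-RPC body for a `graph.create` call. It assumes you already have an auth token (from `user.login`) and the item id of the item on each host (e.g. from `item.get`); the helper name, the input shape, and the host names and item ids are all made up for illustration. Passing the token in the `auth` field matches the Zabbix versions of that time; check the API documentation for your version.

```javascript
// Hypothetical helper: builds the JSON-RPC 2.0 body for graph.create.
// hosts: [{ name: 'web1', itemid: '23301' }, ...] (made-up input shape)
// includeHost: predicate selecting which hosts of the group to plot
function buildGraphCreate(graphName, hosts, includeHost, authToken) {
    var colors = ['C80000', '00C800', '0000C8', 'C800C8', '00C8C8'];
    var gitems = hosts.filter(includeHost).map(function (host, i) {
        // One graph item per selected host, cycling through the colors
        return { itemid: host.itemid, color: colors[i % colors.length] };
    });
    return {
        jsonrpc: '2.0',
        method: 'graph.create',
        params: { name: graphName, width: 900, height: 200, gitems: gitems },
        auth: authToken,
        id: 1
    };
}

var hosts = [
    { name: 'web1', itemid: '23301' },
    { name: 'web2', itemid: '23302' },
    { name: 'db1', itemid: '23303' }
];

// Plot only the web servers, even though db1 is in the same group
var body = buildGraphCreate('Processor load (web)', hosts, function (host) {
    return host.name.indexOf('web') === 0;
}, 'your-auth-token');

console.log(JSON.stringify(body, null, 2));
```

The resulting JSON would be POSTed to the `api_jsonrpc.php` endpoint of the Zabbix frontend with the content type `application/json-rpc`.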

Read the rest of this entry »

Posted in General

Using Java as Native Linux Apps – Calling C, Daemonization, Packaging, CLI (Brian McCallister)

Posted by Jakub Holý on September 25, 2012

This is a summary of the excellent JavaZone 2012 talk Going Native (vimeo) by Brian McCallister. Content: Using native libraries in Java and packaging them with Java apps, daemonization, truly executable JARs, powerful CLI, creating manpages, packaging natively as deb/rpm.

1. Using Native Libs in Java

Calling Native Libs

Calling native libraries, such as C ones, was hard and ugly with JNI but is very simple and nice with JNA (LGPL) and JNR (Apache/LGPL).
Read the rest of this entry »

Posted in Languages

Enabling JMX Monitoring for Hadoop And Hive

Posted by Jakub Holý on September 21, 2012

Hadoop’s NameNode and JobTracker expose interesting metrics and statistics over JMX. Hive does not seem to expose anything interesting, but it still might be useful to monitor its JVM or do simpler profiling/sampling on it. Let’s see how to enable JMX and how to access it securely, over SSH.

Read the rest of this entry »

Posted in Tools

VisualVM: Monitoring Remote JVM Over SSH (JMX Or Not)

Posted by Jakub Holý on September 21, 2012

(Disclaimer: Based on personal experience and little research, the information might be incomplete.)

VisualVM is a great tool for monitoring a JVM (5.0+) with regard to memory usage, threads, GC, MBeans etc. Let’s see how to use it over SSH to monitor (or even profile, using its sampler) a remote JVM either with JMX or without it.

This post is based on Sun JVM 1.6 running on Ubuntu 10 and VisualVM 1.3.3.

Read the rest of this entry »

Posted in General, Languages, Tools