The Holy Java

Building the right thing, building it right, fast

Posts Tagged ‘ops’

AWS CloudWatch Alarms Too Noisy Due To Ignoring Missing Data in Averages

Posted by Jakub Holý on March 31, 2015

I want to know when our app starts getting slower so I sat up an alarm on the Latency metric of our ELB. According to the AWS Console, “This alarm will trigger when the blue line [average latency over the period of 15 min] goes above the red line [2 sec] for a duration of 45 minutes.” (I.e. it triggers if Latency > 2 for 3 consecutive period(s).) This is exactly what I need – except that it is a lie.

This night I got 8 alarm/ok notifications even though the average latency has never been over 2 sec for 45 minutes. The problem is that CloudWatch ignores null/missing data. So if you have a slow request at 3am and no other request comes until 4am, it will look at [slow, null, null, null] and trigger the alarm.

So I want to configure it to treat null as 0 and preferably to ignore latency if it only affected a single user. But there is no way to do this in CloudWatch.

Solution: I will likely need to run my own job that will read the metrics and produce a normalized, reasonable metric – replacing null / missing data with 0 and weight the average latency by the number of users in the period.

Posted in General, Tools | Tagged: , , | Leave a Comment »

There will be failures – On systems that live through difficulties instead of turning them into a catastrophy

Posted by Jakub Holý on March 17, 2015

Our systems always depend on other systems and services and thus may and will be subject to failures – network glitches, dropped connections, load spikes, deadlocks, slow or crashed subsystems. We will explore how to create robust systems that can sustain blows from its users, interconnecting networks, and supposedly allied systems yet carry on as well as possible, recovering quickly – instead of aggreviating these difficulties and turning them into an extended outage and potentially substiantial financial loss. In systems not designed for robustness, even a minor and transient failure tends to cause a chain reaction of failures, spreading destruction far and wide. Here you will learn how to avoid that with a few crucial yet simple stability patterns and the main antipatterns to be aware of. Based primarily on the book Release It! and Hystrix. (Presented at Iterate winter conference 2015; re-posted from blog.iterate.no.)

Read the rest of this entry »

Posted in SW development | Tagged: , , | Comments Off on There will be failures – On systems that live through difficulties instead of turning them into a catastrophy

Escaping the Zabbix UI pain: How to create a combined graph for a number of hosts using the Zabbix API

Posted by Jakub Holý on March 21, 2013

This post will answer two questions:

  • How to display the same item, f.ex. Processor load, for a number of hosts on the same graph
  • How to avoid getting crazy from all the slow clicking in the Zabbix UI by using its API

I will indicate how it could be done with plain HTTP POST and then show a solution using the Python library for accessing the Zabix API.

The problem we want to solve is to create a graph that plots the same item for a number of hosts that all are from the same Host group but not all hosts in the group should be included.

Read the rest of this entry »

Posted in General | Tagged: , , | Comments Off on Escaping the Zabbix UI pain: How to create a combined graph for a number of hosts using the Zabbix API

Using Java as Native Linux Apps – Calling C, Daemonization, Packaging, CLI (Brian McCallister)

Posted by Jakub Holý on September 25, 2012

This is a summary of the excellent JavaZone 2012 talk Going Native (vimeo) by Brian McCallister. Content: Using native libraries in Java and packaging them with Java apps, daemonization, trully executable JARs, powerful CLI, creating manpages, packaging natively as deb/rpm.

1. Using Native Libs in Java

Calling Native Libs

Calling native libraries such as C ones was hard and ugly with JNI but is very simple and nice with JNA (GPL) and JNR (Apache/LGPL)
Read the rest of this entry »

Posted in Languages | Tagged: , , , , , | Comments Off on Using Java as Native Linux Apps – Calling C, Daemonization, Packaging, CLI (Brian McCallister)

Enabling JMX Monitoring for Hadoop And Hive

Posted by Jakub Holý on September 21, 2012

Hadoop’s NameNode and JobTracker expose interesting metrics and statistics over the JMX. Hive seems not to expose anything intersting but it still might be useful to monitor its JVM or do simpler profiling/sampling on it. Let’s see how to enable JMX and how to access it securely, over SSH.

Read the rest of this entry »

Posted in Tools | Tagged: , , , , | 2 Comments »

VisualVM: Monitoring Remote JVM Over SSH (JMX Or Not)

Posted by Jakub Holý on September 21, 2012

(Disclaimer: Based on personal experience and little research, the information might be incomplete.)

VisualVM is a great tool for monitoring JVM (5.0+) regarding memory usage, threads, GC, MBeans etc. Let’s see how to use it over SSH to monitor (or even profile, using its sampler) a remote JVM either with JMX or without it.

This post is based on Sun JVM 1.6 running on Ubuntu 10 and VisualVM 1.3.3.

Read the rest of this entry »

Posted in General, Languages, Tools | Tagged: , , , | 1 Comment »

Zabbix: Fixing Active Checks to Work With Zabbix Proxy

Posted by Jakub Holý on August 8, 2012

We’ve recently changed our Zabbix 1.8.1 setup to include Zabbix Proxy, which broke all our active checks (f.ex. monitoring of log files). The solution seems to be having the proxy first, before the Zabbix Server, in the Zabbix Agent’s config parameter Server, i.e. “Server=<proxy ip>,<server ip>”.

Read the rest of this entry »

Posted in General, Tools | Tagged: , , | Comments Off on Zabbix: Fixing Active Checks to Work With Zabbix Proxy

How to Add MapRed-Only Node to Hadoop

Posted by Jakub Holý on August 8, 2012

I was surprised not to be able to google an answer to this so I want to record my findings here. To add (a.k.a. commision) a node to Hadoop cluster that should be used only for map-reduce tasks and not for storing data, you have multiple options:

  1. Do not start the datanode service on the node
  2. If you’ve configured Hadoop to allow only nodes on its whitelist files to connect to it then add it to the file pointed to by the property mapred.hosts but not to the file in dfs.hosts.
  3. Otherwise add the node to the DFS’ blacklist, i.e. file pointed to by the property dfs.hosts.exclude and execute hadoop dfsadmin -refreshNodes on the namenode to apply it.

Read the rest of this entry »

Posted in General, Tools | Tagged: , , | Comments Off on How to Add MapRed-Only Node to Hadoop

Notify on Errors in a Log File with Zabbix 1.8

Posted by Jakub Holý on July 3, 2012

Situation: You want to get notified when a log entry marked ERROR appears in a log file. You want the corresponding trigger to reset back to the OK state if there are no more errors for 10 minutes. (This post assumes certain familiarity with Zabbix UI.)

Read the rest of this entry »

Posted in Tools | Tagged: , , | Comments Off on Notify on Errors in a Log File with Zabbix 1.8

Testing Zabbix Trigger Expressions

Posted by Jakub Holý on July 2, 2012

When defining a Zabbix (1.8.2) trigger e.g. to inform you that there are errors in a log file, how do you verify that it is correct? As somebody recommended in a forum, you can use a Calculated Item with a similar expression (the syntax is little different from triggers). Contrary to triggers, the value of a calculated item is easy to see and the historical values are stored so you can check how it evolved. If your trigger expression is complex the you can create multiple calculated items, one for each subexpression.

Read the rest of this entry »

Posted in Testing, Tools | Tagged: , , | Comments Off on Testing Zabbix Trigger Expressions