The Holy Java

Building the right thing, building it right, fast

AWS CloudWatch Alarms Too Noisy Due To Ignoring Missing Data in Averages

Posted by Jakub Holý on March 31, 2015

I want to know when our app starts getting slower so I sat up an alarm on the Latency metric of our ELB. According to the AWS Console, “This alarm will trigger when the blue line [average latency over the period of 15 min] goes above the red line [2 sec] for a duration of 45 minutes.” (I.e. it triggers if Latency > 2 for 3 consecutive period(s).) This is exactly what I need – except that it is a lie.

This night I got 8 alarm/ok notifications even though the average latency has never been over 2 sec for 45 minutes. The problem is that CloudWatch ignores null/missing data. So if you have a slow request at 3am and no other request comes until 4am, it will look at [slow, null, null, null] and trigger the alarm.

So I want to configure it to treat null as 0 and preferably to ignore latency if it only affected a single user. But there is no way to do this in CloudWatch.

Solution: I will likely need to run my own job that will read the metrics and produce a normalized, reasonable metric – replacing null / missing data with 0 and weight the average latency by the number of users in the period.

Sorry, the comment form is closed at this time.

%d bloggers like this: