Tag Archives: monitoring
How good monitoring saved our ass … again
You know how it goes – suddenly people complain that your app does not work, you are getting plenty of timeouts or other errors in your error-tracking tool, you find the backend app that is misbehaving, and you finally “fix” the problem by restarting it. Phew! But why? What caused the downtime? A glitch an…
Monitoring process memory/CPU usage with top and plotting it with gnuplot
If you want to monitor the memory and CPU usage of a particular Linux process for a few minutes, perhaps during a performance test, you can capture the data with top and plot them with gnuplot. Here is how:
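A minimal sketch of the capture-and-convert step, assuming procps `top` in batch mode, e.g. `top -b -d 5 -n 60 -p <PID> > top.log` (one sample every 5 s for 5 minutes). The helpers `parse_top_batch` and `to_gnuplot` are hypothetical, not part of any tool; they find the `%CPU` and `%MEM` columns through the process-table header, so they do not rely on a fixed column order.

```python
def parse_top_batch(text, pid):
    """Return (sample_no, cpu_pct, mem_pct) tuples for `pid` from `top -b` output."""
    rows = []
    cols = None  # column name -> index, taken from the latest process-table header
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "PID" and "%CPU" in fields and "%MEM" in fields:
            # Header row repeated in every iteration; (re)learn column positions.
            cols = {name: i for i, name in enumerate(fields)}
            continue
        if cols is not None and fields[cols["PID"]] == str(pid):
            # Some locales print a decimal comma ("12,5"); normalise it.
            cpu = float(fields[cols["%CPU"]].replace(",", "."))
            mem = float(fields[cols["%MEM"]].replace(",", "."))
            rows.append((len(rows) + 1, cpu, mem))
    return rows


def to_gnuplot(rows):
    """Format rows as whitespace-separated columns; '#' lines are comments in gnuplot."""
    lines = ["# sample cpu_pct mem_pct"]
    lines += [f"{n} {cpu} {mem}" for n, cpu, mem in rows]
    return "\n".join(lines) + "\n"
```

Write the result of `to_gnuplot(parse_top_batch(open("top.log").read(), PID))` to `top.dat` and plot it in gnuplot with `plot 'top.dat' using 1:2 with lines title '%CPU', '' using 1:3 with lines title '%MEM'`.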
AWS CloudWatch Alarms Too Noisy Due To Ignoring Missing Data in Averages
I want to know when our app starts getting slower, so I set up an alarm on the Latency metric of our ELB. According to the AWS Console, “This alarm will trigger when the blue line [average latency over the period of 15 min] goes above the red line [2 sec] for a duration of…”
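To make the effect concrete, here is a purely local simulation (no AWS API calls) under the assumption that a period without any requests produces no Latency datapoint and is therefore skipped rather than averaged in as zero. The function names are hypothetical; only the 2 s threshold comes from the alarm above.

```python
from statistics import mean

THRESHOLD_S = 2.0  # the 2 s latency threshold of the alarm described above


def period_averages(periods):
    """Average latency per period (list of request latencies in seconds).

    Periods with no requests yield no datapoint at all: they are skipped,
    not treated as zero.
    """
    return [mean(p) for p in periods if p]


def breaches(averages, threshold=THRESHOLD_S):
    """Indices of the datapoints whose average is above the threshold."""
    return [i for i, avg in enumerate(averages) if avg > threshold]
```

With 99 fast requests and one 5 s request in a period, the average stays far below 2 s. When that same slow request is the only one in a quiet period, its 5 s becomes the whole average and the datapoint breaches, which is why low-traffic periods make such an alarm noisy.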
Graphite Shows Metrics But No Data – Troubleshooting
My Graphite has all the metrics I expect but shows no data for them. Communication between my app and Graphite clearly works, otherwise the metrics would not have appeared in the list, so why is there no data? Update: Graphite data gotchas that got me (these gotchas explain why I did not see any data).
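One frequently reported cause of "metric listed, no data" (not necessarily the one behind this post) is a timestamp in the wrong unit: Carbon's plaintext protocol expects Unix epoch seconds, and a millisecond timestamp lands far in the future, outside what the Whisper archives cover. A minimal sketch of sending a datapoint, assuming Carbon listens on `localhost` at its default plaintext port 2003; the helper names are hypothetical.

```python
import socket
import time

CARBON_HOST = "localhost"  # assumption: Carbon runs on this machine
CARBON_PORT = 2003         # Carbon's default plaintext-protocol port


def plaintext_line(path, value, timestamp=None):
    """Build one plaintext-protocol line: '<path> <value> <timestamp>' plus a newline.

    The timestamp is truncated to whole epoch *seconds*; passing milliseconds
    (e.g. Java's System.currentTimeMillis()) would put the point in the future.
    """
    if timestamp is None:
        timestamp = time.time()
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, timestamp=None, host=CARBON_HOST, port=CARBON_PORT):
    """Send a single datapoint to Carbon over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(plaintext_line(path, value, timestamp).encode("ascii"))
```

If data still does not show up with correct timestamps, the storage schema (retention and aggregation) is the next thing worth checking.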
Most interesting links of December ’13
Recommended Readings
Society
HBR: Want to Build Resilience? Kill the Complexity – a highly interesting, thought-provoking article relevant both to technology in particular and to society in general; f.ex.: more security features are bad because they make us behave less safely (risk compensation) and make the system more fragile with respect to unexpected events. “Complexity is a clear…”
Enabling JMX Monitoring for Hadoop And Hive
Hadoop’s NameNode and JobTracker expose interesting metrics and statistics over JMX. Hive does not seem to expose anything interesting, but it might still be useful to monitor its JVM or do simple profiling/sampling on it. Let’s see how to enable JMX and how to access it securely, over SSH.
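A small sketch that assembles the standard JVM system properties for remote JMX plus a matching SSH tunnel. Assumptions: the port (8004 in the usage note) is arbitrary; the options would go into something like `HADOOP_NAMENODE_OPTS` or `HADOOP_JOBTRACKER_OPTS` in `hadoop-env.sh`; and `com.sun.management.jmxremote.rmi.port` requires Java 7u4 or newer (older JVMs pick a second, random RMI port that a single tunnel does not cover). The helper names are hypothetical.

```python
def jmx_jvm_opts(port, hostname="localhost"):
    """JVM options enabling remote JMX on one fixed port.

    Authentication and SSL are disabled on the assumption that the port is
    firewalled and only reached through an SSH tunnel.
    """
    props = {
        "com.sun.management.jmxremote": None,  # flag without a value
        "com.sun.management.jmxremote.port": port,
        # Java 7u4+: use the same port for RMI so one tunnel is enough
        "com.sun.management.jmxremote.rmi.port": port,
        "com.sun.management.jmxremote.authenticate": "false",
        "com.sun.management.jmxremote.ssl": "false",
        # Host name the RMI stub advertises; 'localhost' matches the tunnel's local end
        "java.rmi.server.hostname": hostname,
    }
    return " ".join(f"-D{k}" if v is None else f"-D{k}={v}" for k, v in props.items())


def ssh_tunnel_cmd(user_host, port):
    """ssh argv forwarding local `port` to the same port on the remote host."""
    return ["ssh", "-N", "-L", f"{port}:localhost:{port}", user_host]


def jmx_service_url(port):
    """JMX service URL for jconsole/VisualVM while the tunnel is running."""
    return f"service:jmx:rmi:///jndi/rmi://localhost:{port}/jmxrmi"
```

For example, start the NameNode with `jmx_jvm_opts(8004)`, run `ssh_tunnel_cmd("me@namenode", 8004)` locally, and connect to `jmx_service_url(8004)`.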
VisualVM: Monitoring Remote JVM Over SSH (JMX Or Not)
(Disclaimer: Based on personal experience and little research; the information might be incomplete.) VisualVM is a great tool for monitoring JVMs (5.0+) with regard to memory usage, threads, GC, MBeans etc. Let’s see how to use it over SSH to monitor (or even profile, using its sampler) a remote JVM, either with JMX or without it. This…
Zabbix: Fixing Active Checks to Work With Zabbix Proxy
We’ve recently changed our Zabbix 1.8.1 setup to include Zabbix Proxy, which broke all our active checks (f.ex. monitoring of log files). The solution seems to be listing the proxy first, before the Zabbix Server, in the Zabbix Agent’s config parameter Server, i.e. “Server=<proxy ip>,<server ip>”.
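A quick way to audit many agents is to check the order of the `Server` entries in each `zabbix_agentd.conf`. A minimal sketch with hypothetical helper names; it encodes only the observation above (active checks worked once the proxy was listed first), and if the parameter appears more than once it simply keeps the last uncommented occurrence.

```python
def server_list(conf_text):
    """Entries of the uncommented Server= parameter (last occurrence wins)."""
    servers = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and lines that are not key=value
        key, value = line.split("=", 1)
        if key.strip() == "Server":
            servers = [s.strip() for s in value.split(",") if s.strip()]
    return servers


def proxy_listed_first(conf_text, proxy_ip):
    """True if the proxy is the first Server entry, as the fix above requires."""
    servers = server_list(conf_text)
    return bool(servers) and servers[0] == proxy_ip
```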
Notify on Errors in a Log File with Zabbix 1.8
Situation: You want to get notified when a log entry marked ERROR appears in a log file. You want the corresponding trigger to reset back to the OK state if there are no more errors for 10 minutes. (This post assumes a certain familiarity with the Zabbix UI.)
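One common way to express this in Zabbix (not necessarily the exact expression from the post) is a `log[<file>,ERROR]` item with a trigger like `{host:log[/path/to/app.log,ERROR].nodata(600)}=0`; the host and path here are placeholders. `nodata(600)` is 1 when the item received no values in the last 600 s, so the trigger stays in PROBLEM while errors keep arriving and returns to OK ten minutes after the last one. The sketch below models only that state logic locally, with hypothetical names.

```python
WINDOW_S = 600  # "no more errors for 10 minutes"


def trigger_state(error_times, now, window=WINDOW_S):
    """'PROBLEM' if an ERROR entry arrived within the last `window` seconds, else 'OK'.

    Models a nodata(600)=0 trigger: nodata() is 0 (so the expression is true)
    as long as the log item has received at least one value in the window.
    """
    recent = any(now - window < t <= now for t in error_times)
    return "PROBLEM" if recent else "OK"
```

With errors logged at t=100 s and t=400 s, the state is PROBLEM from 100 s until shortly before 1000 s and OK afterwards.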
Testing Zabbix Trigger Expressions
When defining a Zabbix (1.8.2) trigger, e.g. to inform you that there are errors in a log file, how do you verify that it is correct? As somebody recommended in a forum, you can use a Calculated Item with a similar expression (the syntax is a little different from that of triggers). Unlike triggers, the value of…