(Work in progress.)
Tools for monitoring systems and applications, especially in the Java EE environment.
- Metric collection – collection of metrics from different machines, servers, services, possibly also from other monitoring systems
- Metric aggregation – avg/min/max/std.dev etc. over a configurable period of time
- ability to compare with corresponding previous period(s) (prev. week, year, …), derivation of a “baseline” to make it easy to spot deviations from the normal behavior
- Alerting and notifications – alert when a metric exceeds a limit, … (preferably reasonable default + scriptable advanced conditions; scriptable alert actions, notifications via e-mail and other channels, …)
- Eventing – ability to correlate metrics with system events such as a DB upgrade
- UI, dashboard (configurable display of metrics, graphs, combined graphs, alerts)
- API (preferably REST, JSON)
- Number of supported platforms, servers (DB, AS, cache, …)
- Support for JavaEE-specific software (Tomcat threadpools, JDBC etc.)
- Support for JMX to easily monitor anything Java-based
- Extensibility: Define custom metrics/aggregations, fetch the collected data, …
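To make the aggregation and baseline criteria above more concrete, here is a minimal sketch using only the JDK. The `MetricWindow` class, its method names, and the "more than N standard deviations from the baseline average" rule are assumptions made up for illustration; they are not taken from any of the tools listed here. The `main` method samples heap usage through the standard `MemoryMXBean`, which is also exposed over JMX.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: aggregates the samples collected during one period. */
final class MetricWindow {
    private final List<Double> samples = new ArrayList<>();

    void record(double value) {
        samples.add(value);
    }

    int count() {
        return samples.size();
    }

    double min() {
        return samples.stream().mapToDouble(Double::doubleValue).min().orElse(Double.NaN);
    }

    double max() {
        return samples.stream().mapToDouble(Double::doubleValue).max().orElse(Double.NaN);
    }

    double avg() {
        return samples.stream().mapToDouble(Double::doubleValue).average().orElse(Double.NaN);
    }

    /** Population standard deviation of the recorded samples. */
    double stdDev() {
        if (samples.isEmpty()) {
            return Double.NaN;
        }
        double mean = avg();
        double sumOfSquares = 0;
        for (double v : samples) {
            sumOfSquares += (v - mean) * (v - mean);
        }
        return Math.sqrt(sumOfSquares / samples.size());
    }

    /**
     * Assumed deviation rule: this period's average is more than
     * `tolerance` baseline standard deviations away from the baseline average.
     */
    boolean deviatesFrom(MetricWindow baseline, double tolerance) {
        return Math.abs(avg() - baseline.avg()) > tolerance * baseline.stdDev();
    }

    public static void main(String[] args) throws InterruptedException {
        // Sample the used heap a few times via the platform MXBean.
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MetricWindow window = new MetricWindow();
        for (int i = 0; i < 5; i++) {
            window.record(memory.getHeapMemoryUsage().getUsed());
            Thread.sleep(100);
        }
        System.out.printf("heap used: min=%.0f max=%.0f avg=%.0f stddev=%.0f%n",
                window.min(), window.max(), window.avg(), window.stdDev());
    }
}
```

A real tool would persist one such window per period (hour, day, week) and compare the current one with the corresponding previous period; the sketch keeps everything in memory.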
- Hyperic HQ, open-source & enterprise ed., see my intro
- Zenoss, open-source & enterprise ed.
- Nagios, open-source and ?, seems to be pretty popular
- Ganglia (quick start), open-source, primarily targets Linux. Uses XML, XDR, RRD, web frontend in PHP, described as “a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids”. Extensible via C/Python.
- Zabbix, open-source; general, version 2 should have direct JMX support; compared to Hyperic: reasonable UI with custom dashboards, custom actions; quite flexible
- Netflix Servo – for collecting and publishing metrics from a Java app (to a file, CloudWatch, in-memory sample). Exposes metrics (annotated fields) automatically over JMX. See this example. Essentially you either create different monitor objects or annotate fields, make sure they are updated as needed, register their owner with a registry, and run a poller that takes samples and publishes them. Computes various stats, supports publishing to Graphite. Supports various metrics: informational (a string?), counter, gauge, … A nice thing is the AsyncMetricObserver (update calls return immediately).
- JavaMelody – “monitor Java or Java EE application servers in QA and production environments,” targeted primarily at web/app servers, built-in charting. Provides a proxy for monitoring JDBC drivers, an EJB3 interceptor, Spring support (=> make a method monitored with an annotation), and a servlet filter. Data is stored in .rrd files locally or sent to a central server. There is a short & nice presentation at SlideShare.
- Java Simon – Simple Monitoring API (JAMon replacement) – record start and stop named events using the Simon API, Simon will compute metrics such as min/max/avg/std.dev/count/… . The events may be hierarchical (e.g. web call -> db call). Support for capturing snapshots (samples) of the metrics, monitoring Java EE (DB Driver wrapper, monitoring servlet Filter, interceptors). Web console to monitor the metrics. Exposed over JMX.
- Yammer’s Metrics for instrumenting JVM-based services. Can report to e.g. Ganglia and Graphite. Support e.g. for Guice, Jetty, Jersey, Log4j, Apache HttpClient, Ehcache, Logback, Spring. Manual usage: create a metric (gauge, counter, meter (rate), histogram (min/max/std.dev./percentiles), timer (-> rate, duration distribution)) and keep on updating it manually. Each metric is bound to a class (-> package hierarchy). Access the metrics via JMX or via HTTP as JSON, or send them to stdout, a CSV file, Ganglia, or Graphite. Support for custom HealthChecks. The AdminServlet serves the JSON metrics, may run health checks, print a thread dump, provide a simple ping response to load balancers (all of these are also available separately). Scala support, metrics servlet Filter, … Annotations for measuring methods of Guice/Jersey/… managed objects. JavaDoc.
- Graphite – “an enterprise-scale monitoring tool that runs well on cheap hardware”. A popular tool for visualizing time-series data (it receives data, stores them in an RRD-type db, and shows them in a web UI). Configurable graphs, dashboards etc. Written in Python.
- JRDS – “Jrds is performance collector, much like cacti or munins. But it intends to be more easy to use and able to collect a high number of machines in a very short time. It’s fully written in java and avoid call external process to increase performances. It uses RRD4J, a clone of rrdtool written in java.”
- Rocksteady + Esper: Rocksteady is a Java app that reads metrics from RabbitMQ, parses them, and turns them into events so that Esper (CEP) can query those metrics and react to the events matched by a query.
- Jolokia is remote JMX with JSON over HTTP: a REST API bridged to JMX, with support for security, fine-grained access control, bulk operations. Especially useful if you either 1) need to perform bulk operations (e.g. get multiple values) or 2) want to access them from something that doesn’t support JMX. JSON is in general very easy to use and navigate. You can install Jolokia as a WAR (or embed its Servlet), a JVM agent, or attach it on-the-fly to a running JVM.
- LogStash – a popular tool that can collect (directly/via [distributed] syslog), parse (=> extract timestamp etc.), and store logs – supports indexing and storing them in ElasticSearch for searching, parses e.g. Apache logs out of the box. See these Logstash slides (9/2012).
- Can also send metrics to statsD / Graphite
- Kibana – a web interface to search logs from LogStash, view them in realtime (based on a query) etc. See the overview of Kibana’s powers.
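Several of the tools above can publish to Graphite, so it may help to see how little is needed to feed it by hand. Graphite's plaintext protocol takes one line per sample in the form `<metric path> <value> <timestamp in epoch seconds>`, by default on TCP port 2003. The sketch below is only an illustration with the JDK: the class name `GraphiteLine`, the metric path, and the host `graphite.example.com` are made up, and the `send` method is not exercised because it needs a running Graphite server.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for Graphite's plaintext protocol. */
final class GraphiteLine {

    /** Formats one sample as "<path> <value> <epoch seconds>\n". */
    static String format(String path, double value, long epochSeconds) {
        return path + " " + value + " " + epochSeconds + "\n";
    }

    /** Opens a TCP connection, writes the given line(s), and closes it again. */
    static void send(String host, int port, String lines) throws IOException {
        try (Socket socket = new Socket(host, port);
             Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8)) {
            out.write(lines);
            out.flush();
        }
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000; // Graphite expects seconds, not millis
        String line = format("app.web01.heap.used", 42.5, now);
        System.out.print(line);
        // With a reachable server (assumed host name) one could call:
        // send("graphite.example.com", 2003, line);
    }
}
```

Opening a connection per sample is fine for experiments; the libraries listed above batch samples and reuse connections.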