The Holy Java

Building the right thing, building it right, fast

Ops: Monitoring

(Work in progress.)

Tools for monitoring systems and applications, especially in the Java EE environment.

Monitoring needs

  1. Metric collection – gathering metrics from different machines, servers, services, and perhaps from different monitoring systems
  2. Metric aggregation – avg/min/max/std.dev etc. over a configurable period of time
    • ability to compare with corresponding previous period(s) (prev. week, year, …), derivation of a “baseline” to make it easy to spot deviations from the normal behavior
  3. Alerting and notifications – alert when a metric exceeds a limit, … (preferably reasonable default + scriptable advanced conditions; scriptable alert actions, notifications via e-mail and other channels, …)
  4. Eventing – ability to correlate metrics with system events such as a DB upgrade
  5. Other
    • UI, dashboard (configurable display of metrics, graphs, combined graphs, alerts)
    • API (preferably REST, JSON)
    • Extensibility
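
To make items 2 and 3 more concrete, here is a minimal sketch in plain Java of aggregating one period's samples and comparing them with a baseline period. The class name PeriodStats, the sample values, and the rule "alert if the average is more than k baseline standard deviations away from the baseline average" are assumptions made up for illustration; real monitoring tools offer richer, configurable conditions.

```java
import java.util.Arrays;

/** Hypothetical helper: summary statistics for one period of samples. */
final class PeriodStats {
    final long count;
    final double min, max, avg, stdDev;

    PeriodStats(double[] samples) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        count = samples.length;
        min = Arrays.stream(samples).min().getAsDouble();
        max = Arrays.stream(samples).max().getAsDouble();
        avg = Arrays.stream(samples).average().getAsDouble();
        // Population standard deviation: sqrt(mean((x - avg)^2)).
        double sumSq = 0;
        for (double s : samples) sumSq += (s - avg) * (s - avg);
        stdDev = Math.sqrt(sumSq / count);
    }

    /** True if this period's average is more than k baseline std. deviations from the baseline average. */
    boolean deviatesFrom(PeriodStats baseline, double k) {
        return Math.abs(avg - baseline.avg) > k * baseline.stdDev;
    }

    public static void main(String[] args) {
        // Made-up response times (ms) for the previous and the current week.
        PeriodStats lastWeek = new PeriodStats(new double[] {100, 110, 90, 105, 95});
        PeriodStats thisWeek = new PeriodStats(new double[] {180, 190, 170, 185, 175});
        System.out.printf("this week: avg=%.1f min=%.1f max=%.1f stdDev=%.2f%n",
                thisWeek.avg, thisWeek.min, thisWeek.max, thisWeek.stdDev);
        if (thisWeek.deviatesFrom(lastWeek, 3.0)) {
            System.out.println("ALERT: average deviates from last week's baseline");
        }
    }
}
```

Using the previous period's standard deviation as the yardstick is only one possible way to derive a baseline; a fixed limit or a percentile would fit the same structure.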

Some criteria

  • Number of supported platforms, servers (DB, AS, cache, …)
  • Support for JavaEE-specific software (Tomcat threadpools, JDBC etc.)
  • Support for JMX to easily monitor anything Java-based
  • Extensibility: Define custom metrics/aggregations, fetch the collected data, …
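
As an illustration of why JMX support matters, the following sketch exposes a custom metric from a Java application using only the JDK's standard MBean mechanism; any JMX-capable monitoring tool could then read it. The class names, the metric, and the object name com.example.shop:type=OrderStats are hypothetical and chosen only for this example.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class CustomMetricDemo {
    /** Standard MBean convention: the interface is named <ImplClass>MBean and must be public. */
    public interface OrderStatsMBean {
        long getProcessedOrders(); // exposed as the JMX attribute "ProcessedOrders"
    }

    /** The application updates this object; JMX clients read it through the interface. */
    public static class OrderStats implements OrderStatsMBean {
        private final AtomicLong processed = new AtomicLong();

        void orderProcessed() { processed.incrementAndGet(); }

        @Override public long getProcessedOrders() { return processed.get(); }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("com.example.shop:type=OrderStats"); // hypothetical name
        OrderStats stats = new OrderStats();
        server.registerMBean(stats, name);

        stats.orderProcessed();
        stats.orderProcessed();

        // Read the attribute back the same way a monitoring tool would.
        System.out.println("ProcessedOrders = " + server.getAttribute(name, "ProcessedOrders"));
    }
}
```

While the application is running, the same attribute should also be visible in a JMX client such as JConsole.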

Tools

  • Hyperic HQ, open-source & enterprise ed., see my intro
  • Zenoss, open-source & enterprise ed.
  • Nagios, open-source (a commercial edition, Nagios XI, also exists); seems to be pretty popular
  • Ganglia (quick start), open-source, primarily targets Linux. Uses XML, XDR, and RRD, with a web frontend in PHP; described as “a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids”. Extensible via C/Python.
  • Zabbix, open-source; general, version 2 should have direct JMX support; compared to Hyperic: reasonable UI with custom dashboards, custom actions; quite flexible

Libraries

Metrics

  • Netflix Servo – for collecting and publishing metrics from a Java app (to a file, CloudWatch, an in-memory sample). Exposes metrics (annotated fields) automatically over JMX. See this example. Essentially, you either create different monitor objects or annotate fields, make sure they are updated as needed, register their owner with a registry, and run a poller that takes samples and publishes them. Computes various stats and supports publishing to Graphite. Supports various metric types: informational (a string?), counter, gauge, … . A nice thing is the AsyncMetricObserver (update calls return immediately).
  • JavaMelody – “monitor Java or Java EE application servers in QA and production environments,” targeted primarily at web/app servers, with built-in charting. Provides a proxy for monitoring JDBC drivers, an EJB3 interceptor, Spring support (=> make a method monitored with an annotation), and a servlet filter. Data is stored in .rrd files locally or sent to a central server. There is a short & nice presentation at SlideShare.
  • Java Simon – Simple Monitoring API (JAMon replacement) – record the start and stop of named events using the Simon API, and Simon will compute metrics such as min/max/avg/std.dev/count/… . The events may be hierarchical (e.g. web call -> db call). Support for capturing snapshots (samples) of the metrics and for monitoring Java EE (DB driver wrapper, monitoring servlet filter, interceptors). Web console to monitor the metrics. Exposed over JMX.
  • Yammer’s Metrics for instrumenting JVM-based services. Can report to e.g. Ganglia and Graphite. Support e.g. for Guice, Jetty, Jersey, Log4j, Apache HttpClient, Ehcache, Logback, Spring. Manual usage: create a metric (gauge, counter, meter (rate), histogram (min/max/std.dev./percentiles), timer (-> rate, duration distribution)) and keep on updating it manually. Each metric is bound to a class (-> package hierarchy). Access the metrics via JMX or via HTTP as JSON, or send them to stdout, a CSV file, Ganglia, or Graphite. Support for custom HealthChecks. The AdminServlet serves the JSON metrics, may run health checks, print a thread dump, and provide a simple ping response to load balancers (all of these are also available separately). Scala support, metrics servlet filter, … . Annotations for measuring methods of Guice/Jersey/… managed objects. JavaDoc.
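
The "start and stop a named event, let the library compute the statistics" pattern shared by several of these libraries can be sketched in plain Java as follows. This is not the API of Java Simon, Metrics, or Servo; the Stopwatches class, its methods, and the event names are hypothetical and only mimic the idea with JDK classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical mini stopwatch registry; NOT the Java Simon or Metrics API. */
final class Stopwatches {
    /** Aggregated timings for one named event. */
    static final class Stats {
        private long count, totalNanos, minNanos = Long.MAX_VALUE, maxNanos;

        synchronized void record(long nanos) {
            count++;
            totalNanos += nanos;
            minNanos = Math.min(minNanos, nanos);
            maxNanos = Math.max(maxNanos, nanos);
        }
        synchronized long count() { return count; }
        synchronized long min() { return count == 0 ? 0 : minNanos; }
        synchronized long max() { return maxNanos; }
        synchronized double avg() { return count == 0 ? 0 : (double) totalNanos / count; }
    }

    /** A running measurement; close() stops it and records the elapsed time. */
    static final class Split implements AutoCloseable {
        private final Stats stats;
        private final long start = System.nanoTime();

        Split(Stats stats) { this.stats = stats; }

        @Override public void close() { stats.record(System.nanoTime() - start); }
    }

    private static final Map<String, Stats> REGISTRY = new ConcurrentHashMap<>();

    static Stats stats(String name) { return REGISTRY.computeIfAbsent(name, n -> new Stats()); }

    static Split start(String name) { return new Split(stats(name)); }

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 3; i++) {
            // The hierarchy (web call -> db call) is expressed only through dotted names here.
            try (Split web = start("web.request")) {
                try (Split db = start("web.request.db")) {
                    Thread.sleep(5); // stand-in for a database call
                }
            }
        }
        Stats s = stats("web.request");
        System.out.printf("web.request: count=%d avg=%.2f ms%n", s.count(), s.avg() / 1_000_000.0);
    }
}
```

The real libraries add what this sketch leaves out, such as JMX exposure, snapshots, percentiles, and reporters for Graphite or Ganglia.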

Related

  • Graphite – “an enterprise-scale monitoring tool that runs well on cheap hardware”. A popular tool for visualizing time-series data (it receives data, stores it in an RRD-type database, and shows it in a web UI). Configurable graphs, dashboards etc. Written in Python.
  • JRDS – “Jrds is performance collector, much like cacti or munins. But it intends to be more easy to use and able to collect a high number of machines in a very short time. It’s fully written in java and avoid call external process to increase performances. It uses RRD4J, a clone of rrdtool written in java.”
  • Rocksteady + Esper: Rocksteady is a Java app that reads metrics from RabbitMQ, parses them, and turns them into events so that Esper (CEP) can query those metrics and react to events matched by the query.
  • Jolokia is remote JMX with JSON over HTTP: a REST API bridged to JMX, with support for security, fine-grained access control, and bulk operations. Especially useful if you either 1) need to perform bulk operations (e.g. get multiple values) or 2) want to access JMX data from something that doesn’t support JMX. JSON is in general very easy to use and navigate. You can install Jolokia as a WAR (or embed its servlet), as a JVM agent, or attach it on-the-fly to a running JVM.
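
A rough sketch of a Jolokia bulk read from Java (11 or newer, for java.net.http) might look like this. It assumes a Jolokia agent is reachable at http://localhost:8778/jolokia/, which is an assumption about your setup, and it builds the JSON by hand without escaping, so it is suitable only for simple names. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.StringJoiner;

final class JolokiaBulkRead {
    /** Builds one Jolokia "read" request as JSON (no escaping; simple names only). */
    static String readRequest(String mbean, String attribute) {
        return "{\"type\":\"read\",\"mbean\":\"" + mbean + "\",\"attribute\":\"" + attribute + "\"}";
    }

    /** A bulk request is a JSON array of individual requests, sent in one POST. */
    static String bulk(String... requests) {
        StringJoiner array = new StringJoiner(",", "[", "]");
        for (String r : requests) array.add(r);
        return array.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed agent URL; pass your own as the first argument.
        String url = args.length > 0 ? args[0] : "http://localhost:8778/jolokia/";
        String body = bulk(
                readRequest("java.lang:type=Memory", "HeapMemoryUsage"),
                readRequest("java.lang:type=Threading", "ThreadCount"));
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body()); // JSON array, one response per request
        } catch (IOException e) {
            System.err.println("Could not reach Jolokia at " + url + ": " + e);
        }
    }
}
```

Because the interface is plain HTTP and JSON, the same two values could just as well be fetched from a shell script or a browser-based dashboard, which is the main appeal over raw JMX.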

Logs

  • LogStash – a popular tool that can collect (directly/via [distributed] syslog), parse (=> extract timestamp etc.), and store logs – supports indexing and storing them in ElasticSearch for searching, and parses e.g. Apache logs out of the box. See these Logstash slides (9/2012).
  • Kibana – a web interface to search logs from LogStash, view them in real time (based on a query) etc. See the overview of Kibana’s powers.
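
To show what "parse (=> extract timestamp etc.)" means in practice, here is a small plain-Java sketch that pulls the host, timestamp, request, and status out of an Apache access-log line in Common Log Format. It is not how LogStash is implemented or configured; the class name, the regular expression, and the Entry type are assumptions for illustration.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AccessLogParser {
    // Common Log Format: host ident user [timestamp] "request" status bytes
    private static final Pattern COMMON_LOG = Pattern.compile(
            "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)");
    // Example timestamp: 10/Oct/2000:13:55:36 -0700
    private static final DateTimeFormatter TIMESTAMP =
            DateTimeFormatter.ofPattern("dd/MMM/uuuu:HH:mm:ss Z", Locale.ENGLISH);

    /** Hypothetical structured form of one log line. */
    static final class Entry {
        final String host;
        final OffsetDateTime time;
        final String request;
        final int status;

        Entry(String host, OffsetDateTime time, String request, int status) {
            this.host = host;
            this.time = time;
            this.request = request;
            this.status = status;
        }
    }

    /** Returns the parsed entry, or empty if the line does not look like Common Log Format. */
    static Optional<Entry> parse(String line) {
        Matcher m = COMMON_LOG.matcher(line);
        if (!m.find()) return Optional.empty();
        return Optional.of(new Entry(
                m.group(1),
                OffsetDateTime.parse(m.group(2), TIMESTAMP),
                m.group(3),
                Integer.parseInt(m.group(4))));
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "
                + "\"GET /apache_pb.gif HTTP/1.0\" 200 2326";
        parse(line).ifPresent(e ->
                System.out.println(e.time + " " + e.host + " " + e.status + " " + e.request));
    }
}
```

Once the timestamp and fields are structured like this, they can be indexed and queried, which is the step a log store and a search UI build on.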

Other

  • hawtio web console – a lightweight and modular HTML5 web console with lots of plugins for managing your Java stuff – to be embedded into standalone apps or containers such as Jetty/Tomcat – view metrics, manage, … (using JMX under the cover)
