Intro: Java Webapp Monitoring with Hyperic HQ + How to Alert on Too Many Errors in Logs
Posted by Jakub Holý on October 17, 2011
This post describes how to set up the Java-based open source monitoring tool Hyperic HQ to monitor application server error logs and send a single warning e-mail when their number exceeds a threshold. In the previous post, Aggregating Error Logs to Send a Warning Email When Too Many of Them – Log4j, Stat4j, SMTPAppender, we saw how to achieve that programmatically, while this solution is purely a matter of configuration. We will also see a little of what else (a lot!) Hyperic can do for you and what my impressions are after a short experiment with it.
I’ll be using Tomcat as the “application server” but it certainly works for other common ASs.
Overview of Hyperic HQ
Hyperic HQ 4.6 (and its commercial edition called VMware vFabric Hyperic) in points:
- Developed by SpringSource/VMware
- Most likely the best available open-source monitoring SW (I believe it competes directly with Nagios but claims superiority in certain areas and I’d suppose it to be a better fit for the Java world as it itself is written in Java)
- Open-source, the enterprise version has some useful but non-essential features like LDAP integration, dashboards personalized for roles, multi-action alerts (that would be useful but it is open-source and you can DIY) – see the OSS x enterprise comparison
- You need to install Hyperic HQ monitoring server on one computer, a Hyperic agent on each machine to monitor and, if required, enable monitoring features in your SW (e.g. JMX in Tomcat)
- For agents and the server to communicate you must open a port on each machine (the enterprise version also supports unidirectional communication that is always initiated by the agent)
- The server is a single package containing an embedded JBoss and PostgreSQL
- It has monitoring (“resource”) plugins for many popular OSs, DBs, ASs, frameworks (Spring, JEE …), HTTP servers etc. and integration with other technologies like JMX and other software such as Nagios and (enterprise only) OpenNMS
Hyperic HQ is very easy to install and the agent can detect many resources/services on its own. If the target SW needs some configuration before it can be monitored, the Hyperic server will tell you so and (at least in my case) provide instructions on what to do.
Download the Hyperic HQ open-source edition. (Notice that Hyperic HQ is used to refer to the open-source version while vFabric Hyperic refers to the enterprise edition.)
To use Hyperic you need to know that a “platform” is a machine/OS, a “server” is a SW running there such as a DB or an AS, and a “service” is something monitorable running on the server such as the “Apache Tomcat 6.0 Thread Pools” service. Each of these levels has its own metrics (e.g. CPU usage x JVM heap size x number of active threads).
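The three levels can be pictured as a small tree. The following is only an illustrative sketch of that hierarchy, not Hyperic's actual data model: the class names, the `flatten_metrics` helper and the sample metric names are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Service:
    """Something monitorable running on a server, e.g. a thread pool."""
    name: str
    metrics: Dict[str, float] = field(default_factory=dict)


@dataclass
class Server:
    """A piece of software running on a platform, e.g. Tomcat or a DB."""
    name: str
    metrics: Dict[str, float] = field(default_factory=dict)
    services: List[Service] = field(default_factory=list)


@dataclass
class Platform:
    """A machine/OS; the root of the hierarchy."""
    name: str
    metrics: Dict[str, float] = field(default_factory=dict)
    servers: List[Server] = field(default_factory=list)


def flatten_metrics(platform: Platform) -> Dict[str, float]:
    """Return every metric of the tree keyed by its 'a / b / c' path."""
    flat = {}
    for metric, value in platform.metrics.items():
        flat[f"{platform.name} / {metric}"] = value
    for server in platform.servers:
        server_path = f"{platform.name} / {server.name}"
        for metric, value in server.metrics.items():
            flat[f"{server_path} / {metric}"] = value
        for service in server.services:
            for metric, value in service.metrics.items():
                flat[f"{server_path} / {service.name} / {metric}"] = value
    return flat
```

Each level carries metrics of its own, which is why the Resources tab lets you drill down from a platform to its servers and on to their services.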
Key UI Sections
The start page/tab is Dashboard, where you can add “portlets” such as the Metric Viewer to get a quick overview of your systems. The next tab is Resources, where you can search for the monitored platforms/servers/services to show their detailed metrics, graphs etc., configure monitoring and alerts and so on. The Analyze tab provides an overview of events and alerts and Administration allows you e.g. to add users and to change what metrics are collected and shown by default (i.e. as indicators) for each resource type.
When you display a resource such as the Tomcat server, you have five additional tabs: Monitor (metrics, charts), Inventory (configuration and sub-resources), Alerts (see below), Control (define control actions on the resource such as restart), Views (Live Exec to execute OS monitoring commands; EE only??).
You may want to browse through the screenshots & little text in four New to Hyperic HQ articles referenced at the end of this post to get a good overview of what the UI looks like and how it is used.
Configuring Hyperic to Send Email When Too Many Errors in Logs
We will see how to install Hyperic, use it to monitor a local (or remote, it would be nearly the same) installation of Tomcat and how to alert us via e-mail when the number of errors in the Tomcat’s logs exceeds a threshold.
Installing Hyperic HQ 4.6 Server and Agent
- Download the hyperic-hq-installer-* for your platform or hyperic-hq-installer-*-noJRE.zip, unpack it and run its setup.sh/.bat. Notice that it can be used to install either only the agent, only the server or both.
- Note: run setup.sh -full to get the full set of options such as which DB to use (default = the embedded PostgreSQL; for production use it recommends a standalone Oracle or MySQL and also supports a standalone PostgreSQL)
- By default the Hyperic Server web UI runs on the port 7080 (https on 7443)
- Afterwards you may check <installer>/installer/logs/hq-install.log[.verbose]
- The agent will ask for the IP and http [and https] port of the monitoring server, the server admin credentials and the IP and port (default: 2144) the server should use to contact the agent
- You may check <agent dir>/log/agent.log to see the results of the autodiscovery of local services
Enabling Monitoring of a Tomcat
Log into your Hyperic HQ and the Dashboard will be displayed. If your agent started successfully then you should see its machine (e.g. “YourDomainName (MacOSX)”) in the Recently Added portlet or on the Resources tab under Platforms (if not then check the agent’s log).
Click on the platform’s name or go to Resources – Servers. Provided that your Tomcat was running, you should see it there as something like “YourDomainName Apache Tomcat 6.0” (if not then check the agent’s log). Most likely you will see a grey icon in the Availability column, signifying that HQ isn’t able to get monitoring data from it. If you click on its name you should be informed about the problem and given instructions for enabling JMX monitoring on the Tomcat (either put the options all on one line, removing the ‘\’, or make sure that nothing, not even a space, follows each backslash). Do it & restart Tomcat. The grey icon should turn green.
(I guess Hyperic should be able to monitor the logs without turning on JMX but I haven’t verified that.)
Enable Tomcat Log Monitoring
Go to Resources – Servers and click on your Tomcat, switch to the Inventory tab, scroll down to Configuration Properties and make sure that server.log_track.enable is set to true and server.log_track.level to Error. If not, click on Edit in the bottom-left corner and change them. (Notice that you can also specify a log pattern match and an alternative location of the log file.)
You may actually set a lower log level, as it is also possible to specify which severity to track in the alert itself.
Setting Up an Alert
We will now tell Hyperic to produce a one-time alert when the number of errors in the log in the last 10 minutes exceeds 3. Alerts are shown in the UI and can also be sent by e-mail to any registered HQ user or just about any e-mail address.
Go to the Tomcat server resource (as above) and select the Alert sub-tab out of Monitor|Inventory|Alert|Control|View. Click on [Configure] and [New...] and fill it in as shown below:
At the bottom (not visible on the screenshot) you can click on Notify Other Recipients to add an e-mail address to which the alerts should be sent.
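The behaviour we are asking for (more than 3 errors within 10 minutes, then no further alerts until the first one is marked as fixed) amounts to a sliding-window counter with a latch. The sketch below only illustrates that logic and is not how Hyperic implements it; the class and method names are made up, “exceeds 3” is read as a strict comparison, and keeping the counted errors after `mark_fixed` is an assumption.

```python
from collections import deque


class ErrorRateAlert:
    """Fire once when more than `threshold` errors fall within the window."""

    def __init__(self, threshold=3, window_seconds=600):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps = deque()  # times of the errors still in the window
        self._armed = True          # False while an alert awaits "fixed"

    def record_error(self, now):
        """Register one error at time `now` (seconds); True if an alert fires."""
        self._timestamps.append(now)
        # Forget errors that have slid out of the time window.
        while self._timestamps[0] <= now - self.window_seconds:
            self._timestamps.popleft()
        if self._armed and len(self._timestamps) > self.threshold:
            self._armed = False  # stay silent until someone marks it fixed
            return True
        return False

    def mark_fixed(self):
        """Re-arm the alert (assumption: errors already counted are kept)."""
        self._armed = True
```

With the defaults, errors at 0, 60, 120 and 180 seconds fire the alert on the fourth error, and a fifth error a minute later stays silent because the alert has not been marked as fixed yet.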
Fire the Alert
Make your Tomcat log three exceptions; when you come back to the Hyperic UI or refresh it, you should see the alert in the masthead and also on the dashboard in the Recent Alerts portlet. If you configured SMTP correctly and set the alert to be sent via e-mail then you should also find it in your mailbox.
You can use the Recent Alerts portlet to mark an alert as fixed so that a new alert will be generated if the situation recurs (don’t forget that we told HQ not to generate further alerts until fixed).
Impressions from Hyperic HQ 4.6
My experience with Hyperic HQ is extremely short so I cannot provide a well-founded evaluation, just a bunch of impressions.
It is certainly very powerful regarding what it can monitor, pretty easy to set up, and moderately intuitive. The UI is a little old-fashioned but ops/dev folks are used to such things and don’t need everything to be like GMail or GitHub. The configurability of the dashboard is quite disappointing (metric views only in one column, no way to combine several metrics in one view, a lot of wasted space etc.), especially compared to what I got used to in IBM’s Rational Jazz. But it is possible to get the data via a webservice so you might be able to build your own display. It would be nice to have multi-conditional alerts and scriptable actions in the open source edition but it is free after all.
I miss the ability to define custom derived metrics (e.g. mean + std. deviation) or aggregations (e.g. weighted average). It actually seems to me that HQ has no concept of metric data aggregation. The only way around that is to get the raw data e.g. via the MetricDataApi and aggregate it/derive metrics yourself.
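Deriving such metrics yourself is not much work once you have the raw numbers. The sketch below assumes the data points have already been fetched (e.g. via the HQAPI) into plain Python lists; it makes no HQAPI calls, and the function names are made up for this example.

```python
import math


def mean_and_std(values):
    """Return the mean and the population standard deviation of `values`."""
    if not values:
        raise ValueError("need at least one data point")
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


def weighted_average(values, weights):
    """Average `values`, each counted with its corresponding weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight
```

A weighted average is handy, for instance, when combining the response times of several servers, with each server's request count as its weight.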
To try out:
- Switch between Show List View and Show Chart View on the Resources tab
- Define a custom group of related resources to make it easier to show them all at once etc. (Resources – new group and then Tools Menu – Add to group)
- Script Service: run a custom measurement script on a scheduled basis and save metrics in the Hyperic database along with plugin-reported metrics (Google it)
- By default, HQ collects a small subset of the available metrics. You can change that at Administration – Monitoring Defaults – click the resource’s Edit metric templates, check the metrics you want and whether to show them by default (= indicators), enter a collection interval and click the button next to it
- Quote: “Even if developing Hyperic HQ plugins has an initial cost, we got familiar with it and developed many JMX Mbeans + associated Hyperic plugins” – available at Google Code/Xebia
Hyperic HQ looks really good (as a tool, not the UI). In some environments it may be difficult to get a port opened on the server and on the monitored machine. You should absolutely use an external DB and be aware that it may grow rapidly if you collect a lot of monitoring data and don’t purge the older data in some way. Its functionality is pretty good regarding both what can be monitored and what you can do with the data in the UI, and if that is not enough then you have the webservice HQAPI and the full source code of HQ at your disposal. I’m certainly looking forward to trying it out on my next project.
If you have experience with Hyperic HQ, please share it with us in the comments. Thanks!
- Recommended: New to Hyperic HQ: Part 1 (dashboard), Part 2 (resources), Part 3 (adding new platform/server), Part 4 (alerts) – lots of screenshots, only a little text, pretty useful for an overview and some common tasks
- Recommended: Demo – Monitoring in Hyperic HQ (6 min video) – the 3 types of metrics, how to interpret them correctly. A good run-through of the application.
- Hyperic resources
- HyperForge – the home of Hyperic resource (i.e. monitoring) plugins
- Hyperic HQ/vFabric 4.6 documentation wiki. The way they mark what is in the open-source and what in the EE version is beyond my comprehension (e.g. Alerts are in both while the advanced alert functionality is only in EE but can you see a difference?). See also 4.5’s Hyperic HQ Overview (or the less concise version for 4.6).
- Hyperic WebService API (HQAPI) – get alerts, metric data, … . HQAPI at GitHub.
- Articles etc.
- YouTube: Understanding JMX Plugins in Hyperic HQ (35 min) – intro, plugins arch., concepts, building JMX plugins
- Apache Cassandra monitoring through Hyperic HQ (using a custom JMX plugin, HQ 4.4)
- Monitoring webapps with Hyperic & Hyperic web service API (2008) – a lot of HQAPI code
- Configuring and Monitoring tc Runtime Instances Using Hyperic HQ
- New Relic – SaaS, has Java agent for Java 5+, supports multiple languages and PaaS providers, a Java and REST API in addition to that, drill down into slow transactions/DB operations. NewRelic Lite with very elementary monitoring is free, 14d trial for Pro. Nice UI, the drill-down is very valuable. Actively developed and extended (e.g. recently server monitoring). ThoughtWorks Technology Radar 7/2011 recommends it (they used it for RoR and .NET).
- AppDynamics (see this article about A.D. in use) – monitoring & support for analysis via drill-down to the problematic areas. Pretty good if you have an application stack that it supports. SaaS or local deployment.