Posts Tagged ‘monitoring’
Enabling JMX Monitoring for Hadoop And Hive
Posted by Jakub Holý on September 21, 2012
Hadoop’s NameNode and JobTracker expose interesting metrics and statistics over JMX. Hive seems not to expose anything interesting, but it might still be useful to monitor its JVM or do simple profiling/sampling on it. Let’s see how to enable JMX and how to access it securely, over SSH.
Posted in Tools | Tagged: bigdata, hadoop, hive, monitoring, ops | 2 Comments »
VisualVM: Monitoring Remote JVM Over SSH (JMX Or Not)
Posted by Jakub Holý on September 21, 2012
(Disclaimer: Based on personal experience and little research, the information might be incomplete.)
VisualVM is a great tool for monitoring a JVM (5.0+): memory usage, threads, GC, MBeans etc. Let’s see how to use it over SSH to monitor (or even profile, using its sampler) a remote JVM either with JMX or without it.
This post is based on Sun JVM 1.6 running on Ubuntu 10 and VisualVM 1.3.3.
Posted in General, Java, Tools | Tagged: jmx, monitoring, ops | 1 Comment »
Zabbix: Fixing Active Checks to Work With Zabbix Proxy
Posted by Jakub Holý on August 8, 2012
We’ve recently changed our Zabbix 1.8.1 setup to include Zabbix Proxy, which broke all our active checks (e.g. monitoring of log files). The solution seems to be to list the proxy first, before the Zabbix Server, in the Zabbix Agent’s config parameter Server, i.e. “Server=<proxy ip>,<server ip>”.
Posted in General, Tools | Tagged: monitoring, ops, zabbix | Leave a Comment »
Notify on Errors in a Log File with Zabbix 1.8
Posted by Jakub Holý on July 3, 2012
Situation: You want to get notified when a log entry marked ERROR appears in a log file. You want the corresponding trigger to reset back to the OK state if there are no more errors for 10 minutes. (This post assumes some familiarity with the Zabbix UI.)
Posted in Tools | Tagged: monitoring, ops, zabbix | Leave a Comment »
Testing Zabbix Trigger Expressions
Posted by Jakub Holý on July 2, 2012
When defining a Zabbix (1.8.2) trigger, e.g. to inform you that there are errors in a log file, how do you verify that it is correct? As somebody recommended in a forum, you can use a Calculated Item with a similar expression (the syntax is a little different from triggers). Contrary to triggers, the value of a calculated item is easy to see and the historical values are stored so you can check how it evolved. If your trigger expression is complex then you can create multiple calculated items, one for each subexpression.
Posted in Testing, Tools | Tagged: monitoring, ops, zabbix | Leave a Comment »
Most interesting links of October
Posted by Jakub Holý on October 31, 2011
Recommended Readings
- Steve Yegge’s Execution in the Kingdom of Nouns – I guess you’ve already read this one but if not – it is a well-written and amusing post about why not having functions as first class citizens in Java causes developers to suffer. Highly recommended.
- Reply to Comparing Java Web Frameworks – a very nice and objective response to a recent blog summarizing a JavaOne presentation about the “top 4” web frameworks. The author argues that based on a number of resources such as job trends, StackOverflow questions etc. (however data from each of them on its own is biased in a way) JSF is a very popular framework – and rightly so for even though JSF 1 sucked, JSF 2 is really good (and still improving). Interesting links too (such as What’s new in JSF 2.2?). Corresponds to my belief that GWT and JSF are some of the best frameworks available.
- Using @Nullable – use javax.annotation.Nullable with Guava’s checkNotNull to fail fast when an unexpected null appears in method arguments
- JavaOne 2011: Migrating Spring Applications to Java EE 6 (slides) – nice (and visually attractive) comparison of JavaEE and Spring and proposal of a migration path. It’s fun and worth seeing.
- xUnitPatterns – one of the elementary sources that anybody interested in testing should read through. Not only does it explain all the basic concepts (mocks, stubs, fakes,…) but also many pitfalls to avoid (various test smells such as fragile tests due to Data Sensitivity, Behavior Sensitivity, Overspecified Software [due to mocks] etc.), various strategies (such as for fixture setup), and general testing principles. The materials on the site were turned into the book xUnit Test Patterns: Refactoring Test Code (2007), which is more up-to-date and thus a better source.
- Eclipse tip: Automatically insert at correct position: Semicolon, Braces – in “while(|)” type “true {” to get “while(true) {|” i.e. the ‘{’ is moved to the end where it belongs, the same works for ‘;’
- Google Test Analytics – Now in Open Source – introduces Google’s Attributes-Components-Capabilities (ACC) application intended to replace laborious and write&forget test plans with something much more usable and quicker to set up. It’s both a methodology for determining what needs to be tested and a tool for doing so and tracking the progress and high-risk areas (based not just on estimates but also actual data such as test coverage and bug count). The article is a good and brief introduction, you may also want to check a live hosted version and a little more detailed explanation on the project’s wiki.
- JSF and Facelets: build-time vs. render-time (component) tags (2007) – avoid mixing them incorrectly
- StackOverflow: What are the main disadvantages of Java Server Faces 2.0? Answer: The negative image of JSF comes from 1.x, JSF 2 is very good (and 2.2 is expected to be just perfect). Nice summary and JSF history review.
- Ola Bini: JavaScript in the small – best practices for projects that are partly JavaScript – the module pattern (code in the body of an immediately executed function so as not to pollute the global variable namespace), handling module dependencies with something like RequireJS, keeping JS out of HTML, functions generating functions for more readable code, use of many anonymous functions e.g. as a kind of named parameters, testing, open questions.
Talks
- Kent Beck’s JavaZone talk Software G Forces: The Effects of Acceleration is absolutely worth the 1h time. Kent describes how the development process, practices and partly the whole organization have to change as you go from annual to monthly to weekly, daily, hourly deployments. What is a best practice for one of these speeds becomes an impediment for another one – so know where you are. You can get an older version of the slides and there is also a detailed summary of the talk from another event.
- Rich Hickey: Simple Made Easy - Rich, the author of Clojure, argues very well that we should primarily care for our tools, constructs and artifacts to be “simple”, i.e. with minimal complexity, rather than “easy” i.e. not far from our current understanding and skill set. Simple means minimal interleaving – one concept, one task, one role, minimal mixing of who, what, how, when, where, why. While easy tools may make us start faster, only simplicity will make it possible to keep going fast because (growing) complexity is the main cause of slowness. And simplicity is a choice – we can create the same programs we do today with the tools of complexity with drastically simpler tools. Rich of course explains what, according to him, are complex tools and their simple(r) alternatives – see below. The start of the 1h talk is a little slow but it is worth the time. I agree with him that we should think much more about the simplicity/complexity of the things we use and create rather than easiness (think ORM).
Read also Uncle Bob’s affirmative reaction (“All too often we do what’s easy, at the expense of what’s simple. And so we make a mess. [...] doing what is simple as opposed to what is easy is one of the defining characteristics of a software craftsman.”).
Random Notes from Rich’s Simple Made Easy Talk:
There are also better notes by Alex Baranosky and you may want to check a follow-up discussion with some of Rich’s answers.
The complex vs. simple toolkit (around 0:31):
COMPLEXITY – SIMPLICITY
State, objects – Values
Methods – Functions, namespaces
vars – Managed refs
Inheritance, switch, matching – Polymorphism a la carte
Syntax – Data
Imperative loops, fold – Set functions
Actors – Queues
ORM – Declarative data manipulation
Conditionals – Rules
Inconsistency – Consistency
What each of the complexity constructs mixes (complects) together
CONSTRUCT – COMPLECTS (MIXES)
State, objects – everything that touches it (for state complects time and value)
Methods – function and state, namespaces (2 classes, same method name)
Syntax – Meaning, order
Inheritance – Types (ancestors, child)
Switch/matching – Multiple who/what pairs (1. decide who, 2. do what?)
var(iable)s – Value, time
Imperative loops, fold – what/how (fold – order)
Actors – what/who
ORM – OMG
Conditionals – Why, rest of program (the rules for what the program does are intertwined with the structure and order of the program, distributed all over it)
THE SIMPLICITY TOOLKIT (around 0:44)
CONSTRUCT – GET IT VIA…
Values – Final, persistent collections
Functions – a.k.a. stateless methods
Namespaces – Language support
Data – Maps, arrays, sets, XML, JSON etc.
Polymorphism a la carte – Protocols, Haskell type classes
Managed refs – Clojure/Haskell refs (compose time and value, not mix them)
Set functions – Libraries
Queues – Libraries
Declarative data manipulation – SQL/LINQ/Datalog
Rules – Libraries, Prolog
Consistency – Transactions, values
True abstraction isn’t hiding complexity but drawing things away – along one of the dimensions of who, what, when, where, why [policy&rules of the app.], how.
Abstraction => there are things I don’t need – and don’t want – to know.
Why – do explore rules and declarative logic systems.
When, where – when obj. A communicates with obj. B. => put a queue in between them so that A doesn’t need to know where B is; you should use Qs extensively.
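To make the queue advice concrete for myself, here is a minimal Java sketch. It is not from the talk; the class name QueueBetween and the choice of java.util.concurrent.BlockingQueue are my own assumptions. The producer ("A") only knows the queue, and the consumer ("B") only knows the queue, so neither needs to know where or when the other runs.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical illustration: A publishes to a queue and never references B. */
final class QueueBetween {
    /** B's side: take the expected number of items off the queue and sum them. */
    static int consumeAll(BlockingQueue<Integer> queue, int expected) {
        int sum = 0;
        for (int i = 0; i < expected; i++) {
            try {
                sum += queue.take();   // blocks until A has produced something
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        // A's side: its only dependency is the queue.
        Thread producerA = new Thread(() -> {
            for (int i = 1; i <= 3; i++) {
                queue.add(i);
            }
        });
        producerA.start();
        System.out.println(consumeAll(queue, 3));
    }
}
```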
Links to Keep
- Incredibly Useful CSS Snippets - “a list of CSS snippets that will help you minimize headaches, frustration and save your time while writing css” – a few float resets, targeting specific browsers & browser hacks, cross-browser transparency/min height/drop shadow, Google Font API, link styled by file type.
DevOps: Tools and libraries for system monitoring and (time series) data plotting
- Hyperic SIGAR API – open-source library that unifies collection of system-related metrics such as memory, CPU load, processes, file system metrics across most common operating systems
- rrd4j – Java clone of the famous RRDTool, which stores, aggregates and plots time-series data (RRD = round-robin database, i.e. keeps only a given number of samples and thus has a fixed size)
- JRDS “is performance collector, much like cacti or munins”, uses rrd4j. The documentation could be better and it seems to be just a one-man project but it might be interesting to look at.
Clojure Corner
- Alex Miller: Real world Clojure – a summary of experiences with using Clojure in enterprise data integration and analytics products at Revelytix, since early 2011 with a team of 5-10 devs. Some observations: Clojure code is 1-2 orders of magnitude smaller than Java. It might take more time to learn than Java but not much. Clojure tooling is acceptable, Emacs is still the best. Debugging tools are unsurprisingly quite inferior to those for Java. Java profiling tools work but it may be hard to interpret the results. “[..] I’ve come to appreciate the data-centric approach to building software.” Performance has been generally good so far.
- Article series Real World Clojure at World Singles – the series focuses on various aspects of using Clojure and how it was used to solve particular problems at a large dating site that started to migrate to it in 2010. Very interesting. E.g. XML generation, multi-environment configuration, tooling (“If Eclipse is your drug of choice, CCW [Counter ClockWise] will be a good way to work with Clojure.”, “Clojure tooling is still pretty young [..] - but given how much simpler Clojure is than most languages, you may not miss various features as much as you might expect!”)
- StackOverflow: Comparing Clojure books – Programming Clojure, Clojure in Action, The Joy of Clojure, Practical Clojure – which one to pick? A pretty good comparison.
- Clojure is a Get Stuff Done Language – experience report – “For all that people think of Clojure as a “hard” “propeller-head” language, it’s actually designed right from the start not for intellectual purity, but developer productivity.”
Posted in eclipse, General, j2ee, Java, Testing, Top links of month | Tagged: clojure, css, facelets, javaEE, JavaScript, jsf, lean, monitoring, ops, spring | 3 Comments »
Intro: Java Webapp Monitoring with Hyperic HQ + How to Alert on Too Many Errors in Logs
Posted by Jakub Holý on October 17, 2011
This post describes how to set up the Java-based open source monitoring tool Hyperic HQ to monitor application server error logs and send a single warning e-mail when there are more of them than a threshold. In the previous post Aggregating Error Logs to Send a Warning Email When Too Many of Them – Log4j, Stat4j, SMTPAppender we’ve seen how to achieve that programmatically while this solution is just about configuration. We will also see a little of what else (a lot!) Hyperic can do for you and what my impressions are after a short experimentation with it.
Posted in j2ee, Tools | Tagged: hyperic, logging, monitoring, ops | Leave a Comment »
Aggregating Error Logs to Send a Warning Email When Too Many of Them – Log4j, Stat4j, SMTPAppender
Posted by Jakub Holý on October 15, 2011
Our development team wanted to get notified as soon as something goes wrong in our production system, a critical Java web application serving thousands of customers daily. The idea was to let it send us an email when there are too many errors, usually indicating a problem with a database, an external web service, or something really bad with the application itself. In this post I want to present a simple solution we have implemented using a custom Log4J Appender based on Stat4j and an SMTPAppender (which is more difficult to configure and troubleshoot than you might expect) and in the following post I explore how to achieve the same effect with the open-source Hyperic HQ monitoring software.
Posted in j2ee, Tools | Tagged: log4j, logging, monitoring, ops | 1 Comment »
Most interesting links of May
Posted by Jakub Holý on June 2, 2010
The most interesting stuff I’ve read in May, in no particular order. You can easily guess I’ve been working on performance troubleshooting this month.
- NoSQL is About… – all the things NoSql databases are said to be about (and perhaps are not) and a good overview of the different goals and thus also features of the various implementations
- Bulletproof of Mind Mapping: Overview, Benefits, Tips and Tools – the article not only introduces mind maps (a structured way of recording ideas, much less limited than lists) but also describes over 30 desktop and web-based MM tools, both free and commercial (some of the descriptions come from the SW’s web, some from the author – the distinction isn’t clear)
- Java vs. C Performance….Again. (9/2009) – When C(++) is better than Java, when Java is more appropriate, and common flaws in comparison methodologies/false arguments.
- Why Learning Git is really, really hard part 1 and part 2 with actual reasons – because it doesn’t care enough for usability (unusual commands, cryptic error messages, impossible to go to a “simpler use mode”). I’m intrigued by distributed SCM systems and tired of not-so-easy branching & merging in SVN and its lovely problems with corrupted metadata (when you delete a folder…) and thus I was considering switching to Git that everybody is so excited about. I still plan that but these articles warned me that it may not be so painless and easy. A good read.
- Java VisualVM Blogging Contest results – the best posts:
- VisualVM – tool for profiling Java applications – nice, short intro with many pictures
- Analyzing Memory Leak in Java Applications using VisualVM
- (and others … )
- How to compute running mean/standard deviation - this page explains and implements in C an algorithm for computing a running estimate of mean and standard deviation, which minimizes accumulation of precision errors. A running estimate has the advantage that you do not need to store all the numbers and is thus suitable e.g. for continuous performance monitoring with a low memory overhead (but the performance overhead of a division and multiplication it introduces is perhaps also something to consider – though for most applications it’s negligible)
- (Java) Web performance in seven steps – a great article about the “management of performance” of a Web/JEE application from the definition of performance requirements up to continual performance monitoring with interesting war stories and links to various useful tools. I can subscribe to the author’s maxim “measure, don’t guess!”. The Java monitoring API Java Simon mentioned in the article is worth a look.
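Regarding the running mean/standard deviation item above, here is a minimal Java sketch of the idea. I use Welford's method, a standard algorithm for this; whether the linked page uses exactly this variant is my assumption, and the class name RunningStats is made up for illustration. Note that no samples are stored, only three numbers.

```java
/** Running mean and standard deviation (Welford's method); stores no samples. */
final class RunningStats {
    private long count;
    private double mean;
    private double m2;   // sum of squared differences from the current mean

    void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;       // the division per sample
        m2 += delta * (x - mean);    // the multiplication per sample
    }

    long count() { return count; }

    double mean() { return mean; }

    /** Sample variance (n - 1 in the denominator); 0 for fewer than two samples. */
    double variance() { return count > 1 ? m2 / (count - 1) : 0.0; }

    double stdDev() { return Math.sqrt(variance()); }

    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        for (double millis : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            stats.add(millis);
        }
        System.out.println(stats.mean() + " +/- " + stats.stdDev());
    }
}
```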
Posted in General, Java, Top links of month | Tagged: Git, java, monitoring, nosql, performance | Leave a Comment »
Webapp performance monitoring with Glassbox 2.0: How does it work?
Posted by Jakub Holý on October 31, 2008
A word of warning: Information on this page originates from my exploration of Glassbox performed in Oct 2008 and may be inaccurate. Ron Bodkin, the mastermind behind Glassbox, was so kind as to review this but still there may be some mistakes or inexact information left. In any case blame me.
Introduction
There are a few open source Java webapp monitoring tools: Glassbox (latest release 2008-09-02), JAMon (2007-09-20), InfraRED (2006-05-17), UseMon (2008-10-06). Among these, Glassbox is both still actively developed and mature. It also has nice, though only basic, user documentation. Unfortunately there is only little documentation about its architecture and about customizing it for monitoring a specific webapp. Therefore I’ve decided to dive into its code to learn more about it, to be able to decide whether it’s suitable for our needs of monitoring the Pentaho BI Platform web application.
The unique features of Glassbox are that it tries to point out existing performance problems together with their likely causes and advises what to do about them, and that it displays aggregated statistics about the high level "operations" in the monitored web app.
Basic Info about Glassbox
Glassbox consists of a webapp UI that displays statistics and of a set of monitors that are injected into the observed webapp and collect data about its performance. Some of its characteristics are:
- Simple installation (1.Deploy webapp; 2. Run postinstall script to copy jars etc.; 3. Restart AS with a command-line option to load glassbox aspects).
- AOP using AspectJ.
- Preferably Java 5+ though it’s also possible to use AspectJ with Java 1.4 and maybe 1.3.
- Data not persistent (though this is on the roadmap).
- Layer aware (UI, logic – EJB, resources – DB, I/O, …).
- Customized monitors for popular frameworks (Spring MVC, …).
- UI concentrates on providing the right information at the right time with the necessary context. In other words it doesn’t display tons of detailed data but a list of top-level operations highlighting the problematic (slow/failing) ones and providing details on likely cause(s) of the problems.
If you are new to Glassbox I’d recommend first having a look at the presentation Glassbox Architecture & Design (Oct 25, 2008). You may also want to watch an older but still useful Glassbox Tech Talk video (Sep 25, 2006). And if you’re the visual type there’s a screenshot of Glassbox UI.
How it works
Glossary
- Aspect (wiki): There’re many definitions, for us this is a piece of code injected into the monitored application to perform the actual monitoring of a particular method in a particular class.
- Components and resources: Glassbox breaks time spent in an operation down by the type of activity that consumed it distinguishing between ‘components’ such as database access or remote calls and ‘resources’ such as running java code, running native code, or thread waiting (usually I/O).
- Layer: A monitored method can belong to one of application’s layers of processing such as ui.controller or resource.database.statement.
- Monitored method: Glassbox collects performance statistics only for selected important methods, for instance JDBC calls.
- Operation: An "entry point" of user requests into the application. Usually this is some high-level method such as a Struts Action’s execute or JSP’s service. Glassbox uses operations as the unit of monitoring service level agreements (SLA) and to analyze where time goes. Anything that’s used within an operation is considered a resource (EJB call, DB query…). (Though the notion of components shall be added in the future.)
- In the source code, the classes having Operation in their name are often also used to track monitored methods.
- Request: a) Vaguely synonymous to a monitored method; b) In this text it often means a user-invoked request that resulted in some activity in the monitored web application (though I try to use the term user request to distinguish it from a) ).
Basic concepts
Glassbox doesn’t bother to monitor all methods that are invoked as a result of a user request because it is not necessary and would incur higher overhead. It only monitors some especially significant methods of the call tree such as a Struts Action invocation or invocation of a data access method (be it plain JDBC or Hibernate). I call them "monitored methods", while in Glassbox’ terms they are either operations (the top-level ones) or resource/component requests.
The framework is aware of and takes care of tracking the hierarchy of monitored methods invoked during a single user request processing. This basically means that a monitored method may have a parent monitored method and that Glassbox stores both the aggregated duration of the higher level method and that of the nested one. For example, a JDBC executeQuery may be invoked from an EJB call, which is itself invoked from a Struts Action processing. Thus it can tell which processing layer or component/resource caused a particular top-level operation to be slow.
Glassbox is also aware of layers and monitored methods may be marked as belonging to a particular layer, e.g. the JDBC executeQuery would belong to the layer resource.database.statement.
Main components’ behavior explained
The main parts of Glassbox are:
- Monitors: Pieces of code collecting detailed performance measurements of monitored methods. Usually they’re implemented as aspects extending the Glassbox framework, which are injected (woven) into the monitored webapp instance. But sometimes Glassbox also uses e.g. filters, listeners, callbacks, and timer tasks.
- Agent(s): Handle communication between the monitors and a client. In a clustered environment, a separate instance is on each server.
- Client: Collects monitoring data from agents and displays an aggregated view to a user. The most used one is the Glassbox web application but you can also use e.g. jconsole or implement another client. Usually you have only one client even in a clustered environment.
Monitors
Monitors are mostly aspects injected into a monitored web application or the related code. Usually a monitor observes one or a few methods in a single class. When one of the methods is invoked, it creates a Response object storing the details of the invocation (its id, parameters, …) and, once it finishes, its performance data. If the method has been invoked as a part of processing of another monitored method, its Response will have the Response object of that method as its parent, in other words monitored methods can be (even indirectly) nested and this structure is preserved in the monitoring data structures. A typical example is a servlet calling an EJB doing some JDBC stuff.
Responses are stored in a ThreadLocal stack, which makes it possible to really connect those Responses belonging to the processing of a single request, and once the top-level operation completes it can be considered as finished.
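To illustrate the thread-local stack idea, here is a minimal sketch of how nested responses could be linked to their parents. This is not Glassbox code: SketchResponse, SketchResponseStack and their methods are hypothetical names I made up, and the real Response and ResponseFactory carry much more data.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for a Glassbox Response; not the real class. */
final class SketchResponse {
    final String key;
    final SketchResponse parent;   // null for a top-level operation
    final long startMillis = System.currentTimeMillis();
    long endMillis = -1;

    SketchResponse(String key, SketchResponse parent) {
        this.key = key;
        this.parent = parent;
    }
}

/** Hypothetical per-thread stack that links nested responses to their parents. */
final class SketchResponseStack {
    private static final ThreadLocal<Deque<SketchResponse>> STACK =
            ThreadLocal.withInitial(ArrayDeque::new);

    /** Called when a monitored method starts; the current top of the stack becomes its parent. */
    static SketchResponse begin(String key) {
        Deque<SketchResponse> stack = STACK.get();
        SketchResponse response = new SketchResponse(key, stack.peek());
        stack.push(response);
        return response;
    }

    /** Called when the most recently started monitored method finishes. */
    static SketchResponse end() {
        SketchResponse response = STACK.get().pop();
        response.endMillis = System.currentTimeMillis();
        return response;
    }

    /** True once the top-level operation on this thread has completed. */
    static boolean isIdle() {
        return STACK.get().isEmpty();
    }
}
```

Because the stack lives in a ThreadLocal, two user requests processed concurrently on different threads never see each other's responses.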
Agent
There is one agent per server (i.e. JVM). It collects the detailed monitoring data from monitors and aggregates them by monitored method into "statistics" (Stats) objects. Stats contain summarized data for a monitored method over all of its executions (avg, min, max time…) together with detailed data for a limited number of those executions that were either slow or failed. There are statistics both for top-level monitored methods and for nested ones and they’re aware of each other. These stats are stored in a global StatisticsRegistry (namely OperationTracker’s registry).
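The aggregation described above can be pictured roughly like this. It is only a sketch under my own assumptions: SketchStats, SketchRegistry, the cap of 5 retained slow samples and the threshold parameter are all hypothetical, and the real Stats classes also track failures and nested statistics.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-method summary; the real Glassbox Stats classes differ. */
final class SketchStats {
    static final int MAX_SLOW_SAMPLES = 5;   // assumed cap on retained details

    long count;
    long totalMillis;
    long minMillis = Long.MAX_VALUE;
    long maxMillis;
    final List<String> slowSamples = new ArrayList<>();

    synchronized void record(long millis, long slowThresholdMillis, String detail) {
        count++;
        totalMillis += millis;
        minMillis = Math.min(minMillis, millis);
        maxMillis = Math.max(maxMillis, millis);
        // Keep details only for slow executions, and only a bounded number of them.
        if (millis > slowThresholdMillis && slowSamples.size() < MAX_SLOW_SAMPLES) {
            slowSamples.add(detail);
        }
    }

    synchronized double averageMillis() {
        return count == 0 ? 0.0 : (double) totalMillis / count;
    }
}

/** Hypothetical registry keyed by monitored-method name. */
final class SketchRegistry {
    private final Map<String, SketchStats> statsByMethod = new HashMap<>();

    synchronized SketchStats statsFor(String method) {
        return statsByMethod.computeIfAbsent(method, k -> new SketchStats());
    }
}
```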
There is also a special background job (ThreadMonitor) that monitors threads to detect CPU-intensive methods. This is done by taking snapshots of the thread stack at regular intervals and – if its execution time exceeds a predefined limit – by drawing a conclusion from these snapshots. (Note: This is an example of a timer task monitor.)
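The snapshot technique can be sketched with plain JDK calls. This is my own simplified illustration, not how ThreadMonitor is actually implemented: SketchStackSampler is a hypothetical name, and it merely counts which method is on top of the stack in each snapshot, on the assumption that a method seen very often is a likely CPU hog.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sampler; Glassbox's ThreadMonitor is more elaborate. */
final class SketchStackSampler {
    /** Counts how often each method is on top of the target thread's stack. */
    static Map<String, Integer> sample(Thread target, int samples, long intervalMillis) {
        Map<String, Integer> hits = new HashMap<>();
        for (int i = 0; i < samples && target.isAlive(); i++) {
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) {
                String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                hits.merge(top, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // stop sampling if interrupted
                break;
            }
        }
        return hits;
    }
}
```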
Web UI
The Glassbox web user interface queries all registered agents for their data (see OperationsHelper.updateListOfOperations). The detailed and hierarchical statistics are turned into an OperationSummary for the top-level operation having also a list of "findings", i.e. textual descriptions of problems with the operation. It contains average statistics and stats when slow and when failing.
Main components’ behavior in detail
Monitors
To monitor a web application, Glassbox inserts monitoring aspects into its code and related server code (e.g. its javax.servlet.Servlet implementation) during server startup using AspectJ. In its simplest form an aspect is nothing more than an XML file mapping an existing aspect class to one or more methods of a particular class.
The sequence of actions triggered by a user request towards a monitored webapp is:
- A monitored method in the target web application is going to be invoked.
- An aspect associated with that method is invoked. It’s connected to the monitoring framework by extending a base monitor aspect class and optionally by calling some of the framework’s methods.
- The framework stores data about the monitored method/operation’s name, layer and component/resource and its start time. It also checks the thread local stack of responses to find out whether it’s invoked as a part of processing some higher level monitored method and if that is the case then it sets it as its parent.
- The monitored method is invoked and when it finishes, the framework stores its end time. If it finished with an exception then it’s first checked whether the exception should indeed be regarded as a failure and if yes, the exception data is stored as well. If the operation was slow or failed, the framework puts it on the list of slowest/recently failed operations and captures its parameters.
- When the original user request is processed completely, the framework aggregates data about its performance. This happens in StatsSummarizer, a ResponseListener. If the request processing was too slow with respect to predefined limits (SLA, usually <1s in 90% of time) then it’s stored into the slow operations list and the layer/component that caused the delay is marked.
Note: Monitors interact with a ResponseFactory to create and finish Responses and these actions trigger ResponseListener events. The most important ResponseListener is the StatsSummarizer but there’re also others, for instance a listener that can log slow responses and a new one that captures a trace for a specific request. Since response listeners are invoked directly by the response factory, they execute in the same thread as the monitored method itself and therefore can safely use ThreadLocal variables to keep info of requests/responses belonging to a single user interaction.
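The SLA check in the last step of the sequence above could look roughly like the following. This is purely my reading of "<1s in 90% of time" as "at least 90% of the requests must finish in under one second"; the class SketchSla and its constants are hypothetical and Glassbox may well compute this differently.

```java
/** Hypothetical SLA check: at least 90% of requests must finish in under 1 s. */
final class SketchSla {
    static final long LIMIT_MILLIS = 1000;     // assumed from "<1s" in the text
    static final int REQUIRED_PERCENT = 90;    // assumed from "90% of time" in the text

    static boolean isViolated(long[] durationsMillis) {
        if (durationsMillis.length == 0) {
            return false;   // nothing measured, nothing violated
        }
        int withinLimit = 0;
        for (long duration : durationsMillis) {
            if (duration < LIMIT_MILLIS) {
                withinLimit++;
            }
        }
        // Integer arithmetic avoids floating-point rounding at the 90% boundary.
        return (long) withinLimit * 100 < (long) durationsMillis.length * REQUIRED_PERCENT;
    }
}
```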
Agent
As said above, performance statistics are aggregated by monitored methods and are collected into a global statistics registry. The registry contains instances of PerfStats or CompositePerfStats, which also implements StatisticsRegistry to hold nested statistics, or a subclass such as OperationPerfStats. A registry is actually a map of OperationDescriptions to their OperationPerfStats and can return a subset for a particular StatisticsType, for example UI, Database, Remote Call.
Let’s see a partial example produced by OperationTracker.registry.dump(new StringBuffer(), 0):
operation(type javax.servlet.Servlet; name org.pentaho.ui.servlet.AdhocWebService):glassbox.track.api.OperationPerfStatsImpl@12c3d18 operation(type javax.servlet.Servlet; name org.pentaho.ui.servlet.AdhocWebService)(# = 0, tm =0,00 ms, #slow = 0, # fail = 0)
StatisticsTypeImpl 0 of class glassbox.track.api.UIStatisticsType:
StatisticsTypeImpl 1 of class glassbox.track.api.DatabaseStatisticsType:
StatisticsTypeImpl 2 of class glassbox.track.api.DatabaseConnectionStatisticsType:
StatisticsTypeImpl 3 of class glassbox.track.api.DatabaseStatementStatisticsType:
StatisticsTypeImpl 4 of class glassbox.track.api.SimpleStatisticsType:
StatisticsTypeImpl 5 of class glassbox.track.api.RemoteCallStatisticsType:
StatisticsTypeImpl 6 of class glassbox.track.api.TreeStatisticsTypeImpl:
time:Stats (tm=0,00 ms, slow=0)
StatisticsTypeImpl 0 of class glassbox.track.api.UIStatisticsType:
StatisticsTypeImpl 1 of class glassbox.track.api.DatabaseStatisticsType:
StatisticsTypeImpl 2 of class glassbox.track.api.DatabaseConnectionStatisticsType:
StatisticsTypeImpl 3 of class glassbox.track.api.DatabaseStatementStatisticsType:
StatisticsTypeImpl 4 of class glassbox.track.api.SimpleStatisticsType:
StatisticsTypeImpl 5 of class glassbox.track.api.RemoteCallStatisticsType:
StatisticsTypeImpl 6 of class glassbox.track.api.TreeStatisticsTypeImpl:
An OperationPerfStats holds e.g. resourceTotalStats and otherComponentStats.
*Stats also hold all other necessary information, for instance about slow/failing cases.
OperationTracker also uses an instance of OperationAnalyzer, which is responsible for preparing data for all the nice output you can see in the UI. This includes summarizing the stats into OperationSummaries and detecting (based on the collected stats) what is the cause of a slow/failing operation and providing this info in the form of OperationAnalysis.
Web UI
The web UI, aside from handling installation of Glassbox monitoring into a server, maintains connections to all the agents (or to the single local agent in a non-clustered environment) and retrieves all the needed summaries and analysis from them via its OperationHelper.
Currently the web UI only provides statistics about a top-level operation and its list of detected problems (slow SQL, excessive CPU in method XY, …) with troubleshooting details but you cannot use it to view statistics for its nested monitored methods/components. However you can access those detailed statistics e.g. via the JMX interface.
Additional notes
- You can view results and manage Glassbox using JMX: jconsole service:jmx:rmi:///jndi/rmi://localhost:7232/GlassboxTroubleshooter . Check $jboss/lib/glassbox/glassbox.properties and glassbox.war/WEB-INF/lib/agent.jar/beans.xml for settings.
- You can disable/enable some monitors at runtime via JMX, for example RemoteCallMonitor or JdbcMonitor. I’m not sure whether there is a way to dis/enable on a more granular level.
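Instead of jconsole you can also connect to the same JMX URL from code using the standard javax.management.remote API. The sketch below only lists the MBean domains found there; the class name GlassboxJmxProbe is made up, and I make no assumption about the names of the Glassbox MBeans themselves. It obviously needs a running server with the Glassbox agent listening on that port.

```java
import java.io.IOException;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Lists the MBean domains exposed at the JMX URL quoted above. */
final class GlassboxJmxProbe {
    static final String URL =
            "service:jmx:rmi:///jndi/rmi://localhost:7232/GlassboxTroubleshooter";

    static String[] listDomains(String serviceUrl) throws IOException {
        JMXServiceURL url = new JMXServiceURL(serviceUrl);
        // try-with-resources closes the connector when we are done
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            return connection.getDomains();
        }
    }

    public static void main(String[] args) throws IOException {
        for (String domain : listDomains(URL)) {
            System.out.println(domain);
        }
    }
}
```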
Glassbox API – Main classes
Here we will learn about the most important Glassbox classes, what they can do for you, and how they relate to each other.
Classes without an extension are regular java classes while those with .aj are AspectJ classes and need to be compiled by the aspectj compiler.
Response API
This API is used by the monitoring aspects to produce the monitoring data that is then further analyzed and presented to the user by Glassbox.
- glassbox.response.Response
- Collects data about the system’s response while processing a request. These are typically nested, i.e., we track times, parameters, etc. for Servlet requests that result in Struts action requests that result in a database query. A response belongs to a particular layer and may have a parent response (when nested). It has also a duration and a status (processing/succeeded/failed/…). Actually it can hold any context, so a monitor can store whatever relevant data is needed (e.g., this can be useful for a custom metric that your application wants to track).
- get/setLayer, get/setParent(Response), duration, status (ok/failed/processing..),
- glassbox.response.(Default)ResponseFactory.aj
- This is a helper class for manipulating requests, including their creation, which takes care of their proper nesting and of setting their start/end times. It uses a thread-local stack to keep track of nested requests and System.currentTimeMillis() for timing.
- As noted elsewhere, it’s used by monitors to create/finish Responses; it produces the appropriate events for that and also manages a list of ResponseListeners.
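To illustrate the thread-local stack idea, here is a minimal sketch in plain Java. It is not the Glassbox code: SketchResponse and SketchResponseFactory are hypothetical stand-ins, and listener notification is left out. It only shows how a per-thread stack yields correct parent links and start/end times for nested responses.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical, simplified stand-in for glassbox.response.Response. */
class SketchResponse {
    final Object key;
    final SketchResponse parent; // null for a top-level response
    final long startMillis;
    long endMillis = -1;         // -1 while the response is still being processed

    SketchResponse(Object key, SketchResponse parent, long startMillis) {
        this.key = key;
        this.parent = parent;
        this.startMillis = startMillis;
    }

    /** Duration in milliseconds, or -1 if the response has not finished yet. */
    long durationMillis() {
        return endMillis < 0 ? -1 : endMillis - startMillis;
    }
}

/** Hypothetical factory that nests responses using one stack per thread. */
class SketchResponseFactory {
    private static final ThreadLocal<Deque<SketchResponse>> STACK =
            ThreadLocal.withInitial(ArrayDeque::new);

    /** Starts a response nested under whatever response is currently open on this thread. */
    static SketchResponse start(Object key) {
        Deque<SketchResponse> stack = STACK.get();
        SketchResponse response =
                new SketchResponse(key, stack.peek(), System.currentTimeMillis());
        stack.push(response);
        return response;
    }

    /** Finishes the innermost open response of this thread and returns it. */
    static SketchResponse finishLast() {
        SketchResponse response = STACK.get().pop();
        response.endMillis = System.currentTimeMillis();
        return response;
    }

    /** Returns the innermost open response of this thread, or null if there is none. */
    static SketchResponse getLastResponse() {
        return STACK.get().peek();
    }

    public static void main(String[] args) {
        SketchResponse servlet = start("servlet request");
        SketchResponse query = start("database query");
        System.out.println("query nested under servlet: " + (query.parent == servlet));
        finishLast(); // closes the database query
        finishLast(); // closes the servlet request
        System.out.println("servlet took " + servlet.durationMillis() + " ms");
    }
}
```

Because the stack is thread-local, concurrent requests handled by different threads never see each other's responses, which is why no locking is needed in this sketch.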
Monitor API
The monitoring aspects extend this API and it also includes many specific monitoring aspects such as EjbCallMonitor.aj and StrutsRequestMonitor.aj.
- glassbox.monitor.OperationFactory – creates an OperationDescription(Impl) from a JSP path or from a class name – see e.g. MvcFrameworkMonitor.aj
- glassbox.monitor.AbstractMonitorClass
- isEnabled (calls RuntimeControl.aspectOf(this).isEnabled()), setEnabled, setThisThreadEnabled, …; failureDetectionStrategy (recordException: failureDetectionStrategy.getFailureDescription(throwable)); getLayer();
- accesses & modifies responseFactory.getLastResponse() – e.g. in endNormally (-> response.complete()), endException(); begin(key, layer) -> createResponse
- Ron’s note: One reason for having AbstractMonitorClass is to allow using Java-5
annotation-based aspects with the Glassbox framework, either for
AspectJ extensions written in that style or for Spring annotation-based
aspects.
Tracking API
An addition to the Response API to keep track of requests (monitored methods) etc.
- glassbox.track.api.Request – represents a specific instance of a request to something, i.e. an invocation of a monitored method. Requests can be compared based on elapsed time.
- glassbox.track.api.FailureDetectionStrategy – decides whether an exception thrown by a monitored method should be regarded as its failure or not.
- glassbox.track.api. Call/Failure/Operation/SQLFailure Description – OperationDescription has a type (e.g. "HttpServlet"), a name (e.g. the servlet’s name), context (e.g. the web app’s context root) and perhaps a parent OperationDescription if nested.
- glassbox.track.api.SlowRequestDescriptor – describes a request whose processing was too slow; it has among others the attributes StackTraceElement slowestTraceElement and mean/slow/total counts.
- glassbox.track.api.UsageTrackingInfo – attributes eventTime, eventCpuTime, eventUserCpuTime (uses ThreadMXBean)
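The three attributes of UsageTrackingInfo map naturally onto the standard java.lang.management.ThreadMXBean API. The sketch below shows how such a snapshot could be taken for the current thread; UsageSnapshot is a hypothetical class that only borrows the attribute names, and how Glassbox actually fills them in is an assumption.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical snapshot holder using the attribute names listed above. */
class UsageSnapshot {
    final long eventTime;        // wall-clock time in milliseconds
    final long eventCpuTime;     // total CPU time of the current thread in ns, -1 if unavailable
    final long eventUserCpuTime; // user-mode CPU time of the current thread in ns, -1 if unavailable

    UsageSnapshot(long eventTime, long eventCpuTime, long eventUserCpuTime) {
        this.eventTime = eventTime;
        this.eventCpuTime = eventCpuTime;
        this.eventUserCpuTime = eventUserCpuTime;
    }

    /** Captures wall-clock and CPU times for the calling thread. */
    static UsageSnapshot capture() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Check support first: isThreadCpuTimeEnabled() throws if CPU timing is unsupported.
        boolean available = threads.isCurrentThreadCpuTimeSupported()
                && threads.isThreadCpuTimeEnabled();
        long cpu = available ? threads.getCurrentThreadCpuTime() : -1;
        long user = available ? threads.getCurrentThreadUserTime() : -1;
        return new UsageSnapshot(System.currentTimeMillis(), cpu, user);
    }

    public static void main(String[] args) {
        UsageSnapshot before = capture();
        double sink = 0;
        for (int i = 1; i < 5_000_000; i++) {
            sink += Math.sqrt(i); // burn some CPU so the difference is visible
        }
        UsageSnapshot after = capture();
        System.out.println("CPU ns used: " + (after.eventCpuTime - before.eventCpuTime)
                + ", wall ms: " + (after.eventTime - before.eventTime)
                + " (sink=" + sink + ")");
    }
}
```

Comparing the CPU time with the elapsed wall-clock time of an operation is what allows a tool to tell "running Java code" apart from waiting or contention.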
Analysis API
Used by the framework and UI to analyse the monitoring data and present an aggregated view to the user. These are mostly only value objects while the logic is in the Agent API.
- glassbox.analysis.api.TimeDecomposition – Captures mutually exclusive breakdown of overall time by component/resource. Components: dispatch (in common code above operation), other (other, undefined areas), db access, remote calls; resources: running java code, running native code, waiting (I/O…), thread contention.
- glassbox.analysis.api.OperationAnalysis – TimeDecomposition getComponentDecomposition() (db, cpu, i/o, dispatch…), TimeDecomposition getResourceDecomposition() (by thread use: runnable, blocked, waiting, etc.); getSlowThresholdMillis(); getMeanCpuTime() …; isFailing(), isSlow();
- glassbox.analysis.api.SummaryStats – aggregated statistics for a monitored method – its accumulatedTime, count (number of hits), mean time
Agent API
Collect data from monitors, summarize it, analyze problems, and provide the outputs to the Web UI.
- glassbox.client.persistence.jdbc.BackupDaemon – stores agent connections (but not any monitored data) into a database (by default an embedded hsqldb – see the myDataSource below).
- glassbox.monitor.thread.ThreadMonitor15Impl: This monitor periodically grabs thread dumps for all threads that are processing user requests. It runs in a daemon thread, collecting the dumps at preset intervals. It creates instances of glassbox.monitor.thread.OperationSample when sampling a monitored thread. When a monitored thread finishes, this may result in a call to ThreadSummarizer.summarize, which updates the associated CompositePerfStats.
- glassbox.monitor.thread.ThreadMonitorIntegration.aj: starts a ThreadMonitor after StatsSummarizer.startTopLevelStats|startNestedLaterOperation
- glassbox.summary.StatsSummarizer.aj (implements ResponseListener): ResponseFactory invokes its startedResponse/finishedResponse when appropriate; startedResponse => update ThreadStats including StatisticsRegistry, invoke startedStats, which may result in starting a ThreadMonitor
- uses glassbox.track.api.StatisticsRegistry stored in a thread local variable of the type ThreadStats together with first/last operation key (OperationPerfStats)
- glassbox.track.OperationTracker (singleton): used by GlassboxServiceImpl to analyze/list/… operations. Holds a global StatisticsRegistry registry and an OperationAnalyzer.
- glassbox.analysis.OperationAnalyzer: Analyses the collected statistics to detect problems and their causes. It uses (Default)TimeDecomposition. It also makes OperationSummaries from the stats, which are further used by the UI.
- Note: in agent.jar the Spring config file beans.xml defines a bean operationTracker of the type glassbox.track.OperationTrackerImpl (implements OperationTracker, StatisticsRegistry) – this is used to collect all the stats in the monitored app. It’s used by the bean glassboxService (glassbox.agent.control.GlassboxServiceImpl).
- glassbox.agent.control.GlassboxServiceImpl (singleton): its listOperations() (delegation to OperationTrackerImpl.listOperations()) is invoked using a local/remote call from the Glassbox UI webapp (OperationHelper) to collect operations (stats) from the given server, and it also provides problem analysis to the UI in a similar manner. I suppose that there is only a single instance of this class in a JVM.
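The sampling approach described for the ThreadMonitor above can be sketched with nothing but the standard Thread API. This is an illustrative simplification under my own assumptions, not the Glassbox implementation: SketchThreadSampler is a hypothetical class, it watches a single thread, and it only counts the top stack frame of each sample.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sampler that counts which method a monitored thread is seen in. */
class SketchThreadSampler {
    private final Map<String, Integer> hitsByFrame = new HashMap<>();

    /** Records the top stack frame of the target; returns false if it has no frames. */
    synchronized boolean sample(Thread target) {
        StackTraceElement[] stack = target.getStackTrace();
        if (stack.length == 0) {
            return false; // thread not started or already terminated
        }
        String frame = stack[0].getClassName() + "." + stack[0].getMethodName();
        hitsByFrame.merge(frame, 1, Integer::sum);
        return true;
    }

    /** Returns a copy of the hit counts collected so far. */
    synchronized Map<String, Integer> summary() {
        return new HashMap<>(hitsByFrame);
    }

    /** Starts a daemon thread that samples the target at a fixed interval while it is alive. */
    Thread monitor(Thread target, long intervalMillis) {
        Thread daemon = new Thread(() -> {
            try {
                while (target.isAlive()) {
                    sample(target);
                    Thread.sleep(intervalMillis);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop sampling when interrupted
            }
        }, "sketch-thread-sampler");
        daemon.setDaemon(true);
        daemon.start();
        return daemon;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            long end = System.currentTimeMillis() + 200;
            while (System.currentTimeMillis() < end) {
                Math.sqrt(System.nanoTime()); // simulated work
            }
        }, "worker");
        worker.start();
        SketchThreadSampler sampler = new SketchThreadSampler();
        Thread daemon = sampler.monitor(worker, 10);
        worker.join();
        daemon.join();
        System.out.println(sampler.summary());
    }
}
```

The more samples land in a given frame, the more time the thread presumably spent there, which is the kind of evidence behind a "slowest trace element" report.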
UI Web App’s API
The Spring configuration file glassbox.war/WEB-INF/applicationContext.xml defines among others the following beans:
- a backupDaemon – regarding its function see glassbox.client.persistence.jdbc.BackupDaemon above. See also glassbox.client.persistence.jdbc.PersistChanges. Its schedule is defined in the same file by scheduledTask, with the default period of 10000, and it indirectly uses myDataSource, which is defined there as well.
- Note: To override the data source used to store client configuration, including remote connections to open, you can define the System property glassbox.config.ds (this is configured in the applicationContext.xml).
- agentManager (glassbox.client.remote.DistributedAgentManager).
- glassboxService (org.springframework.remoting.httpinvoker.HttpInvokerProxyFactoryBean).
Classes of interest:
- glassbox.client.helper.OperationHelper – updates statistics to display by calling listOperations on the remote agents.
- glassbox.client.pojo.OperationData – monitoring data collected and used by the UI. It only adds source and agent identification to its nested OperationSummary, which holds the actual statistics (operation count, is failing, is slow, avg time; nested OperationDescription that might have a parent too).
Customizing Glassbox for your webapp
There are two ways of customizing Glassbox for monitoring a particular webapp:
- Adding monitors
- Glassbox plugins mechanism for advanced customization.
1. Adding monitors
You can apply an existing monitoring aspect to a new method using AspectJ weaving rules described in an aop.xml file, thus turning the method into a monitored method. Alternatively, you can create a new monitoring aspect that extends a base Glassbox aspect and put it, perhaps together with an aop.xml, on the monitored webapp’s classpath.
Very valuable and inspiring information about this is in the aforementioned presentation Glassbox Architecture & Design.
Coding-less addition of a monitor
Quoting the User Guide:
——–
You can simply extend the Glassbox
definition of operations by creating a new XML file with these
contents:
<aspectj>
<aspects>
<concrete-aspect name="ServiceProcessingMonitor"
extends="glassbox.monitor.ui.TemplateOperationMonitor">
<pointcut name="methodSignatureControllerExecTarget"
expression="within(com.myco.service..*)"/>
</concrete-aspect>
</aspects>
</aspectj>
Note: If you only want to monitor another method invoked during the processing of something already regarded as an operation, you should extend glassbox.monitor.MethodMonitor instead.
You can then add this file to a META-INF subdirectory of a
directory on your classpath or add a jar containing the file at the
location META-INF/aop.xml. For Tomcat, you might just create the
directory common/classes/META-INF and install your custom aop.xml
file there.
Implementing a new monitor
Warning: This is only my idea of what needs to be done; it may contain mistakes and misconceptions.
To implement a new monitoring aspect you should do at least the following:
- Extend glassbox.monitor.AbstractMonitor
- Define the pointcuts (i.e. to what methods to apply this monitor)
- Redefine the abstract pointcut monitorEnd() to apply to the monitored method(s) so that the parent class detects when it finishes.
- Either redefine the abstract pointcut monitorBegin(Object identifier) to allow the parent class to automatically register the beginning of the monitored operation, or define your own advice (method) that is run when a custom pointcut is encountered; usually this is a before advice. The identifier should be e.g. an OperationDescription. Inside a custom advice:
- (Re)implement some methods such as getLayer().
- Create an OperationDescription for the operation, likely using the inherited operationFactory.
- Call one of the inherited begin(..) methods (see AbstractMonitorClass.aj), passing the OperationDescription as the 1st argument, i.e. as a key. This will return a glassbox.response.Response object.
- Store some context data into the generated response, using e.g. response.set(Response.PARAMETERS, <the monitored method’s arguments from AspectJ’s thisJoinPoint.getArgs()>).
Note: If you override getLayer() to return Response.RESOURCE_SERVICE and the monitored method is slow, Glassbox will report it as a slow remote call.
2. Glassbox plugins mechanism
It’s possible to extend Glassbox with application-specific extensions using the API glassbox.config.extension.api. Simply adding monitors doesn’t need it: you can just deploy a monitor jar with aspects to the classpath, and an app can simply call on the response API. The PluginRegistry supports deeper extensions, like adding custom operations (see interface glassbox.config.extension.api.OperationPlugin).
Ron explains (2008-10-25): Glassbox lets you customize a variety of facets using plugins.
Operation plugins let you add an operation type that can extend how
Glassbox summarizes and analyzes the operation (to detect service level
violations) and how the UI renders these. Glassbox plugins also let you
define a connection provider so you can write custom code to define
what connections in a cluster/server farm should be opened (allowing
discovery instead of manual configuration). Most recently I’ve also
added a runtime controller that lets you change behavior at runtime
(e.g., requesting that a request on a specific thread be monitored).
Operation plugins really shine if you need to add in different service levels, or
want a custom display to summarize problems (e.g., Ron used this to
detect an out of date cache for one custom project).
Hopefully some documentation for creating plugins will be created soon.
3. Restricting what to monitor
Currently Glassbox monitors any web application deployed on the app. server, including itself, and also some common code like JDBC drivers (which may be invoked not only from a web application but also by the server’s daemons etc.). If you don’t want to monitor all of that, for instance to decrease the overhead and to make the outputs easier to read, there are a few things you can do.
Glassbox doesn’t support any filtering yet, but there are two workarounds:
- You can avoid instrumentation of specific
applications on the server by deploying a META-INF/aop.xml file in
their classpath that disables weaving into any classes (although that
would still track calls to common classes like JDBC drivers and app
server servlets).
Example (MyApp.war/WEB-INF/classes|MyApp.ear)/META-INF/aop.xml:
<aspectj>
<weaver>
<exclude within="*"/>
</weaver>
</aspectj>
- Instead of load-time weaving (LTW) you can perform offline weaving of Glassbox aspects only into the code of the web applications that interest you and perhaps into some common server code that you also need to be monitored, for instance a JDBC driver. An additional benefit is that you’ll get rid of the longer class loading at server startup.
A word about memory overhead
According to the aforementioned presentation of Glassbox design & architecture, a server running Glassbox can consume roughly 20% more memory than one without it. Ron further explains:
20% is a rough guideline – it varies quite a bit in specific cases. The
biggest area where Glassbox adds overhead is indeed from AspectJ
load-time weaving, most specifically the memory overhead from handling
JSP’s – the load-time weaving system uses a little more than 1 megabyte
of memory for each loader, and each JSP gets its own loader. The
AspectJ project has been working on this area – see Reducing weaver memory usage over time.
We need to merge in the updated work on AspectJ to the version of
AspectJ we’re using with Glassbox, which also reduces memory overhead a
lot. You could try the patch in that bug report with AspectJ 1.6.1’s
aspectjweaver.jar to see if it has better memory performance.
Another approach that might be simpler and yet could help a lot is a
hybrid one: if you just precompile your application’s JSPs you will see
far lower overhead.
Glassbox does work best if it is weaving some server classes, but in
many cases you can get the desired visibility if you do offline weaving
of your application and a few key libraries, like your JDBC driver and
e.g., web services callers. If you want to try that, I’d be glad to
help.
There is one other area where Glassbox can consume significant memory:
it records statistics based on the structure of components and
resources and how calls are nested. For a fairly static application
this is normally constrained and quite small, but some applications
generate names/queries/etc. dynamically and Glassbox can build
increasingly large trees of statistics, which consume memory also. We
definitely want to address this area – I’m leaning towards not
recording details for quick operations, and only recording information
for things that run often and are taking noticeable time. We’d like to
know about cases where this happens so we can test better approaches.
Update 2009-01-21
I’ve implemented persistence for the detailed monitoring data of Glassbox. Unfortunately, Glassbox generates too many uninteresting entries, and for the methods of interest it doesn’t provide enough data; you can read more about this in the Glassbox forums linked to above. Unfortunately, I had no time to try to deal with these problems.
If you’re interested in the DB persistence anyway, you can try it – download GlassboxDbPersister.zip and read the contained README.txt.
Resources
- A nice article about Performance Monitoring using Glassbox (03-03-2009) with many screenshots and some code samples.
Posted in Java, Tools | Tagged: AOP, java, monitoring, performance | 7 Comments »

