The Holy Java

Building the right thing, building it right, fast

Performance & Performance Testing for Webapps

! work in progress !

Key Performance Goals

Requirement format: “X should be less than L in P % times when the load is U users”

  • Request throughput
  • Latency
  • Max/avg response time from the end user’s point of view
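A requirement in the format above can be checked mechanically against measured samples. The following is only a minimal sketch: the function names, the nearest-rank percentile definition and the sample numbers are my own assumptions, not part of any tool mentioned here.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p % of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def meets_requirement(samples, limit, p):
    """True if X was less than `limit` in at least p % of the samples."""
    if not samples:
        raise ValueError("no samples")
    below = sum(1 for s in samples if s < limit)
    return below * 100 >= p * len(samples)


# Hypothetical response times in ms, measured at one fixed load level U.
response_times_ms = [120, 150, 180, 200, 950]
ok = meets_requirement(response_times_ms, limit=300, p=80)
```

The check only makes sense for samples collected at the load U named in the requirement, so each load level needs its own sample set.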

Performance Metrics

The Utilization Saturation and Errors (USE) Method

The USE Method of performance analysis focuses on getting a complete overview of a system (without forgetting anything) and quickly discovering most of its performance problems. The main tool is a system-dependent checklist of resources and, for each resource, metrics of utilization, saturation (i.e. work that has to wait), and errors. Read more on its page, which also contains checklists for some systems.
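To make the checklist idea concrete, here is a small sketch of how such readings could be represented and scanned. The class, the function and the 70 % utilization threshold are hypothetical choices of mine, not something the USE Method prescribes.

```python
from dataclasses import dataclass


@dataclass
class UseReading:
    resource: str
    utilization: float  # fraction of capacity in use, 0.0 to 1.0
    saturation: float   # amount of work waiting, e.g. run-queue length
    errors: int         # error count for the resource


def use_findings(readings, max_utilization=0.7):
    """Return (resource, metric) pairs worth a closer look.

    The threshold is an assumed default; pick one per resource in practice.
    """
    findings = []
    for r in readings:
        if r.errors > 0:
            findings.append((r.resource, "errors"))
        if r.saturation > 0:
            findings.append((r.resource, "saturation"))
        if r.utilization > max_utilization:
            findings.append((r.resource, "utilization"))
    return findings


# Made-up readings for three resources.
readings = [
    UseReading("cpu", utilization=0.95, saturation=3, errors=0),
    UseReading("disk", utilization=0.20, saturation=0, errors=0),
    UseReading("nic", utilization=0.10, saturation=0, errors=12),
]
suspects = use_findings(readings)
```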

Java EE App Stress Testing (A. Bien)

Goal: Test contention (=> dead/live-locks), transaction isolation, caching behavior, consistency, robustness, performance, memory consumption.

  • Memory, current heap size
  • Typical / peak # of worker threads
  • Usual depth of the request queue
  • # rolled back transactions (f.ex. due to optimistic locking) vs. successful ones
  • # requests/sec
  • # & length of major garbage collections
  • # DB connections
  • Size of JPA caches
  • all of these should be stable, i.e. not grow (too much) with time / growing load.
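One possible way to check "stable, i.e. not growing" for a metric sampled at regular intervals is to fit a straight line and look at its slope relative to the mean. This is a rough sketch only; the function names and the 1 % tolerance are assumptions, and a real test would need many more samples.

```python
def slope(values):
    """Least-squares slope of values sampled at equal intervals
    (x = 0, 1, 2, ...), in units of the metric per sample."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def is_stable(values, tolerance=0.01):
    """True if the trend per sample is within `tolerance` of the mean.

    `tolerance` is an assumed default, not a recommended value.
    """
    mean = sum(values) / len(values)
    if mean == 0:
        return all(v == 0 for v in values)
    return abs(slope(values)) / abs(mean) <= tolerance


# Hypothetical heap sizes in MB after each major GC.
steady = [100, 101, 99, 100, 100, 101, 99, 100]
leaking = [100, 110, 120, 130, 140]
```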

Ex.: JMeter + VisualVM (with the MBean plugin to monitor custom caches and with Visual GC) to observe the behavior live.

Level: Browser

TBD (network latency, rendering time, … – use Firebug’s timing capability or some similar browser plugin)

Level: Server

Resource utilization (from New Relic docs):

  • CPU busy [%] – the percentage of the time that the system is using the CPU
  • Disk busy [%] – the percentage of the time that the system is performing Disk IO
  • Memory used [%]
  • Disk space used [%]
  • Network utilization [Mb/s]
  • Drilling down into processes – their count, CPU, memory

Layer: Servlet Container / EJB Container

TBD (JVM heap, threads, …)

  • Servlet Container
    • # threads (unless NIO used, i.e. # concurrent requests being processed)
  • Database
    • No. free connections in connection pools
    • Avg. time connection is used by a thread (how long away from the pool)
  • EJB Container
    • Bean pool utilization

Layer: Application

TBD (# concurrent users, # errors/exceptions, …)

Layer: Database


Common Performance Issues


(I don’t remember the source for this 😦)

  • No more File Descriptors
    • symptoms: entry in error log, new httpd children fail to start, fork() failing everywhere
    • solution: increase system-wide limits, increase ulimit via apachectl
  • Sockets stuck in TIME_WAIT
    • symptoms: unable to accept new connections, CPU under-utilized & httpd processes idle, not swapping, netstat shows many sockets in TIME_WAIT
    • solution: many TIME_WAIT sockets are to be expected; it is only a problem when new connections are failing => decrease the system-wide TCP/IP FIN timeout
  • High Mem Usage (swapping)
    • symptoms (ignore system free memory, it is misleading): high disk activity, top/free show high swap usage, load gradually increasing, ps shows processes blocking on disk I/O
    • solution: add memory, …
  • CPU overload
    • symptoms: top shows little/no CPU idle time, *not* swapping, high load, much CPU time spent in userspace
    • solution: add CPU
  • Interrupt (IRQ) overload
    • symptoms (frequent on 8+ CPU machines): not swapping, 1-2 CPUs busy and the rest idle, low total load
    • solution: add NIC
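For the TIME_WAIT case, counting sockets per TCP state makes the symptom easy to spot. The sketch below assumes the Linux `netstat -ant` layout, where the state is the sixth column; other systems format the output differently. The function name and the sample lines are made up.

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP sockets per state in `netstat -ant`-style text.

    Assumes Linux column order: Proto Recv-Q Send-Q Local Foreign State.
    """
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


# Made-up sample; in practice feed in the captured netstat output.
sample = """\
Proto Recv-Q Send-Q Local Address   Foreign Address  State
tcp        0      0 0.0.0.0:80      0.0.0.0:*        LISTEN
tcp        0      0 10.0.0.1:80     10.0.0.2:51234   TIME_WAIT
tcp        0      0 10.0.0.1:80     10.0.0.3:40022   TIME_WAIT
tcp        0      0 10.0.0.1:80     10.0.0.4:38811   ESTABLISHED
"""
states = count_tcp_states(sample)
```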

Doing Performance Testing Correctly

Jetty on Load Testing

The Jetty High Load Howto has some good tips on creating realistic load testers and on configuring the load testing and server machines (TCP buffer sizes, in/outbound connection queue size, # file, # ports, congestion control). F.ex.:

– A common mistake is that load generators often open relatively few connections that are kept totally busy sending as many requests as possible over each connection. This causes the measured throughput to be limited by request latency (see Lies Damned Lies and Benchmarks for an analysis of such an issue).

– Another common mistake is to use a TCP/IP connection for a single request and to open many, many short-lived connections. This will often result in accept queues filling and limitations due to file descriptor and/or port starvation.
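The first mistake is easy to quantify. If each connection sends its next request only after the previous response has arrived (no pipelining, which is my assumption here), then a few busy connections can never measure more than connections / latency requests per second, no matter how fast the server is. A sketch with a hypothetical function name:

```python
def max_measured_throughput(connections, latency_ms):
    """Upper bound on requests/sec when every connection has at most one
    request in flight at a time (assumes no HTTP pipelining)."""
    if connections < 1 or latency_ms <= 0:
        raise ValueError("need at least one connection and positive latency")
    return connections * 1000 / latency_ms


# 10 busy connections at 50 ms per request top out at 200 requests/sec.
bound = max_measured_throughput(connections=10, latency_ms=50)
```

So with such a generator a "throughput" result mostly reflects the latency and the number of connections chosen, not the capacity of the server.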

A load generator should model well the traffic profile of the normal clients of the server. For browsers, this is mostly between 2 and 6 connections that are mostly idle and that are used in sporadic bursts with read times in between. The connections are mostly long-held HTTP/1.1 connections.

It recommends the Cometd Load Tester as a good example of a realistic load generator.
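The browser-like profile (mostly idle connections used in sporadic bursts) could be modelled roughly like this. It is only a sketch of a request schedule for one connection, not the Cometd Load Tester's approach. All names and parameters are hypothetical, and drawing the pauses from an exponential distribution is my assumption.

```python
import random


def browser_schedule(bursts, requests_per_burst, mean_pause_s,
                     gap_s=0.05, seed=None):
    """Return send times (seconds from start) for one long-lived connection:
    short bursts of requests separated by random idle pauses."""
    rng = random.Random(seed)
    t = 0.0
    times = []
    for _ in range(bursts):
        # Idle pause while the user reads the page (exponential, mean given).
        t += rng.expovariate(1.0 / mean_pause_s)
        for _ in range(requests_per_burst):
            times.append(t)
            t += gap_s  # small gap between requests within a burst
    return times


# One simulated connection: 3 bursts of 4 requests, about 10 s pauses.
schedule = browser_schedule(bursts=3, requests_per_burst=4,
                            mean_pause_s=10.0, seed=42)
```

A load generator would run a few such schedules per simulated browser, each over its own keep-alive connection.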


Simple tools

Web servers

  • ab (ApacheBench) – ex: ab -n 10000 -c 250 <page URL> – generate 10k GETs for the URL, issuing 250 in parallel (limited by the number of sockets a process can open on the test machine)
  • siege – http/https stress tester – available in the software repositories of most Linux distros
  • wrk – a modern HTTP benchmarking tool capable of generating significant load when run on a single multi-core CPU. It combines a multithreaded design with scalable event notification systems such as epoll and kqueue.


  • MySQL: mysqlslap – simulates a number of clients connecting to the DB and performing a query of your choosing

On-line tools

  • – load-testing service based on a curl-like tool, very easy to use and with nice graphs, can simulate tens of thousands of concurrent users from different locations on Earth
  • Pingdom – “uptime and performance monitoring made easy” – I haven’t tried this

Apache JMeter


Max Number of Concurrent Users

The number of concurrent virtual users JMeter can efficiently simulate depends on the resources of the test machine (memory, thread limits, socket limits, network card speed, …), the complexity of the test, and other load on the system. In general it’s recommended to use 1000 or fewer threads – assuming, of course, that you don’t perform any extensive report gathering/rendering and that you reduce JMeter’s resource consumption, as it would otherwise steal resources available for the testing. With a bad configuration or machine you can run into problems at a much lower thread count.

Notice also that if you don’t introduce any “think time” into your test plans then a single JMeter thread can generate a much higher load than a human user could, and thus a single thread can correspond to e.g. ten humans.
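The "ten humans" figure follows from simple arithmetic. A thread without think time issues a request every R seconds (the response time), while a human issues one every R + Z seconds (response time plus think time), so one thread generates the load of (R + Z) / R humans. A sketch, with a hypothetical function name and made-up numbers:

```python
def humans_per_thread(response_time_s, think_time_s):
    """How many human users (who pause `think_time_s` between requests)
    generate the same request rate as one thread with no think time."""
    if response_time_s <= 0 or think_time_s < 0:
        raise ValueError("response time must be > 0, think time >= 0")
    return (response_time_s + think_time_s) / response_time_s


# 1 s responses and 9 s of thinking: one thread is worth ten humans.
ratio = humans_per_thread(response_time_s=1, think_time_s=9)
```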

If you need more virtual users then you need multiple JMeter instances (preferably on multiple machines), either in the master-slave configuration or, if the overhead of master-slave communication is unacceptable for you, as independent instances whose individual reports you compile yourself.
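When compiling reports from independent instances, it is safer to merge the raw samples than to average per-instance averages or percentiles. The sketch below assumes each instance saved its results as CSV with a header row containing `elapsed` (ms) and `success` columns. That depends on your JMeter save configuration, so check your own files. The function name is hypothetical.

```python
import csv


def merge_results(sources):
    """Merge raw samples from several CSV result sources.

    Each source is an iterable of CSV lines (e.g. an open file) with a
    header row that includes 'elapsed' and 'success' columns (assumed).
    Returns (list of elapsed times in ms, number of failed samples).
    """
    elapsed = []
    failures = 0
    for source in sources:
        for row in csv.DictReader(source):
            elapsed.append(int(row["elapsed"]))
            if row["success"].strip().lower() != "true":
                failures += 1
    return elapsed, failures


# Made-up results from two instances; real use would pass open files.
instance_a = ["elapsed,success", "120,true", "340,false"]
instance_b = ["elapsed,success", "95,true"]
all_elapsed, failed = merge_results([instance_a, instance_b])
```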

Check this blog for some tips (2009).

From the Gatling stress test tool docs (referring to JMeter 2.5.1):

JMeter creates one thread per user simulated. If there is not enough memory allocated to the JVM, it can crash trying to create these threads. For instance, JMeter could not run 1500 users with 512 MB (what was used for Gatling even with 2000 users); OutOfMemoryErrors are recorded in the table as OOM.

Another problem occurred with the 2000 users simulations; it seems that JMeter can not simulate more than 1514 users independently from the memory that was allocated to the JVM.


Gatling is a new (2012?) stress test tool, written in Scala and using Akka. Tests are described by a fluent API in a “text” or richer Scala format. It claims high efficiency (2000 users simulated where JMeter couldn’t handle over 1500, and with a much lower memory consumption of 512M). So far I haven’t noticed anything about distributed testing (certainly needed for 10s of thousands of users).



“SysBench is a modular, cross-platform and multi-threaded benchmark tool for evaluating OS parameters that are important for a system running a database under intensive load.” Last release 2004.

Disk: hdparm -t, bonnie++, iozone

See the blog post Disk IO and throughput benchmarks on Amazon’s EC2 (2009) for examples of use.

DBT-{1-5} – The Database Test Suite

DBT-* is a suite of database tests: DBT-1™ (Web Server) simulates the activities of web users browsing and buying items, DBT-2™ is an OLTP transactional performance test, DBT-3™ is a decision support workload (business-oriented ad-hoc queries and concurrent data modifications), DBT-4™ is an application server and Web services workload, and DBT-5™ is an OLTP workload simulating the activities of a brokerage firm.

For ex. Xeround used it to compare its Cloud Database with Amazon RDS (7/2011?).

Web page performance testing



Web page performance



Performance Tips

HTTP Caching
Google devs: Optimize caching
Key tips: Set one “strong” (unconditional) caching header – Cache-Control: max-age=N [sec] (or Expires) – and one “weak” (conditional, checked for updates) – ETag (fingerprint/hash) or Last-Modified. Set the Cache-Control: public directive to enable caching by HTTP proxies (and HTTPS caching for Firefox) – but make sure the response does not set any cookies, as most proxies would not cache it in that case. Notice that many proxies do not cache resources with query params.
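As an illustration of the "one strong, one weak header" tip, this sketch builds the two headers for a response body. Using a truncated SHA-256 of the body as the ETag is just one possible fingerprint and my own choice. The function name and parameters are hypothetical.

```python
import hashlib


def caching_headers(body, max_age_s, public=False):
    """Build one strong (Cache-Control) and one weak (ETag) caching header.

    `body` is the response content as bytes. The ETag is a quoted,
    truncated SHA-256 fingerprint of the body (an assumed scheme).
    """
    directives = ["public"] if public else []
    directives.append(f"max-age={max_age_s}")
    fingerprint = hashlib.sha256(body).hexdigest()[:16]
    return {
        "Cache-Control": ", ".join(directives),
        "ETag": f'"{fingerprint}"',
    }


# A static stylesheet cached for one hour, also by proxies.
headers = caching_headers(b"body { margin: 0 }", max_age_s=3600, public=True)
```

On a later conditional request the server can compare the If-None-Match value with the current ETag and answer 304 Not Modified when they match.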
