Joshua Bloch: Performance Anxiety – on Performance Unpredictability, Its Measurement and Benchmarking

Joshua Bloch gave a great talk called Performance Anxiety (30 min, via Parleys; slides also available) at Devoxx 2010. The main message, as I read it, was:

Nowadays, performance is completely unpredictable. You have to measure it and employ proper statistics to get meaningful results.
Microbenchmarking is very, very hard to do correctly. No, you misunderstand me, I mean even harder than that! 🙂
From the resources: Profilers and result evaluation methods may be very misleading unless used correctly.

There has been another blog post about it, but I’d like to record more detailed remarks here.

Today we can’t estimate performance; we must measure it, because the systems (JVM, OS, processor, …) are very complex, with many different heuristics at various levels, and thus performance is highly unpredictable. This applies not only to Java but also to C, C++, and even assembly code.

Example: Results during a single JVM run may be consistent (warm-up, then faster) but can vary between JVM executions by as much as 20%. One of the causes may be Compilation Planning (what’s inlined, …), which is done in a background thread and is thus inherently non-deterministic.

Therefore don’t estimate but measure, and not only that: also process the data statistically (how often different values appear, what they are, … – mean, median, standard deviation, etc.).
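As a minimal sketch of that statistical processing, the following computes the mean, median and sample standard deviation of a set of timings. The class name `SampleStats`, its method names and the numbers in `main` are made up for illustration; none of them come from the talk.

```java
import java.util.Arrays;

/** Hypothetical helper: summary statistics for a set of timing samples. */
final class SampleStats {

    static double mean(double[] samples) {
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    static double median(double[] samples) {
        double[] sorted = samples.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return (sorted.length % 2 == 1)
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Sample standard deviation (divides by n - 1); needs at least two samples. */
    static double standardDeviation(double[] samples) {
        double mean = mean(samples);
        double sumOfSquares = 0.0;
        for (double s : samples) {
            double d = s - mean;
            sumOfSquares += d * d;
        }
        return Math.sqrt(sumOfSquares / (samples.length - 1));
    }

    public static void main(String[] args) {
        // Made-up run times in milliseconds, including one outlier.
        double[] millis = {102, 98, 101, 97, 180, 99, 100};
        System.out.printf("mean=%.1f median=%.1f stddev=%.1f%n",
                mean(millis), median(millis), standardDeviation(millis));
    }
}
```

With an outlier in the data, the mean and the median drift apart, which is one reason to look at more than a single number.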

“Profilers don’t help much; in fact, they can mislead” – Mytkowicz, Diwan et al., “Evaluating the Accuracy of Java Profilers”, PLDI ’10. In their experiment, each of four leading profilers identified a different hotspot. I’d really recommend reading the related StackOverflow discussion “If profiler is not the answer, what other choices do we have?” (the answer: profilers have their value, but use the right ones and use them correctly). The conclusion of the original paper:

Our results are disturbing because they indicate that profiler incorrectness is pervasive – occurring in most of our seven benchmarks and in two production JVM – and significant – all four of the state-of-the-art profilers produce incorrect profiles. Incorrect profiles can easily cause a performance analyst to spend time optimizing cold methods that will have minimal effect on performance. We show that a proof-of-concept profiler that does not use yield points for sampling does not suffer from the above problems.

“Benchmarking is really, really hard!” and “Most benchmarks are seriously broken”. Broken means either that the measurement error is larger than the value being measured or that the results obtained are unrelated to what was intended to be measured. It seems really hard to find a (micro)benchmark that isn’t broken. Joshua recommends Cliff Click’s JavaOne 2009 presentation The Art of (Java) Benchmarking (see also an interesting related interview with Cliff), which I believe I have seen and which points out the various traps. Joshua also mentions that some frameworks, such as Google Caliper, may help you avoid the pitfalls, though I’m quite sure they can’t protect you from all of them.

Joshua mentions a couple of interesting papers; check the slides for them. One that sounds really interesting to me is by Georges, Buytaert and Eeckhout: Statistically Rigorous Java Performance Evaluation, OOPSLA ’07 (20 pages). They mention that you need to run the VM 30 times to get meaningful data. From the abstract:

This paper shows that prevalent methodologies can be misleading, and can even lead to incorrect conclusions. The reason is that the data analysis is not statistically rigorous. In this paper, we present a survey of existing Java performance evaluation methodologies and discuss the importance of statistically rigorous data analysis for dealing with non-determinism. We advocate approaches to quantify startup as well as steady-state performance, and, in addition, we provide the JavaStats software to automatically obtain performance numbers in a rigorous manner. Although this paper focuses on Java performance evaluation, many of the issues addressed in this paper also apply to other programming languages and systems that build on a managed runtime system.
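To make "statistically rigorous" a bit more concrete, here is a rough sketch of one common technique: a 95% confidence interval for the mean, using the normal approximation that is usually considered acceptable from about 30 samples up. It is an illustration under those assumptions, not the JavaStats tool or the paper's exact procedure; the class `ConfidenceInterval` and its methods are hypothetical names.

```java
/** Hypothetical helper: 95% confidence interval for the mean of at least 30 samples. */
final class ConfidenceInterval {
    static final double Z_95 = 1.96;   // normal-distribution quantile for a 95% interval

    final double low;
    final double high;

    ConfidenceInterval(double low, double high) {
        this.low = low;
        this.high = high;
    }

    static ConfidenceInterval of(double[] samples) {
        int n = samples.length;
        if (n < 30) {
            throw new IllegalArgumentException("normal approximation assumes at least 30 samples");
        }
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        double mean = sum / n;
        double sumOfSquares = 0.0;
        for (double s : samples) {
            sumOfSquares += (s - mean) * (s - mean);
        }
        double stdDev = Math.sqrt(sumOfSquares / (n - 1));
        double halfWidth = Z_95 * stdDev / Math.sqrt(n);
        return new ConfidenceInterval(mean - halfWidth, mean + halfWidth);
    }

    /** If two intervals overlap, the measured difference may be just noise. */
    boolean overlaps(ConfidenceInterval other) {
        return low <= other.high && other.low <= high;
    }

    public static void main(String[] args) {
        // Made-up data: 30 "JVM runs" each for two variants of some code.
        double[] before = new double[30];
        double[] after = new double[30];
        for (int i = 0; i < 30; i++) {
            before[i] = 100 + (i % 5);   // 100..104 ms
            after[i] = 101 + (i % 7);    // 101..107 ms
        }
        ConfidenceInterval a = of(before);
        ConfidenceInterval b = of(after);
        System.out.printf("before: [%.2f, %.2f]%n", a.low, a.high);
        System.out.printf("after:  [%.2f, %.2f]%n", b.low, b.high);
        System.out.println("intervals overlap: " + a.overlaps(b));
    }
}
```

Each sample here is meant to stand for the result of one separate JVM execution, since results can differ between executions even when they are stable within one.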

Personal touch

I find this subject very interesting because I have been involved for over a year in the performance optimization of one of our data feeds, which used to run for a couple of days (latest results: half an hour [with a bit of cheating]). My experience completely supports what Joshua says: don’t guess but measure, profilers may be misleading, performance is unpredictable. Though, as a colleague mentioned, in the domain of enterprise Java our performance problems are usually caused by the database and communication with it (which 100% applies to that feed too).

I’ve already blogged about some experiences, e.g. in The power of batching or speeding JDBC by 100 (inspired by JDBC performance tuning with fetch size); check also the performance tag for interesting links. I also appreciated and applied the knowledge from Accurately computing running variance (I often wish I had slept less and paid more attention during the uni math lectures :-)).
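For readers who haven't met the running-variance idea, here is a small sketch of Welford's one-pass update, the usual numerically stable way to keep a mean and variance without storing all the samples. The class `RunningStats` and its method names are invented for this example and are not the code used for the feed.

```java
/** Hypothetical one-pass accumulator for mean and variance (Welford's method). */
final class RunningStats {
    private long count;
    private double mean;
    private double m2;   // running sum of squared differences from the current mean

    void add(double x) {
        count++;
        double delta = x - mean;       // distance from the old mean
        mean += delta / count;         // update the mean
        m2 += delta * (x - mean);      // uses both the old and the new mean
    }

    long count() {
        return count;
    }

    double mean() {
        return mean;
    }

    /** Sample variance (divides by n - 1); defined as 0 for fewer than two values. */
    double variance() {
        return count > 1 ? m2 / (count - 1) : 0.0;
    }

    double standardDeviation() {
        return Math.sqrt(variance());
    }

    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        // Made-up batch durations in milliseconds, fed in one at a time.
        for (double millis : new double[] {120, 95, 130, 110, 105}) {
            stats.add(millis);
        }
        System.out.printf("n=%d mean=%.2f stddev=%.2f%n",
                stats.count(), stats.mean(), stats.standardDeviation());
    }
}
```

The benefit over the naive "sum of squares minus square of sum" formula is that it avoids subtracting two large, nearly equal numbers.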

Conclusion

The higher the complexity, the higher the unpredictability =>

As an application programmer, use high-level, declarative constructs where possible to push the responsibility for performance one level down to library and JVM authors, who should know better.
Measure repeatedly and process the results with proper statistics. Don’t forget to repeat the measurements over time; the platform evolves with every release.
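A deliberately naive sketch of "measure repeatedly": run the task a few times to warm up, then time a number of further runs and keep every individual result for later statistical processing. The class `RepeatedMeasurement`, its parameters and the toy task are assumptions made for illustration; this is not a substitute for a benchmarking framework and it does not avoid the pitfalls discussed above.

```java
import java.util.function.Supplier;

/** Hypothetical, naive timing loop: warm up first, then record each run separately. */
final class RepeatedMeasurement {
    // Results are folded into this field so the work is less likely to be optimized away.
    static volatile long sink;

    static double[] measureMillis(Supplier<?> task, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            consume(task.get());             // untimed runs, giving the JIT a chance to kick in
        }
        double[] millis = new double[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            Object result = task.get();
            long elapsedNanos = System.nanoTime() - start;
            consume(result);
            millis[i] = elapsedNanos / 1_000_000.0;
        }
        return millis;
    }

    private static void consume(Object result) {
        sink += (result == null) ? 0 : result.hashCode();
    }

    public static void main(String[] args) {
        // Toy workload: sum the first million integers.
        Supplier<Long> task = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            return sum;
        };
        double[] millis = measureMillis(task, 20, 30);
        for (double m : millis) {
            System.out.printf("%.3f ms%n", m);
        }
    }
}
```

All runs here happen inside one JVM, so the numbers say nothing about the variation between JVM executions; for that, the whole program would have to be launched repeatedly.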

Once again, microbenchmarking is hard! 🙂 If you have to play with it, use something like Caliper and be aware that your results are most likely wrong anyway.

Closing words: Merry Christmas!

6 thoughts on “Joshua Bloch: Performance Anxiety – on Performance Unpredictability, Its Measurement and Benchmarking”

Jason says:

December 13, 2010 at 11:48 pm

When I first started doing performance analysis the accepted wisdom was that, regardless of language, profile data had a lot of fiction in it. Your job was to sift out the fact through some loose interpretation of the Scientific Method (conjecture, test, reproduce, etc).

Somewhere along the way in the growth of Java, it became impolite to mention this particular detail in public. If you got a couple functioning members of the optimization community in a room (you know, people who could actually fix things instead of making them worse), you could get them to agree this was still the case, but people just stopped acknowledging it unless pressed (well, except those who insisted – angrily – that this was not the case). It didn’t feel like a conspiracy so much as a false sense of security.

Over the years I’ve consistently, if in sometimes subtle ways, observed modest code tweaks cause inexplicable improvements to application speed. Every time it happens, I get more proof that the statistics always lie, and it encourages me to keep looking under rocks that others have abandoned.

theholyjava says:

December 14, 2010 at 11:10 am

Thanks for sharing!

williamlouth says:

February 2, 2011 at 1:47 pm

Jakub you should really check out my performance blog on matters that you called out in your blog – in particular the accuracy of sample based profilers.

http://williamlouth.wordpress.com/

I also have a very good slide set and video recorded at Google HQ on lightweight Java profiling and tuning.

http://opencore.jinspired.com/?p=1550

theholyjava says:

February 3, 2011 at 8:57 pm

Thanks for the tip, William!

Brent Boyer says:

February 7, 2011 at 4:17 pm

Jakub, if you are seriously interested in Java benchmarking, then read this article:
http://www.ibm.com/developerworks/java/library/j-benchmark1.html
https://www.ibm.com/developerworks/java/library/j-benchmark2/

The benchmarking framework available here
http://www.ellipticgroup.com/html/benchmarkingArticle.html
covers all the issues you discuss above (and more–read the article!).
theholyjava says:

February 9, 2011 at 11:22 pm

Hi Brent, thanks a lot for the articles & framework! I hope I’ll have time to read them sooner or later (I’m quite busy after having moved to Norway).



Published by Jakub Holý
