Friday, December 17, 2010

What is faster? JVM Performance

Java performance optimization is part analysis, and part superstition and witchcraft.

When deciding how to optimize their applications, a lot of developers search the web for what people say is faster, go by what they have heard from their co-worker's sister's boyfriend's cousin, or go by what seems to be "public opinion". Sometimes the public opinion is correct, and their application benefits; sometimes it is not, and they waste their time and effort for little performance improvement, or even make things worse.

True performance optimization involves measuring the current performance, profiling the application to determine the bottlenecks, and then investigating and implementing optimizations to avoid those bottlenecks.

But how should one code on a day-to-day basis for optimal performance? Are synchronized methods slow? Are final methods fast? Which is faster: Vector, ArrayList, or LinkedList? What is the optimal way to iterate a List? Is reflection slow? I will attempt to answer these questions in this post.

In the EclipseLink project we are generally very concerned about performance, as we are a persistence library used by other applications, so like the JDK we are part of the plumbing and need to be as optimized as possible.

We have a number of performance test suites in EclipseLink, mainly to measure our persistence performance, but we have one test suite that has nothing to do with persistence. Our Java performance test suite just measures the performance of different operations in Java. These tests help us determine how to best optimize our code.

The tests function by running a certain operation for a set amount of time and measuring the number of operations completed in that period. Then the next operation is run and measured the same way. This is repeated 5 times, the max/min values are rejected, the average of the middle 3 is computed, the standard deviation is computed, and the averages of the two operations are compared. The tests are single threaded.
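The harness looks roughly like the following. This is only a sketch of the approach described above, not the actual EclipseLink test code; the class and method names are made up for illustration.

import java.util.Arrays;

public class MicroBenchmark {
    private static final long RUN_MILLIS = 10000; // each run measures operations for 10 seconds
    private static final int RUNS = 5;

    /** Returns the average operations per run, rejecting the fastest and slowest runs. */
    public static double measure(Runnable operation) {
        long[] counts = new long[RUNS];
        for (int run = 0; run < RUNS; run++) {
            long count = 0;
            long end = System.currentTimeMillis() + RUN_MILLIS;
            while (System.currentTimeMillis() < end) {
                operation.run();
                count++;
            }
            counts[run] = count;
        }
        Arrays.sort(counts);
        // Reject the min and max runs, average the middle 3.
        return (counts[1] + counts[2] + counts[3]) / 3.0;
    }
}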

The tests were run on the Oracle Sun JDK 1.6.0_07 on 32-bit Windows.

Maps

There are several different Map implementations in Java. This test compares the performance for various sizes of Maps. The test instantiates the Map, does n (size) puts, then n gets and n removes.
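Each map test does roughly the following (a sketch, not the actual test code; the keys are plain objects that do not override hashCode(), as noted below):

import java.util.Hashtable;
import java.util.Map;

public class MapTest {
    // One "operation": instantiate the Map, then do n puts, n gets and n removes.
    public static void runOnce(int size) {
        Object[] keys = new Object[size];
        for (int i = 0; i < size; i++) {
            keys[i] = new Object();
        }
        Map<Object, Object> map = new Hashtable<Object, Object>();
        for (int i = 0; i < size; i++) {
            map.put(keys[i], keys[i]);
        }
        for (int i = 0; i < size; i++) {
            map.get(keys[i]);
        }
        for (int i = 0; i < size; i++) {
            map.remove(keys[i]);
        }
    }
}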

Map Operation Performance Comparison

Map                  Size   Average (operations/10 seconds)   %STD    %DIF (with Hashtable)
Hashtable            10     3605235                           0.03%   0%
HashMap              10     2908854                           0.02%   -23%
LinkedHashMap        10     2594492                           0.03%   -38%
IdentityHashMap      10     1346278                           0.01%   -167%
ConcurrentHashMap    10     1009259                           0.0%    -257%
HashSet              10     2927254                           0.02%   -23%
Hashtable            100    357229                            0.01%   0%
HashMap              100    274587                            0.03%   -30%
LinkedHashMap        100    269840                            0.03%   -32%
IdentityHashMap      100    110801                            0.02%   -222%
ConcurrentHashMap    100    119068                            0.01%   -200%
HashSet              100    281960                            0.04%   -26%
Hashtable            1000   34034                             6.2%    0%
HashMap              1000   27818                             0.06%   -22%
LinkedHashMap        1000   25514                             0.03%   -33%
IdentityHashMap      1000   11650                             0.04%   -192%
ConcurrentHashMap    1000   12420                             2.9%    -174%
HashSet              1000   26888                             0.04%   -26%

These results are quite interesting; I never would have expected such a big difference in performance between classes doing basically the same well-defined thing that has been around for quite some time. It is surprising that Hashtable performs better than HashMap, when HashMap is supposed to be the replacement for Hashtable and does not suffer from its limitation of using synchronized methods, which are supposed to be slow. Note: After exploring other JVMs in my next post, it seems this anomaly only exists in JDK 1.6.0_07; in the latest JVM, and most other JVMs, HashMap is faster than Hashtable.

I would expect LinkedHashMap and ConcurrentHashMap to be somewhat slower, as they have additional overhead and perform better in specific use cases, but it is odd that IdentityHashMap is so much slower, given that my test object did not define a hashCode(), so its identity hash was used and the map was doing essentially the same thing as HashMap and Hashtable. It is also interesting that HashSet has the same performance as HashMap; in theory it could be a simpler data structure, but in reality it is backed by a HashMap, so it performs the same.

Note that these are single-threaded results. Although Hashtable outperformed HashMap in this test, in a multi-threaded (and multi-CPU) environment Hashtable would perform much worse because of the method synchronization. That assumes read-only access, of course, as concurrent writes to a HashMap would blow up. ConcurrentHashMap performs the best in a concurrent read-write environment.
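To make the trade-off concrete, here is a sketch of the usual options for a map shared between threads (the class and field names are illustrative only):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SharedMaps {
    // Every call locks the whole table, like Hashtable's synchronized methods.
    static final Map<String, String> synchronizedMap =
        Collections.synchronizedMap(new HashMap<String, String>());

    // Designed for concurrent read-write access; readers rarely block.
    static final Map<String, String> concurrentMap = new ConcurrentHashMap<String, String>();

    // Only safe if populated once and then read-only, or confined to a single thread.
    static final Map<String, String> plainMap = new HashMap<String, String>();
}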

Lists

For Maps, Hashtable turned out to have the best performance. Let's now investigate Lists: will the old Vector implementation have better performance than ArrayList? The list test instantiates a list, then does n (size) adds followed by n gets by index.
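The list test is roughly the following sketch (again, not the actual test code):

import java.util.List;
import java.util.Vector;

public class ListTest {
    // One "operation": instantiate the List, do n adds, then n gets by index.
    public static void runOnce(int size) {
        List<Object> list = new Vector<Object>();
        for (int i = 0; i < size; i++) {
            list.add(new Object());
        }
        for (int i = 0; i < size; i++) {
            list.get(i);
        }
    }
}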

List Operation Performance Comparison

List         Size   Average (operations/10 seconds)   %STD     %DIF (with Vector)
Vector       10     11625453                          0.006%   0%
ArrayList    10     18448453                          0.08%    +56%
LinkedList   10     12362290                          0.03%    +6.0%
Vector       100    1241420                           0.2%     0%
ArrayList    100    1893581                           0.1%     +52%
LinkedList   100    1012740                           0.01%    -22%
Vector       1000   132182                            0.2%     0%
ArrayList    1000   223969                            0.1%     +69%
LinkedList   1000   12689                             0.02%    -941%

So, it seems that ArrayList is much faster than Vector. Given that both Vector and Hashtable are synchronized, I assume it is not the synchronized methods but the implementation. LinkedList surprisingly performs as well as Vector for small sizes, but then performs much worse at larger sizes because of the indexed get. This is expected, as LinkedList performs well in specific use cases, such as removing from the head or middle, but that was not tested.

Iteration

There are many ways to iterate a List in Java. For a Vector an Enumeration can be used, or an Iterator for any List. A simple for loop with an index can also be used for any List. Java 5 also defines a simplified for syntax for iterating any Collection. This test compares the performance of iteration; the List has already been pre-populated.
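The iteration styles being compared look roughly like this (a sketch; the list is populated before timing starts):

import java.util.Enumeration;
import java.util.Iterator;
import java.util.List;
import java.util.Vector;

public class IterationStyles {
    public static void iterate(Vector<Object> vector, List<Object> list) {
        // for loop with an index (any List).
        for (int index = 0; index < list.size(); index++) {
            Object element = list.get(index);
        }
        // Enumeration (Vector only).
        for (Enumeration<Object> enumeration = vector.elements(); enumeration.hasMoreElements();) {
            Object element = enumeration.nextElement();
        }
        // Iterator (any List).
        for (Iterator<Object> iterator = list.iterator(); iterator.hasNext();) {
            Object element = iterator.next();
        }
        // Java 5 for syntax (any Collection); the compiler turns this into an Iterator.
        for (Object element : list) {
        }
    }
}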

Iteration Performance Comparison

List                    Average (operations/10 seconds)   %STD     %DIF (with Vector)
for index (Vector)      6079272                           0.06%    0%
for index (ArrayList)   6048324                           0.03%    -0.5%
Enumeration (Vector)    4736049                           0.02%    -28%
Iterator (Vector)       2840094                           0.05%    -114%
Iterator (ArrayList)    1935122                           0.008%   -214%
for (Vector)            2841567                           0.05%    -113%
for (ArrayList)         1933576                           0.04%    -214%

So using a for loop with an index is faster than an Enumeration or Iterator. This is expected, as it avoids the cost of the iteration object instance. It is interesting that an Enumeration is faster than an Iterator, and a Vector Iterator is faster than an ArrayList Iterator. Unfortunately there is no magic in the Java 5 for syntax; it just calls iterator().

So the optimal way to iterate a List is:

// Cache the size outside the loop and access elements by index to avoid creating an Iterator.
int size = list.size();
for (int index = 0; index < size; index++) {
    Object object = list.get(index);
    ...
}

However, the Java 5 syntax is simpler, so I must admit I would use it in any non-performance-critical code, and maybe some day the JVM will do some magic and the for syntax will have better performance.

Method Execution

A method in Java can be defined in several different ways. It can be synchronized, final, called reflectively, or not called at all (in-lined). This next test tries to determine the performance overhead of a method call. The test executes a simple method that just increments an index. The test executes the method 100 times to avoid the test call overhead. For the reflective usage, the Method object and arguments array are cached.
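The variants being timed look roughly like this (a sketch; the real test invokes each variant in a loop of 100 calls inside the timed operation, and the names here are made up):

import java.lang.reflect.Method;

public class MethodCallTest {
    private int index;

    public void increment() { index++; }                            // normal
    public synchronized void incrementSynchronized() { index++; }   // synchronized method
    public void incrementBlockSynchronized() {                      // synchronized block
        synchronized (this) { index++; }
    }
    public final void incrementFinal() { index++; }                 // final

    public static void main(String[] args) throws Exception {
        MethodCallTest test = new MethodCallTest();
        // The Method object and the (empty) arguments array are cached outside the timed loop.
        Method method = MethodCallTest.class.getMethod("increment");
        Object[] arguments = new Object[0];
        for (int i = 0; i < 100; i++) {
            test.increment();                 // normal call
            method.invoke(test, arguments);   // reflective call
            test.index++;                     // "in-lined": no method call at all
        }
    }
}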

Method Execution Performance Comparison

Method               Average (100 operations/10 seconds)   %STD   %DIF (with normal)
Normal               25533130                              0.7%   0%
synchronized         13383707                              4.3%   -93%
Block synchronized   8244087                               4.7%   -203%
final                26873873                              1.2%   +6.1%
In-lined             26816109                              1.5%   +5.2%
volatile             1503727                               0.1%   -1539%
Reflection           159069                                3.2%   -15646%

Interesting results. Synchronized methods are slower, and using a synchronized block inside a method seems to be slower than just making the method synchronized. Final methods seem to be slightly faster, but only very slightly, and show about the same improvement as in-lining the method, so I assume that is what the JVM is doing.

Calling a method through reflection is significantly slower. Reflection has improved much since 1.5 added byte-code generation, but if you profile a reflective call you will see several isAssignable() checks before the method is invoked, to ensure the object and arguments are of the correct type. It is odd that these cannot be handled through casts in the generated code, with typing errors simply trapped.

CORRECTION
Originally volatile was showing an improvement in performance because of a bug in the volatile test. After correcting the bug, the test now shows a huge performance overhead. The usage of volatile on the int field being incremented in the test method causes a 15x performance overhead. This is surprising, and much larger than the synchronized overhead, but still smaller than the reflection overhead. This is especially odd as the field is an int, which one would think is atomic anyway. I imagine the effect the volatile has on the JVM's memory usage is somehow causing the overhead. In spite of the overhead, I would still use volatile when required, in concurrent pieces of code where the same variable is concurrently modified. I have done multi-threaded tests comparing its concurrency with synchronizing the method, and using volatile is better than using synchronization for concurrency.
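The difference the correction exposed comes down to whether the incremented field is declared volatile; a minimal sketch (illustrative names, not the test code):

public class VolatileIncrement {
    private int plainIndex;
    private volatile int volatileIndex;

    // The JIT is free to keep this counter in a register across calls.
    public void increment() { plainIndex++; }

    // Every call must read from and write to main memory; note the increment is
    // still not atomic, so concurrent writers can lose updates without synchronization.
    public void incrementVolatile() { volatileIndex++; }
}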

Note that this difference in cost of execution is on a method that does basically nothing. Generally, performance bottlenecks in applications are in places that do something, not nothing, and it is the something that is the bottleneck, not how the method is called. Even in the reflective call, the cost was less than 1 millionth of a second, so on a method that took 1 millisecond the overhead would be irrelevant. Also, I assume these results are very JVM specific; other JVMs, older JVMs, and the JVMs of the future may behave much differently.

Reflection

There are two types of reflection, field reflection and method reflection. In JPA this normally corresponds to using FIELD access or PROPERTY access (annotating the fields or the get methods). Which is faster? This test sets a single variable, either directly, by calling a set method, or through field or set method reflection.
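The variants being compared look roughly like this (a sketch; the Field and Method objects are looked up once and cached, and the class and field names are made up):

import java.lang.reflect.Field;
import java.lang.reflect.Method;

public class ReflectionSetTest {
    private String name;

    public void setName(String name) { this.name = name; }

    public static void main(String[] args) throws Exception {
        ReflectionSetTest target = new ReflectionSetTest();

        // Direct variable access and a normal set method call.
        target.name = "in-lined";
        target.setName("set method");

        // Field reflection: no argument array is required.
        Field field = ReflectionSetTest.class.getDeclaredField("name");
        field.setAccessible(true);
        field.set(target, "field reflection");

        // Method reflection: an Object array must be created for the argument.
        Method method = ReflectionSetTest.class.getMethod("setName", String.class);
        method.invoke(target, new Object[] { "method reflection" });
    }
}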

Reflection Performance Comparison

Type                Average (operations/10 seconds)   %STD   %DIF (with in-lined)
In-lined            44105730                          0.7%   0%
Set method          46189638                          1.2%   +4.7%
Field Reflection    23488987                          1.4%   -87%
Method Reflection   10862684                          0.8%   -306%

So it appears that in JDK 1.6 field reflection is faster than set method reflection. Part of the difference is that for method reflection an Object array must be created to call the set method, whereas field reflection does not require this. Method reflection improved a lot in JDK 1.5, but it still seems slower than field reflection, at least for set methods. Both have an overhead over a compiled method execution or direct variable access.

If you look at the results, the difference between reflection and normal execution is much less than in the previous test. This is because in the previous test the method was executed 100 times inside the test, so the overhead of calling the test method was amortized, whereas this test only called it once, so the test method call overhead is included. This highlights an important fact: how you call the method that sets the field has a big impact on the performance. If you use a complex layering of interfaces and generated code to call a set method directly, versus calling it reflectively, the overhead of the layering may be worse than the reflective call.

In EclipseLink, field reflection is used by default, but it depends on whether you annotate your fields or your get methods. If you annotate your get methods, then method reflection is used. If you use weaving (enabled through either the EclipseLink agent, static weaving, JEE, or Spring) and use FIELD access, then EclipseLink will weave a generic get and set method into your classes to avoid reflection. This amounts to a minor optimization in the context of database persistence, and can be disabled using the persistence unit property "eclipselink.weaving.internal"="false".
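For example, the property can be passed when creating the factory; a sketch (the persistence unit name "my-unit" is made up, and the same property can also be set in persistence.xml):

import java.util.HashMap;
import java.util.Map;

import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class DisableInternalWeaving {
    public static void main(String[] args) {
        Map<String, String> properties = new HashMap<String, String>();
        // Turn off the generic get/set methods EclipseLink would otherwise weave in.
        properties.put("eclipselink.weaving.internal", "false");
        EntityManagerFactory factory =
            Persistence.createEntityManagerFactory("my-unit", properties);
        factory.close();
    }
}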

Summary

So this post has presented a lot of data on how different things perform in Oracle Sun JDK 1.6 on Windows. It would be interesting to see how the same tests perform in different JVMs in different environments. Perhaps I will investigate that next.

The source code for the above tests is in the EclipseLink SVN repository, here.

23 comments:

  1. Interesting results... In the Reflection Performance Comparison, the Set method is slightly faster than In-lined -- this seems odd to me, can you comment on why this is?

    Also, the JVM you tested with (1.6.0_7) is relatively old now -- was there a reason you didn't use the latest (1.6.0_22 I think)?

  2. These seem to be not very interesting micro benchmarks. Sort of like comparing the performance of different kind of hammers at hammering in framing nails. I expect a framing hammer would do a better job than a drywall hammer. Each kind of hammer is more useful for different kinds of nails and jobs.

  3. @dave - Yes, I thought this was odd as well. I would guess it is just variance, notice the % standard deviation was 1.4% and the cost of calling the test method, versus it calling the set method is pretty marginal. It would have been better to put the set call in a 100 loop like I did for the method execution test, but here I was more interested in comparing field vs method reflection, not in-line versus set method, as the previous test already did that.

    I used 1.6.0_7 for no particular reason, it was just what a had installed. That latest JDK version (today) is 1.6.0_23, I could re-run using it, I would not expect a big difference between patch releases, but would expect I big difference in different JVMs or difference JVM versions. I will try to get some results on different JVMs.

  4. Hi James,

    Actually JDK6u23 is a big change at the VM level when compared to JDK6u7. Sun (and now Oracle) have decoupled HotSpot's release cycle from the JDK and new versions have been added throughout the cycle (with major new features like compressed references and scalar replacement in some of them).

    A new HotSpot was integrated in JDK6u4, JDK6u10, JDK6u14, JDK6u18, JDK6u21 and JDK6u23 (if I recall correctly).

    One other comment, HashMap is generally used for JDK code, so it's quite strange that HashTable does better. I'd check to make sure that the test doesn't have issues.

    Best,
    Ismael

  5. @Ismael - Thanks for the info, I will try to check if the latest JDK has any different behavior.

I double checked the HashMap test code, and it seems valid. I have not profiled the test, so cannot say for sure why it has worse performance. One difference is that Hashtable uses the size directly, whereas HashMap uses the closest power of 2, so it would use 16 instead of 10, which may be impacting its performance. But its poor results are consistent for the 100 and 1000 sizes as well. It would be interesting to test it with a size of 16. Anyway, all of the source for the tests is available; you are welcome to look at it, or run the tests yourself if you think HashMap should be faster.

  6. I really like this idea of getting some sense of the performance of the JVM internals, even if you really shouldn't do anything with this knowledge in day to day programming ;) However, here it sounds like you're really not comparing apples to apples with the different-sized HashMap and HashTable. I was also a bit surprised by the volatile field result, but a quick glance at the test explains a lot: it's the only test that only does 1 series of fields operations instead of the 100 in a tight loop of the others. Big grains of salt are needed here...

  7. Hi James, really cool that you've done this analysis!

    I'll echo the request to see that test under the 1.6.0_23 JVM. One of the hugely underrated parts of Java 6 are the non-trivial enhancements to the JVM with their point releases.

    Let me know if you want to run the tests against the latest Open JDK (1.7) build as well (I might be able to help). Will be interesting to if there are any differences.

  8. >I used 1.6.0_7 for no particular reason, it was just what a had installed.

    I'm sorry mate, but this doesn't sound really professional, does it?

  9. Before I look at the benchmarks please tell me that all tests are run completely isolated from each other (i.e. separate process execution) so that the ordering of the individuals tests within a group has no bearing (positive/negative) on the performance of subsequent tests (hint: gc, heap sizing, native compilation,....).

  10. This comment has been removed by the author.

  11. Also, is sufficient time spent warming up the JVM before measuring? If not, you may get code generation overhead included in the results, and initial measurements will not reflect optimally generated code.

  12. I find it a little strange that you wouldn't try an iterator with the LinkedList in your Iteration Performance Comparison. It would be one of the obvious use cases for it (and one of the main reasons for me to choose it)

  13. I did a very simple test on maps and my results are quite different. As I expected, IdentityHashMap is the fastest, followed closely by HashMap. Hashtable, ConcurrentHashMap were twice as slow. I expect ConcurrentHashMap to be faster than Hashtable, if writes are less frequent.

    Here is the code:

    int size = 100;
    Map[] maps = new Map[] {
        new HashMap(size),
        new ConcurrentHashMap(size),
        new IdentityHashMap(size),
        new Hashtable(size) };

    for (Map m : maps) {
        int loops = 100000;
        long now = System.currentTimeMillis();
        while (loops-- > 0) {
            for (int i = 0; i < size; i++) {
                m.put(i, i);
            }
            for (int i = 0; i < size; i++) {
                m.get(i);
            }
            for (int i = 0; i < size; i++) {
                m.remove(i);
            }
        }
        long time = System.currentTimeMillis() - now;
        System.out.println(m.getClass().getSimpleName() + " " + time);
    }

  14. I think these tests are flawed because they don't attempt to deoptimize megamorphic method calls. Notice that in both tests, the test that ran first is the fastest. As the jvm encounters me implementations of Map and List, the inlined calls to get, put, etc have to be un-inlined.

  15. Wrt IdentityMap being slower -- that would not be surprising to me. When I last profiled it (granted it has been a while, so things may have changed), cost came from System.identityHashCode() which is a native method, and bit costly as such. It also uses linear probing which is not necessarily faster than buckets, although this really depends.

    I would agree with other suggestions -- micro-benchmarks must use proper warm-up and runtimes; as well as JVM isolation. Japex, for example, would provide these, as do most other suitable benchmarks.

  16. How does the reflection performance change if you call setAccessible(true) on the method first?

  17. One point: "Hashtable, and does not suffer from its limitation of using synchronized methods, which are suppose to be slow."

    That's not correct; Hashtable isn't synchronized! It just doesn't support null values or keys.

  18. @blog
    Well the JavaDoc seems to think it is. On the Hashtable one it says:
    "Unlike the new collection implementations, Hashtable is synchronized."

    And on the HashMap one it says:
    "The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls"

  19. @sp8472 good catch on the volatile test, not sure how I missed that. I will fix the test and update the volatile results.

    @williamlouth, @Marcus - the tests are run 5 times in the same JVM in intermixed order, the min/max results are rejected and the middle 3 averaged, so if the JVM was hot/cold for the first run, that result would be rejected. If the JVM got faster as it warmed up, this would affect all tests the same. I have changed the order of the tests and re-run them; the results are the same, and very consistent.

    @Nikolay Tsankov - The difference between these tests and yours is that the creation of the Map is included in the test time, whereas you are not measuring the creation. If your results are correct, then it would appear the creation cost is why HashMap is performing so poorly. I will look into confirming this. All of these are run 5 times in mixed order and averaged (with standard deviation computed), whereas you only run your test once, in a defined order, so may be seeing bad results.

  20. I corrected the results for volatile

    Thanks to @sp8472 for finding the bug in the test.

    >>
    CORRECTION
    Originally volatile was showing an improvement in performance because of a bug in the volatile test. After correcting the bug, the test now shows a huge performance overhead. The usage of volatile on the int field being incremented in the test method causes a 15x performance overhead. This is surprising, and much larger than the synchronized overhead, but still smaller than the reflection overhead. This is especially odd as the field is an int, which one would think is atomic anyway. I imagine the effect the volatile has on the JVM's memory usage is somehow causing the overhead. In spite of the overhead, I would still use volatile when required, in concurrent pieces of code where the same variable is concurrently modified. I have done multi-threaded tests comparing its concurrency with synchronizing the method, and using volatile is better than using synchronization for concurrency.

  21. Good job mate for putting this data. By the way have you done this test using -server class of JVM

    Javin
    10 tips on debugging Java Program in eclipse

  22. Useless synthetic microbenchmark. Data sets of 10, 100 or 1000 objects aren't relevant for big data, and queries of 10000000s per sec aren't relevant for small data. Besides, what was said already - change the testing conditions, you'll get different results. Only thing that IS sure is that sync'ed classes are slower than not sync'ed ones. Period.
