Tuesday, November 13, 2012

EclipseLink 15x faster than other JPA providers

I'm sorry to disappoint those of you looking to see some canned benchmark showing how EclipseLink dominates over all competitors in performance and scalability.

It is not that this would be difficult to do, or that I don't have such a benchmark, but that I give the JPA community credit for not attributing much weight to a benchmark produced by a single JPA provider that shows its own product outperforming its competitors. If you really want such a benchmark, scroll to the bottom of this post.

There are many such provider-produced benchmarks out there. There are websites, blog posts, and public forum posts from these vendors marketing their claims. If you believe these claims, and wish to migrate your application to some previously unheard-of JPA provider, then I have an email that I would like to send to you from a Nigerian friend who needs help transferring money into your country.

The only respectable benchmark that I know of that heavily utilizes JPA is the SPECjEnterprise2010 benchmark (SpecJ, for short) from the Standard Performance Evaluation Corporation, a respected, independent standards body that produces many industry-standard benchmarks. SpecJ is more than a JPA benchmark, as it measures the performance of the entire JavaEE stack, but JPA is a major component in it.

One could argue that SpecJ is too broad a benchmark, and that defining the performance of an entire JavaEE stack with a single number is not very meaningful. I have helped in producing Oracle's SpecJ results for several years, both with TopLink/EclipseLink JPA on WebLogic and previously with TopLink CMP on OC4J. I can honestly tell you that producing good numbers on the benchmark is not easy, and requires you to significantly optimize your code, in particular your concurrency and scalability. Having good JPA performance will not guarantee you good results on SpecJ, as there are other components involved, but having poor performance, concurrency, or scalability in any component of your stack will prevent you from having good results. SpecJ emulates a real multi-user application, with one or more separate driver machines that simulate a large number of concurrent users hitting the system. It is very good at finding any performance, concurrency, or scalability bottlenecks in your stack.

One could also argue that it is very difficult for small JPA providers to publish a SpecJ result. Publishing a result requires having a full JavaEE stack and being integrated with it, having access to real hardware, and investing the time and having the expertise to analyze and optimize the implementation on that hardware. Of course, one could also ask: if you don't have access to real hardware and real expertise, can you really expect to produce a performant and scalable product?

Ideally, if you are looking for the most performant JPA provider for your application, you would benchmark the candidates against your own application, on your production hardware. Writing a good performance benchmark and tests can be a difficult task. It is hard to get consistent results, and easy to misinterpret things. In general, if a result doesn't make sense, there is probably something fishy going on. I have seen many such invalid performance comparisons from our users over the years (and from competitors as well). One that I remember came from a user whose test showed TopLink was 100x slower than their JDBC code for a simple query. After looking at the test, I found their JDBC code executed a statement, but did not actually fetch any rows, or even close the statement. Once their JDBC code was fixed to be valid, TopLink was actually faster than their non-optimized JDBC code.
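To make the pitfall concrete, here is a rough sketch of that kind of mistake (the table name and methods are hypothetical, not the user's actual code). The broken version executes the query but never reads the ResultSet, so most of the real work is never measured; the fixed version iterates every row and closes its resources:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class JdbcBenchmarkPitfall {

        // Invalid "benchmark": executes the query but never fetches any rows,
        // so row fetching and driver buffering are never measured, and the
        // statement leaks because it is never closed.
        static void brokenTiming(Connection connection) throws SQLException {
            PreparedStatement statement =
                    connection.prepareStatement("SELECT * FROM EMPLOYEE");
            statement.executeQuery();
        }

        // Corrected version: fetch every row and close resources, so the timing
        // reflects the work an ORM query actually has to do.
        static int fixedTiming(Connection connection) throws SQLException {
            int rows = 0;
            try (PreparedStatement statement =
                         connection.prepareStatement("SELECT * FROM EMPLOYEE");
                 ResultSet results = statement.executeQuery()) {
                while (results.next()) {
                    results.getLong(1); // touch at least one column per row
                    rows++;
                }
            }
            return rows;
        }
    }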

In EclipseLink and Oracle TopLink we take performance and scalability very seriously. In EclipseLink we run a large suite of performance tests every week, and every release, to ensure our performance is better than our previous release and better than any of our main competitors. Although our tests show we have the best performance today, this has not always been the case. We did not write the tests to showcase our performance, but to find problem areas where we were slower than our competitors. We then optimized those areas to ensure we were faster than our competitors. I'm not claiming EclipseLink or Oracle TopLink is faster than every other JPA implementation in every use case. There will be some use cases where we are not faster, and some where our default configuration differs from another JPA provider's in a way that makes us slower until the configuration is adjusted.

For example, EclipseLink does not automatically join fetch EAGER relationships. We view that behavior as wrong, since EAGER does not mean JOIN FETCH; we have a separate @JoinFetch annotation to specify it. So there are some benchmarks, such as one from an object database vendor that supports JPA, that show EclipseLink slower for ElementCollection mappings. This is solely because of the configuration, and we would be faster with the simple addition of a @JoinFetch.
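As a hedged sketch of that configuration fix (the Employee entity and its phoneNumbers collection are invented for illustration, not taken from the benchmark), EAGER alone still loads the collection through a separate SELECT, while EclipseLink's @JoinFetch annotation pulls it into the same query:

    import java.util.List;
    import javax.persistence.ElementCollection;
    import javax.persistence.Entity;
    import javax.persistence.FetchType;
    import javax.persistence.Id;
    import org.eclipse.persistence.annotations.JoinFetch;
    import org.eclipse.persistence.annotations.JoinFetchType;

    @Entity
    public class Employee {

        @Id
        private long id;

        // EAGER only means the collection is loaded with the entity,
        // normally through a second SELECT; @JoinFetch asks EclipseLink
        // to fetch it in the same SQL query using an outer join.
        @ElementCollection(fetch = FetchType.EAGER)
        @JoinFetch(JoinFetchType.OUTER)
        private List<String> phoneNumbers;

        // getters and setters omitted
    }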

EclipseLink also does not enable batch writing by default. You can configure this easily in the persistence.xml, but some other JPA providers use batch writing by default on some databases, so this can show a difference. EclipseLink also requires the use of an agent to enable LAZY and other optimizations; some "benchmarks" fail to use this agent, which will seriously affect EclipseLink's performance.
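As a minimal sketch of that configuration (the persistence unit name "my-unit" is made up, and the same properties can equally be placed in persistence.xml), using EclipseLink's documented property names:

    import java.util.HashMap;
    import java.util.Map;
    import javax.persistence.EntityManagerFactory;
    import javax.persistence.Persistence;

    public class EclipseLinkTuning {
        public static void main(String[] args) {
            Map<String, String> properties = new HashMap<String, String>();
            // Enable JDBC batch writing; EclipseLink does not turn this on by default.
            properties.put("eclipselink.jdbc.batch-writing", "JDBC");
            properties.put("eclipselink.jdbc.batch-writing.size", "100");
            // Cache prepared statements inside EclipseLink.
            properties.put("eclipselink.jdbc.cache-statements", "true");

            // LAZY basic/one-to-one mappings and other weaving optimizations also
            // need the agent (-javaagent:eclipselink.jar) or static weaving.
            EntityManagerFactory factory =
                    Persistence.createEntityManagerFactory("my-unit", properties);
            // ... run the application ...
            factory.close();
        }
    }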

The raw throughput of JPA is rarely the main factor in an application's performance. How optimized the database access is, what features the JPA provider supports, and how well the application makes use of JPA do have a big impact on an application's performance. EclipseLink and Oracle TopLink have a large set of performance and scalability features. I'm not going to go over all of these today, but you can browse my blog for other posts on some of them.

Finally, if you really want a benchmark showing EclipseLink is 15x faster than any other JPA implementation (even ones that make the same claim), you can find it here. When running this benchmark on my desktop, accessing a local Derby database, it does 1,016,884 queries per second. This is over 15x (88x in fact) faster than other JPA providers, and even 145x faster than the self-proclaimed "fastest JPA implementation on the face of the Earth".

Granted, EclipseLink contains an integrated, feature-rich object cache, whereas other JPA providers do not. So, I also ran the "benchmark" with caching disabled. Even then, EclipseLink was 2.7x faster. Of course, EclipseLink also supports statement caching, which other JPA providers do not, so I disabled this as well. Again EclipseLink was 2.2x faster than the self-proclaimed "fastest JPA implementation on the face of the Earth".
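For anyone who wants to repeat that comparison, here is a rough sketch of how both caches can be turned off (the persistence unit name "jpa-benchmark" is hypothetical):

    import java.util.HashMap;
    import java.util.Map;
    import javax.persistence.EntityManagerFactory;
    import javax.persistence.Persistence;

    public class CachingDisabledRun {
        public static void main(String[] args) {
            Map<String, String> properties = new HashMap<String, String>();
            // Disable EclipseLink's shared (second-level) object cache for all entities.
            properties.put("eclipselink.cache.shared.default", "false");
            // Make sure EclipseLink's prepared statement caching is off as well.
            properties.put("eclipselink.jdbc.cache-statements", "false");

            EntityManagerFactory factory =
                    Persistence.createEntityManagerFactory("jpa-benchmark", properties);
            // ... run the same query loop as before ...
            factory.close();
        }
    }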

Was it a fair, objective benchmark that emulates a real application environment? Probably not... but don't take my, or anyone else's word for it.

4 comments:

  1. Waiting for the next "15x faster" ;-)

  2. This comment has been removed by the author.

  3. The benchmark is really stupid :)
    You haven't disabled EclipseLink's shared cache.
    It basically does NO SELECTS AT ALL! It just returns the object cached since the moment of insertion.

    Re-test with <property name="eclipselink.cache.shared.default" value="false"/> and you'll get almost the same result as with Hibernate.

  4. When the test without a cache is 15x faster than with a cache, then the cache makes no sense!

    And I've seen the same problems in many real-world situations! Many times the internal caches of JPA providers are really horribly implemented and not really under control (OpenJPA from Apache has the same problems with cache control as Hibernate).

    By the way, most databases have their own cache, and if I need a cache in Java then I use Ehcache or something like it, because that is under my control.
