Tuesday, August 13, 2013

Optimizing Java Serialization - Java vs XML vs JSON vs Kryo vs POF

Perhaps I'm naive, but I always thought Java serialization must surely be the fastest and most efficient way to serialize Java objects into binary form. After all, Java is on it's 7th major release, so this is not new technology, and since every JDK seems to be faster than the last, I incorrectly assumed serialization must be very fast and efficient by now. I thought, since Java serialization is binary, and language dependent, it must be much faster and more efficient than XML or JSON. Unfortunately, I was wrong, if you concerned about performance, I would recommend avoiding Java serialization.

Now, don't get me wrong, I'm not trying to dis Java. Java serialization has many requirements, the main one being able to serialize anything (or at least anything that implements Serializable), into any other JVM (even a different JVM version/implementation), even running a different version of the classes being serialized (as long as you set a serialVersionUID). The main thing is it just works, and that is really great. Performance is not the main requirement, and the format is standard and must be backward compatible, so optimization is very difficult. Also, for many types of use cases, Java serialization performs very well.

I got started on this journey into the bowels of serialization while working on a three tier concurrency benchmark. I noticed a lot of the CPU time being spent inside Java serialization, so I decided to investigate. I started by serializing a simple Order object that had a couple of fields. I serialized the object and output the bytes. Although the Order object only had a few bytes of data, I was not that naive to think it would serialize to only a few bytes, I knew enough about serialization that it would at least need to write out the full class name, so it knew what it had serialized, so it could read it back. So I expected, maybe 50 bytes or so. The result was over 600 bytes, that's when I realized Java serialization was not as simple as I had imagined.

Java serialization bytes for Order object

----sr--model.Order----h#-----J--idL--customert--Lmodel/Customer;L--descriptiont--Ljava/lang/String;L--orderLinest--Ljava/util/List;L--totalCostt--Ljava/math/BigDecimal;xp--------ppsr--java.util.ArrayListx-----a----I--sizexp----w-----sr--model.OrderLine--&-1-S----I--lineNumberL--costq-~--L--descriptionq-~--L--ordert--Lmodel/Order;xp----sr--java.math.BigDecimalT--W--(O---I--scaleL--intValt--Ljava/math/BigInteger;xr--java.lang.Number-----------xp----sr--java.math.BigInteger-----;-----I--bitCountI--bitLengthI--firstNonzeroByteNumI--lowestSetBitI--signum[--magnitudet--[Bxq-~----------------------ur--[B------T----xp----xxpq-~--xq-~--
(note "-" means an unprintable character)

As you may have noticed, Java serialization writes out not only the full class name of the object being serialized, but also the entire class definition of the class being serialized, and all of the referenced classes. The class definition can be quite large, and seems to be the main performance and efficiency issue, especially when writing out a single object. If you are writing out a large number of objects of the same class, then the class definition overhead is not normally a big issue. One other thing that I noticed, is that if your object has a reference to a class (such as a meta-data object), then Java serialization will write the entire class definition, not just the class name, so using Java serialization to write out meta-data is very expensive.

Externalizable

It is possible to optimize Java serialization through implementing the Externalizable interface. Implementing this interface avoids writing out the entire class definition, just the class name is written. It requires that you implement the readExternal and writeExternal methods, so requires some work and maintenance on your part, but is faster and more efficient than just implementing Serializable.

One interesting note on the results for Externalizable is that it was much more efficient for a small number of objects, but actually output slightly more bytes than Serializable for a large number of objects. I assume the Externalizable format is slightly less efficient for repeated objects.

Externalizable class

public class Order implements Externalizable {
    private long id;
    private String description;
    private BigDecimal totalCost = BigDecimal.valueOf(0);
    private List orderLines = new ArrayList();
    private Customer customer;

    public Order() {
    }

    public void readExternal(ObjectInput stream) throws IOException, ClassNotFoundException {
        this.id = stream.readLong();
        this.description = (String)stream.readObject();
        this.totalCost = (BigDecimal)stream.readObject();
        this.customer = (Customer)stream.readObject();
        this.orderLines = (List)stream.readObject();
    }

    public void writeExternal(ObjectOutput stream) throws IOException {
        stream.writeLong(this.id);
        stream.writeObject(this.description);
        stream.writeObject(this.totalCost);
        stream.writeObject(this.customer);
        stream.writeObject(this.orderLines);
    }
}

Externalizable serialization bytes for Order object

----sr--model.Order---*3--^---xpw---------psr--java.math.BigDecimalT--W--(O---I--scaleL--intValt--Ljava/math/BigInteger;xr--java.lang.Number-----------xp----sr--java.math.BigInteger-----;-----I--bitCountI--bitLengthI--firstNonzeroByteNumI--lowestSetBitI--signum[--magnitudet--[Bxq-~----------------------ur--[B------T----xp----xxpsr--java.util.ArrayListx-----a----I--sizexp----w-----sr--model.OrderLine-!!|---S---xpw-----pq-~--q-~--xxx

Other Serialization Options

I started to investigate what other serialization options there were in Java. I started with EclipseLink MOXy, which supports serializing objects to XML or JSON through the JAXB API. I was not expecting XML serialization to outperform Java serialization, so was quite surprised when it did for certain use cases. I also found a product Kryo, which is an open source project for optimized serialization. I also investigated the Oracle Coherence POF serialization format. There are pros and cons with each product, but my main focus was on how their performance compared, and how efficient they were.

EclipseLink MOXy - XML and JSON

The main advantage of using EclipseLink MOXy to serialize to XML or JSON is that both are standard, portable formats. You can access the data from any client using any language, so are not restricted to Java, as with Java serialization. You can also integrate your data with web services and REST services. Both formats are also text based, so human readable. No coding or special interfaces are required, only meta-data. The performance is quite acceptable, and outperforms Java serialization for small data-sets.

The draw backs are that the text formats are less efficient than optimized binary formats, and JAXB requires meta-data. So you need to annotate your classes with JAXB annotations, or provide an XML configuration file. Also, circular references are not handled by default, you need to use an @XmlIDREF to handle cycles.

JAXB annotated classes

@XmlRootElement
public class Order {
    @XmlID
    @XmlAttribute
    private long id;
    @XmlAttribute
    private String description;
    @XmlAttribute
    private BigDecimal totalCost = BigDecimal.valueOf(0);
    private List orderLines = new ArrayList();
    private Customer customer;
}

public class OrderLine {
    @XmlIDREF
    private Order order;
    @XmlAttribute
    private int lineNumber;
    @XmlAttribute
    private String description;
    @XmlAttribute
    private BigDecimal cost = BigDecimal.valueOf(0);
}

EclipseLink MOXy serialization XML for Order object

<order id="0" totalCost="0"><orderLines lineNumber="1" cost="0"><order>0</order></orderLines></order>

EclipseLink MOXy serialization JSON for Order object

{"order":{"id":0,"totalCost":0,"orderLines":[{"lineNumber":1,"cost":0,"order":0}]}}

Kryo

Kryo is a fast, efficient serialization framework for Java. Kryo is an open source project on Google code that is provided under the New BSD license. It is a small project, with only 3 members, it first shipped in 2009 and last shipped the 2.21 release in Feb 2013, so is still actively being developed.

Kryo works similar to Java serialization, and respects transient fields, but does not require a class be Serializable. I found Kryo to have some limitations, such as requiring classes to have a default constructor, and encountered some issues in serializing java.sql.Time, java.sql.Date and java.sql.Timestamp classes.

Kryo serialization bytes for Order object

------java-util-ArrayLis-----model-OrderLin----java-math-BigDecima---------model-Orde-----

Oracle Coherence POF

The Oracle Coherence product provides its own optimized binary format called POF (portable object format). Oracle Coherence is an in-memory data grid solution (distributed cache). Coherence is a commercial product, and requires a license. EclipseLink supports an integration with Oracle Coherence through the Oracle TopLink Grid product that uses Coherence as the EclipseLink shared cache.

POF provides a serialization framework, and can be used independently of Coherence (if you already have Coherence licensed). POF requires that your class implement a PortableObject interface and read/write methods. You can also implement a separate Serializer class, or use annotations in the latest Coherence release. POF requires that each class be assigned an constant id ahead of time, so you need some way determine this id. The POF format is a binary format, very compact, efficient, and fast, but does require some work on your part.

The total bytes for POF was 32 bytes for a single Order/OrderLine object, and 1593 bytes for 100 OrderLines. I'm not going to give the results, as POF is part of a commercially licensed product, but is was very fast.

POF PortableObject

public class Order implements PortableObject {
    private long id;
    private String description;
    private BigDecimal totalCost = BigDecimal.valueOf(0);
    private List orderLines = new ArrayList();
    private Customer customer;

    public Order() {
    }
    
    public void readExternal(PofReader in) throws IOException {
        this.id = in.readLong(0);
        this.description = in.readString(1);
        this.totalCost = in.readBigDecimal(2);
        this.customer = (Customer)in.readObject(3);
        this.orderLines = (List)in.readCollection(4, new ArrayList());
    }

    public void writeExternal(PofWriter out) throws IOException {
        out.writeLong(0, this.id);
        out.writeString(1, this.description);
        out.writeBigDecimal(2, this.totalCost);
        out.writeObject(3, this.customer);
        out.writeCollection(4, this.orderLines);
    }
}

POF serialization bytes for Order object

-----B--G---d-U------A--G-------

Results

So how does each perform? I made a simple benchmark to compare the different serialization mechanisms. I compared the serialization of two different use cases. The first is a single Order object with a single OrderLine object. The second is a single Order object with 100 OrderLine objects. I compared the average serialization operations per second, and measure the size in bytes of the serialized data. Different object models, use cases, and environments will give different results, but this gives you a general idea of the performance differences in different serializers.

The results show that Java serialization is slow for a small number of objects, but good for a large number of objects. Conversely, XML and JSON can outperform Java serialization for a small number of objects, but Java serialization is faster for a large number of objects. Kryo and other optimized binary serializers outperform Java serialization with both types of data.

You may wonder why it is relevant that something that takes less than a millisecond is of any relevance to performance, and you may be right. In general you would only have a real performance problem if you wrote out a very large number of objects, and then Java serialization performs quite well, so is the fact that it performs very poorly for a small number of object relevant? For a single operation, this is probably true, but if you execute many small serialization operations, then the cost is relevant. A typical server servicing many clients will typically send out many small requests, so while the cost of serialization is not great enough to make any of these single requests take a long time, it will dramatically effects the scalability of the server.

Order with 1 OrderLine

SerializerSize (bytes)Serialize (operations/second)Deserialize (operations/second)% Difference (from Java serialize)% Difference (deserialize)
Java Serializable636128,63419,1800%0%
Java Externalizable435160,54926,67824%39%
EclipseLink MOXy XML101348,05647,334170%146%
Kryo90359,368346,984179%1709%

Order with 100 OrderLines

SerializerSize (bytes)Serialize (operations/second)Deserialize (operations/second)% Difference (from Java serialize)% Difference (deserialize)
Java Serializable2,71516,47010,2150%0%
Java Externalizable2,81116,20611,483-1%12%
EclipseLink MOXy XML6,6287,3042,731-55%-73%
Kryo121622,86231,49938%208%

EclipseLink JPA

In EclipseLink 2.6 development builds, and to some degree 2.5, we have added the ability to choose your serializer anywhere that EclipseLink does serialization.

One such place is in serialized @Lob mappings. You can now use the @Convert annotation to specify a serializer such as @Convert(XML), @Convert(JSON), @Convert(Kryo). In addition to optimizing performance, this provides an easy mechanism to write XML and JSON data to your database.

Also for EclipseLink cache coordination you can choose your serializer using the "eclipselink.cache.coordination.serializer" property.

The source code for the benchmarks used in this post can be found here, or downloaded here.

16 comments :

  1. Serializing objects directly using ByteBuffers or sun.misc.Unsafe can be orders of magnitude faster than Java serialization: http://mechanical-sympathy.blogspot.jp/2012/07/native-cc-like-performance-for-java.html

    ReplyDelete
  2. I can't resist... Here are some numbers from the article above:

    2.4GHz Sandy Bridge - Java 1.7.0_04
    ===================================
    0 Serialisation write=1,940ns read=9,006ns total=10,946ns
    1 Serialisation write=1,674ns read=8,567ns total=10,241ns
    2 Serialisation write=1,666ns read=8,680ns total=10,346ns
    3 Serialisation write=1,666ns read=8,623ns total=10,289ns
    4 Serialisation write=1,715ns read=8,586ns total=10,301ns
    0 ByteBuffer write=199ns read=198ns total=397ns
    1 ByteBuffer write=176ns read=178ns total=354ns
    2 ByteBuffer write=174ns read=174ns total=348ns
    3 ByteBuffer write=172ns read=183ns total=355ns
    4 ByteBuffer write=174ns read=180ns total=354ns
    0 UnsafeMemory write=38ns read=75ns total=113ns
    1 UnsafeMemory write=26ns read=52ns total=78ns
    2 UnsafeMemory write=26ns read=51ns total=77ns
    3 UnsafeMemory write=25ns read=51ns total=76ns
    4 UnsafeMemory write=27ns read=50ns total=77ns

    ReplyDelete
  3. @remko Thanks for the comment, interesting blog post, in terms of rolling your own serialization, and managing your own memory. The object being tested is quite simplistic, rolling your own becomes more complex as your object model's complexity increases. Also the test was only writing a single object, where Java serialization is going to have a big overhead, Java serialization would do much better for a larger set of the objects.

    ReplyDelete
  4. Wonder whether you can include GSON and protobuf in the benchmark.

    ReplyDelete
  5. @Remko, @James:
    FYI, the trunk version of Kryo (2.22-SNAPSHOT) provides support for serialization using ByteBuffers and sun.misc.Unsafe. Just use UnsafeInput/UnsafeOutput or UnsafeMemoryInput/UnsafeMemoryOutput instead of the usual Input/Output classes provided by Kryo. These specialized classes provide a significant performance boost in many situations.

    BTW, similar approach was recently borrowed from Kryo by Hazelcast 3.0 and Avro. So, expect good results from them as well.

    ReplyDelete
  6. If you are interested in serialization performance, you should first check out JVM-serializers project (https://github.com/eishay/jvm-serializers). It has already tested most of things you did.

    One very fast choice you did not include is Jackson (https://github.com/FasterXML/jackson).
    Jackson is the fastest way to serialize POJOs as JSON (and possibly fastest for XML as well). It also supports binary JSON representation called Smile.
    The main reason for Jackson inclusion is actually the fact that most Java service frameworks nowadays include it in one way or another; I was actually surprised to see that it was not included here.

    ReplyDelete

  7. Hello Mr Sutherland,

    Nice blog! I would like to discuss a partnership opportunity between
    our sites. Is there an email address I can contact you in private?

    Thanks,

    Eleftheria Kiourtzoglou

    Head of Editorial Team

    Java Code Geeks

    email: [dot][at]javacodegeeks[dot]com

    ReplyDelete
  8. For my own interest, I did the same with Java Chronicle 2.0. Java Chronicle includes writing and reading from a file which makes it more interesting for me.

    order lines: 1
    orderCache-Kryo-serialize : 743181
    orderCache-Kryo-deserialize : 707776
    Size: 90

    orderCache-Java-serialize : 217936
    orderCache-Java-deserialize : 31579
    Size: 636

    order-Chronicle-serialize : 6892399
    order-Chronicle-deserialize : 5943322
    Size: 12

    ---

    order lines: 100
    order-Chronicle-serialize : 201030
    order-Chronicle-deserialize : 146758
    Size: 507

    ReplyDelete
    Replies
    1. Cool! Could you provide the source code for your experiment so that others can test it as well? Thanks!

      Delete
  9. great article! helped me a lot! thank u sooooooooooo much....

    ReplyDelete
  10. This comment has been removed by the author.

    ReplyDelete
  11. This comment has been removed by the author.

    ReplyDelete
  12. XStream is another possibility. Whether JSON, XML or what have you it is fast and malleable. Normally it will spell out the full class name as part of the String it produces but you can use aliases to make it more like a language neutral marshaling and make the objects smaller.

    ReplyDelete
  13. Serialization is the process of writing complete state of java object into output stream, that stream can be file or byte array or stream associated with TCP/IP socket. Thanks for sharing this post on optimizing java serialization.

    SEO Company in Bangalore | Web Developers in Bangalore

    ReplyDelete
  14. This comment has been removed by the author.

    ReplyDelete
  15. Externalizor is a new open source library which automatically handles JAVA serialization thru the Externalizable interface. It also includes compression mechanisms for primitive array (based on snappy).

    ReplyDelete