Monday, April 23, 2012

Objects vs Data, and Filtering a JOIN FETCH

JPA deals with objects, and SQL deals with rows. Some developers love JPA because it allows them to use objects in an object-oriented fashion. However, other developers have trouble understanding JPA precisely because it is object-oriented.

Even though object-oriented programming has been popular for several decades, there are still a lot of procedural languages out their. Developers used to procedural programming have a different mindset than object-oriented developers. This is also true of developers with a lot of experience with SQL. JPA and JPQL can be difficult to understand for developers used to SQL and JDBC.

This can lead to some odd usages and misunderstandings of JPA. In this blog post, I would like to highlight a couple of these misunderstandings, and provide a solution to my favorite, which I call Filtering a JOIN FETCH.

Dobject Model

One JPA usage that I find aggravating is what I call the Dobject model. This is a data model that has been made into an object model. Sometimes this comes across as a class that has the same name as the database table, with all of the same field names, and no relationships. Sometimes there are relationships, but they have the same name as the foreign key columns.

@Entity
public class EMP {
  @Id
  long EMP_ID;
  String F_NAME;
  String L_NAME;
  long MGR_ID;
  long ADDR_ID;
}

The above is an unusual class, and not very object-oriented. It is probably not as useful as it could if it had relationships instead of foreign keys.

@Entity
public class employees {
  @Id
  long empId;
  String fName;
  String lName;
  @ManyToOne
  Employee mgrId;
  @OneToOne
  Address addrId;
}

This one is very confused. First of all, it seems to be named after its table, where the name employees might make sense, but an object is a single entity, so should not be pluralized. Also, classes in Java should start with an upper case letter, as classes are proper names of the real world entity that they represent.

This class at least has relationships, which I suppose is an improvement, but they are named like they are foreign keys, not objects. This normally leads the user to try to query them as values instead of as objects.

Select e from employees e where e.mgrId = 4

Which does not work, because mgrId is an Employee object, not an Id value. The relationship should be named after what it represents, i.e. manager not the foreign key.

A more object-oriented way to define the class would be:

@Entity
public class Employee {
  @Id
  long id;
  String firstName;
  String lastName;
  @ManyToOne(fetch=FetchType.LAZY)
  Employee manager;
  @OneToMany(mappedBy="manager")
  List<Employee> managedEmployees;
  @OneToOne
  Address address;
}

How not to write a DAO

In JPA you do not normally execute queries to insert, update or delete objects. To update and object you just find or query it, change it through its set methods, and commit the transaction. JPA automatically keeps track of what changed and updates what is required to be updated, in the order that it is required to be updated.

To insert an object you call persit() on the EntityManager. To delete an object you call remove() on the EntityManager. This is different than SQL or JDBC that requires you to execute a query to perform any modification.

JPA does allow UPDATE and DELETE queries through JPQL. These are for batch updates and deletes, not for the deletion or updating or single objects. This can lead to the very confused Data Access Object below:

public class EmployeeDOA {
  public void insert(Employee employee) {
    em.persist(employee);
  }
  public void update(Employee employee) {
    Query query = em.createQuery("Update Employee e set e.firstName = :firstName, e.lastName = :lastName where e.id = :id");
    query.setParameter("id", employee.getId());
    query.setParameter("firstName", employee.getFirstName());
    query.setParameter("lastName", employee.getLastName());
    query.executeUpdate();
  }
  public void delete(Employee employee) {
    Query query = em.createQuery("Delete from Employee e where e.id = :id");
    query.setParameter("id", employee.getId());
    query.executeUpdate();
  }
}

This is wrong. You do not execute queries to update or delete objects. This will leave the objects in your persistence context in an invalid state, as they are not aware of the query updates. In is also not using JPA correctly, or as it was intended, and not benefiting from its full functionality. A better Data Access Object would be:
public class EmployeeDOA {
  public void persist(Employee employee) {
    em.persist(employee);
  }
  public Object merge(Employee employee) {
    return em.merge(employee);
  }
  public void remove(Employee employee) {
    em.remove(employee);
  }
}

Note that there is no update(). JPA does not have or require and update(), merge() can be used for detached objects, but is not the equivalent of update(), you do not need to call update in JPA, this is one of its benefits.

JPQL vs SQL

JPQL is not SQL. It looks a lot like SQL, has similar syntax and uses the same standard naming for operators and functions, but it is not SQL. This can be very confusing for someone experienced with SQL. When they try to use their SQL in place of JPQL it does not work.

Select * from Employee e join Address a on e.addressId = a.id where a.city = 'Ottawa'

This is SQL, not JPQL.
The equivalent JPQL would be:
Select e from Employee e where e.address.city = 'Ottawa'

Of coarse if you prefer SQL, JPA fully allows you to use SQL, you just need to call createNativeQuery instead of createQuery. However, most users prefer to use JPQL. I suppose this is because JPQL lets them deal with objects, and even if they don't quite understand objects, they do understand there is some benefit there.

JPQL also defines the JOIN syntax, but it does not have a ON clause, and JOIN is based on relationships, not foreign keys.

Select e from Employee e join e.address a where a.city = 'Ottawa'

The JOIN syntax in JPQL allows you do query collection relationships, and use OUTER joins and FETCH. A join FETCH allows you read an object an its relationship in a single query (as appose to a possible dreaded N+1 queries).

Select e from Employee e left join fetch e.address where e.address.city = 'Ottawa'

Notice that I did not use an alias on this JOIN FETCH. This is because JPQL does not allow this, which I will get into later. There is also no ON clause in JPQL because the relationship is always joined by the foreign keys defined in the relationship's join columns in its mapping.

Sometimes it is desirable to place additional conditions in a JOIN ON clause. This is normally in the case of OUTER joins, where placing the condition in the WHERE clause would result in empty joins being filtered. A ON clause is something that is part of the JPA 2.1 draft, so it is coming. EclipseLink already supports the ON clause, as well as aliasing JOIN FETCH in its 2.4 development milestones, see:

www.eclipse.org/eclipselink/downloads/milestones

Filtering a JOIN FETCH

A common misunderstanding I see users make occurs when querying a OneToMany relationship.
Select d from Department d join d.employees e where e.address.city = 'Ottawa'

This query results in all department objects that have any employee living in Ottawa. The confusion comes when they access the department's employees. Each department contains all of its employees, however they were expecting just the employees that live in Ottawa.

This is because JPA deals with objects, and a Department object represents a specific department, and has a specific identity. If I issue two queries for a particular department, I get back the same identical (==) instance (provided both queries use the same EntityManager). A specific department always has the same employees, it represent the real world department, and does not change, just because you queried it differently. This is important for caching, but also within the same persistence context, if you query the same department, two different ways, you should also get back the same exact department.

If you really want the department, and only the employees of the department that live in Ottawa, you can use the following query:
Select d, e from Department d join d.employees e where e.address.city = 'Ottawa'

This will give you an List of Object[] that contain the Department and the Employee. For each employee you will get back n Object[] (rows), where n is the number of departments in the employee. The same employee will be duplicated n times, each with its different department. If you access the departments any of the employees they will contain all of the employee's departments, not just the ones in Ottawa.

But, if you really, really want the department to only have the employees that live it Ottawa, you can do this. I'm not sure I would recommend it, but it is possible, at least in EclipseLink 2.4 it will be. EclipseLink allows you to use an alias on a JOIN FETCH. This support was intended for OneToOne and ManyToOne relationships, to avoid having to join it twice just to get an alias, as well as to allow using it in an ORDER BY or in other ways that would not filter the results and alter how the objects are built. But, there is nothing stopping you from using it with a OneToMany to filter the contents of the fetched OneToMany results.

If you are going to do this, you should be careful, and at least ensure you set the "javax.persistence.cache.storeMode" and "javax.persistence.cache.retrieveMode" to BYPASS, to avoid corrupting the shared cache.

Query query = em.createQuery("Select d from Department d join fetch d.employees e where e.address.city = 'Ottawa'");
query.setHint("javax.persistence.cache.storeMode", "BYPASS");
query.setHint("javax.persistence.cache.retrieveMode", "BYPASS");
List departmentsWithFilteredEmployees = query.getResultList();

EclipseLink 2.4 is currently under development, but its milestone builds already contain this functionality. The EclipseLink 2.4 milestone builds can be download from:
www.eclipse.org/eclipselink/downloads/milestones

For more information of the many JPQL extensions and enhancements in EclipseLink 2.4 see:
wiki.eclipse.org/EclipseLink/UserGuide/JPA/Basic_JPA_Development/Querying/JPQL

Monday, April 2, 2012

EclipseLink JPA supports MongoDB

EclipseLink 2.4 will support JPA access to NoSQL databases. This support is already part of the EclipseLink development trunk and can be tried out using the milestone or nightly builds.  Initial support is provided for MongoDB and Oracle NoSQL. A plug-able platform and adapter layer allows for other databases to be supported.

NoSQL is a classification of database systems that do not conform to the relational database or SQL standard. They have various roots, from distributed internet databases, to object databases, XML databases and even legacy databases. They have become recently popular because of their use in large scale distributed databases in Google, Amazon, and Facebook.

There are various NoSQL databases including:
  • Mongo DB
  • Oracle NoSQL
  • Cassandra
  • Google BigTable
  • Couch DB

EclipseLink's NoSQL support allows the JPA API and JPA annotations/xml to be used with NoSQL data. EclipseLink also supports several NoSQL specific annotations/xml including @NoSQL that defines a class to map NoSQL data.

EclipseLink's NoSQL support is based on previous EIS support offered since EclipseLink 1.0. EclipseLink's EIS support allowed persisting objects to legacy and non-relational databases. EclipseLink's EIS and NoSQL support uses the Java Connector Architecture (JCA) to access the data-source similar to how EclipseLink's relational support uses JDBC. EclipseLink's NoSQL support is extendable to other NoSQL databases, through the creation of an EclipseLink EISPlatform class and a JCA adapter.

Let's walk through an example of using EclipseLink's NoSQL support to persist an ordering system's object model to a MongoDB database.

The source for the example can be found here, or from the EclipseLink SVN repository.

Ordering object model

The ordering system consists of four classes, Order, OrderLine, Address and Customer. The Order has a billing and shipping address, many order lines, and a customer.

public class Order implements Serializable {
private String id;
private String description;
private double totalCost = 0;
private Address billingAddress;
private Address shippingAddress;
private List orderLines = new ArrayList();
private Customer customer;
...
}
public class OrderLine implements Serializable {
private int lineNumber;
private String description;
private double cost = 0;
...
}
public class Address implements Serializable {
private String street;
private String city;
private String province;
private String country;
private String postalCode;
....
}
public class Customer implements Serializable {
private String id;
private String name;
...
}

Step 1 : Decide how to store the data

There is no standard on how NoSQL databases store their data. Some NoSQL databases only support key/value pairs, others support structured hierarchical data such as JSON or XML.

MongoDB stores data as BSON (binary JSON) documents. The first decision that must be made is how to store the objects. Normally each independent object would compose a single document, so a single document could contain Order, OrderLine and Address. Since customers can be shared amongst multiple orders, Customer would be its own document.

Step 2 : Map the data

The next step is to map the objects. Each root object in the document will be mapped as an @Entity in JPA. The objects that are stored by being embedded within their parent's document are mapped as @Embeddable. This is similar to how JPA maps relational data, but in NoSQL embedded data is much more common because of the hierarchical nature of the data format. In summary, Order and Customer are mapped as @Entity, OrderLine and Address are mapped as @Embeddable.

The @NoSQL annotation is used to map NoSQL data. This tags the classes as mapping to NoSQL data instead of traditional relational data. It is required in each persistence class, both entities and embeddables. The @NoSQL annotation allows the dataType and the dataFormat to be set.

The dataType is the equivalent of the table in relational data, its meaning can differ depending on the NoSQL data-source being used. With MongoDB the dataType refers to the collection used to store the data. The dataType is defaulted to the entity name (as upper case), which is the simple class name.

The dataFormat depends on the type of data being stored. Three formats are supported by EclipseLink, XML, Mapped, and Indexed. XML is the default, but since MongoDB uses BSON, which is similar to a Map in structure, Mapped is used. In summary, each class requires the @NoSql(dataFormat=DataFormatType.MAPPED) annotation.

@Entity
@NoSql(dataFormat=DataFormatType.MAPPED)
public class Order

@Embeddable
@NoSql(dataFormat=DataFormatType.MAPPED)
public class OrderLine

Step 3 : Define the Id

JPA requires that each Entity define an Id. The Id can either be a natural id (application assign id) or a generated id (id is assign by EclipseLink). MongoDB also requires an _id field in every document. If no _id field is present, then Mongo will auto generate and assign the _id field using an OID (object identifier) which is similar to a UUID (universally unique identifier).

You are free to use any field or set of fields as your Id in EclipseLink with NoSQL, the same as a relational Entity. To use an application assigned id as the Mongo id, simply name its field as "_id". This can be done through the @Field annotation, which is similar to the @Column annotation (which will also work), but without all of the relational details, it has just a name. So, to define the field Mongo will use for the id include @Field(name="_id") in your mapping.

To use the generated Mongo OID as your JPA Id, simply include @Id, @GeneratedValue, and @Field(name="_id") in your object's id field mapping. The @GeneratedValue tells EclipseLink to use the Mongo OID to generate this id value. @SequenceGenerator and @TableGenerator are not supported in MongoDB, so these cannot be used. Also the generation types of IDENTITY, TABLE and SEQUENCE are not supported. You can use the EclipseLink @UUIDGenerator if you wish to use a UUID instead of the Mongo OID. You can also use your own custom generator. The id value for a Mongo OID or a UUID is not a numeric value, it can only be mapped as String or byte[].

@Id
@GeneratedValue
@Field(name="_id")
private String id;

Step 4 : Define the mappings

Each attribute in your object has too be mapped. If no annotation/xml is defined for the attribute, then it mapping will be defaulted. Defaulting rules for NoSQL data, follow the JPA defaulting rules, so most simple mappings do not require any configuration if defaults are used. The field names used in the Mongo BSON document will mirror the object attribute names (as uppercase). To provide a different BSON field name, the @Field annotation is used.

Any embedded value stored in the document is persisted using the @Embedded JPA annotation. An embedded collection will use the JPA @ElementCollection annotation. The @CollectionTable of the @ElementCollection is not used or supported in NoSQL, as the data is stored within the document, no separate table is required. The @AttributeOverride is also not required nor supported with NoSQL, as the embedded objects are nested in the document, and do not require unique field names. The @Embedded annoation/xml is normally not required, as it is defaulted, the @ElementCollection is required, as defaulting does not currently work for @ElementCollection in EclipseLink.

The relationship annotations/xml @OneToOne, @ManyToOne, @OneToMany, and @ManyToMany are only to be used with external relationships in NoSQL. Relationships within the document use the embedded annotations/xml. External relationships are supported to other documents. To define an external relationship a foreign key is used. The id of the target object is stored in the source object's document. In the case of a collection, a collection of ids is stored. To define the name of the foreign key field in the BSON document the @JoinField annotation/xml is used.

The mappedBy option on relationships is not supported for NoSQL data, for bi-directional relationships, the foreign keys would need to be stored on both sides. It is also possible to define a relationship mapping using a query, but this is not currently supported through annotations/xml, only through a DescriptorCustomizer.

@Basic
private String description;
@Basic
private double totalCost = 0;
@Embedded
private Address billingAddress;
@Embedded
private Address shippingAddress;
@ElementCollection
private List orderLines = new ArrayList();
@ManyToOne(fetch=FetchType.LAZY)
private Customer customer;

Step 5 : Optimistic locking

Optimistic locking is supported with MongoDB. It is not required, but if locking is desired, the @Version annotation can be used.

Note that MongoDB does not support transactions, so if a lock error occurs during a transaction, any objects that have been previously written will not be rolled back.

@Version
private long version;

Step 6 : Querying

MongoDB has is own JSON based query by example language. It does not support SQL (i.e. NoSQL), so querying has limitations.

EclipseLink supports both JPQL and the Criteria API on MongoDB. Not all aspects of JPQL are supported. Most basic operations are supported, but joins are not supported, nor sub-selects, group bys, or certain database functions. Querying to embedded values, and element collections are supported, as well as ordering, like, and selecting attribute values.

Not all NoSQL database support querying, so EclipseLink's NoSQL support only supports querying if the NoSQL platform supports it.

Query query = em.createQuery("Select o from Order o where o.totalCost > 1000");
List orders = query.getResultList();

Query query = em.createQuery("Select o from Order o where o.description like 'Pinball%'");
List orders = query.getResultList();

Query query = em.createQuery("Select o from Order o join o.orderLines l where l.description = :desc");
query.setParameter("desc", "shipping");
List orders = query.getResultList();

Native queries are also supported in EclipseLink NoSQL. For MongoDB the native query is in MongoDB's command language.

Query query = em.createNativeQuery("db.ORDER.findOne({\"_id\":\"" + oid + "\"})", Order.class);
Order order = (Order)query.getSingleResult();

Step 7 : Connecting

The connection to a Mongo database is done through the JPA persistence.xml properties. The "eclipselink.target-database" property must define the Mongo platform "org.eclipse.persistence.nosql.adapters.mongo.MongoPlatform". A connection spec must also be defined through "eclipselink.nosql.connection-spec" to be "org.eclipse.persistence.nosql.adapters.mongo.MongoConnectionSpec". Other properties can also be set such as the "eclipselink.nosql.property.mongo.db", "eclipselink.nosql.property.mongo.host" and "eclipselink.nosql.property.mongo.port". The host and port can accept a comma separated list of values to connect to a cluster of Mongo databases.

<persistence-unit name="mongo-example" transaction-type="RESOURCE_LOCAL">
<class>model.Order</class>
<class>model.OrderLine</class>
<class>model.Address</class>
<class>model.Customer</class>
<properties>
<property name="eclipselink.target-database" value="org.eclipse.persistence.nosql.adapters.mongo.MongoPlatform">
<property name="eclipselink.nosql.connection-spec" value="org.eclipse.persistence.nosql.adapters.mongo.MongoConnectionSpec">
<property name="eclipselink.nosql.property.mongo.port" value="27017">
<property name="eclipselink.nosql.property.mongo.host" value="localhost">
<property name="eclipselink.nosql.property.mongo.db" value="mydb">
<property name="eclipselink.logging.level" value="FINEST">
</property>
</property>

Summary

The full source code to this demo is available from SVN.

To run the example you will need a Mongo database, which can be downloaded from, http://www.mongodb.org/downloads.

EclipseLink also support NoSQL access to other data-sources including:
  • Oracle NoSQL
  • XML files
  • JMS
  • Oracle AQ