DB2 Version 10.1 for Linux, UNIX, and Windows

Default and optimized RDF stores

Two kinds of RDF stores are used with DB2® databases: the default RDF store and the optimized RDF store.

Default RDF stores

A default RDF store uses a base schema that is intended for cases where nothing is known about the RDF data being stored or no appropriate sample is available. Default RDF stores use a default number of columns in the Direct Primary and Reverse Primary tables. Use a default store when you start with a new RDF data set about which nothing is known. In a default store, hashing determines the columns of the Direct Primary and Reverse Primary tables to which predicates and objects are assigned.
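The following Python sketch illustrates the general idea of hash-based placement. It is not the DB2 implementation: the column count (`NUM_COLUMNS`), the use of MD5, and the helper names `hashed_column` and `place_subject` are hypothetical, and object values are omitted for brevity. When two predicates of the same subject hash to the same column, the sketch adds another row for that subject.

```python
import hashlib

# Hypothetical column count for illustration; not the actual DB2 default.
NUM_COLUMNS = 4


def hashed_column(predicate: str, num_columns: int = NUM_COLUMNS) -> int:
    """Map a predicate to a column index with a stable hash (illustrative only)."""
    digest = hashlib.md5(predicate.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_columns


def place_subject(predicates, num_columns: int = NUM_COLUMNS):
    """Place one subject's predicates into rows of fixed-width column slots.

    Each predicate goes to its hashed column. If that column is already taken
    in every existing row (a collision), a new row is added for the subject.
    """
    rows = []
    for p in predicates:
        col = hashed_column(p, num_columns)
        for row in rows:
            if row[col] is None:
                row[col] = p
                break
        else:
            # No existing row has a free slot in this column: spill to a new row.
            new_row = [None] * num_columns
            new_row[col] = p
            rows.append(new_row)
    return rows


for row in place_subject(["age", "ssn", "name"]):
    print(row)
```

Because the hash knows nothing about which predicates occur together, whether co-occurring predicates collide is a matter of chance.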

Create a default RDF store when you have no existing sample data for the RDF data set from which the DB2 software can calculate predicate coexistence.

Optimized RDF stores

If sufficient representative data for the RDF data set is already available, a more optimized schema can be created for the Direct Primary and Reverse Primary tables. This optimized schema exploits the fact that RDF predicates correlate. For example, age and social security number coexist as predicates of a Person, and headquarters and revenue coexist as predicates of a Company, but age and revenue never occur together in any entity.

Create an optimized RDF store when you have existing or sample data for the RDF data set from which DB2 can calculate predicate correlation and use it to assign predicates to columns intelligently.
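One way to turn predicate correlation into a column assignment is to treat it as graph coloring: predicates that appear together on some entity must get different columns, while predicates that never appear together may share one. The sketch below uses a simple greedy coloring heuristic. It illustrates the idea only and is not necessarily the algorithm DB2 uses; the names `coexistence_graph` and `assign_columns` and the sample entities are hypothetical.

```python
from itertools import combinations


def coexistence_graph(entities):
    """Build an undirected graph in which an edge joins two predicates
    that appear on the same entity in the sample data."""
    graph = {}
    for predicates in entities.values():
        for p in predicates:
            graph.setdefault(p, set())
        for a, b in combinations(sorted(set(predicates)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def assign_columns(entities):
    """Greedy coloring: co-occurring predicates receive different columns,
    and predicates that never co-occur may be given the same column."""
    graph = coexistence_graph(entities)
    column_of = {}
    # Visit high-degree predicates first (a common greedy heuristic);
    # ties are broken alphabetically so the result is deterministic.
    for p in sorted(graph, key=lambda q: (-len(graph[q]), q)):
        used = {column_of[n] for n in graph[p] if n in column_of}
        col = 0
        while col in used:
            col += 1
        column_of[p] = col
    return column_of


# Hypothetical sample data echoing the Person/Company example above.
sample = {
    "person1": ["age", "ssn"],
    "person2": ["age", "ssn"],
    "company1": ["headquarters", "revenue"],
}
print(assign_columns(sample))
# {'age': 0, 'headquarters': 0, 'revenue': 1, 'ssn': 1}
```

Here four predicates fit into two columns: age and headquarters never co-occur, so they share column 0, while age and ssn, which always co-occur, are kept apart.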

Advantages of optimized RDF stores

Predicate correlation is used to drastically reduce, and in many cases completely remove, the randomness of the hashing used in default stores. In default stores, predicate collisions can occur because predicate correlation is not known, and these collisions can cause more rows to be used in the tables than are actually required. Extra rows can make joins between tables less efficient than they need to be.
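The cost of a collision can be made concrete with a small calculation. Assuming each row holds at most one predicate per column, the number of rows needed for a subject equals the largest number of that subject's predicates mapped to any single column. Both mappings below are hypothetical: in `hashed`, hashing happened to put age and ssn in the same column; `correlated` is a correlation-aware mapping. The function name `rows_needed` is also illustrative.

```python
from collections import Counter


def rows_needed(predicates, column_of):
    """Rows required to store one subject: each row holds at most one
    predicate per column, so the busiest column sets the row count."""
    if not predicates:
        return 0
    return max(Counter(column_of[p] for p in predicates).values())


# Hypothetical mapping in which age and ssn collided on column 2.
hashed = {"age": 2, "ssn": 2, "headquarters": 0, "revenue": 1}
# Hypothetical correlation-aware mapping: co-occurring predicates are separated.
correlated = {"age": 0, "ssn": 1, "headquarters": 0, "revenue": 1}

person = ["age", "ssn"]
print(rows_needed(person, hashed))      # 2 (a spill row is needed)
print(rows_needed(person, correlated))  # 1
```

With the colliding mapping, a query that needs both the age and the ssn of this person must combine two rows for the same subject instead of reading one.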

Predicates are also easier to index, because a given predicate is usually confined to a single column. In addition, predicates that never coexist can be assigned to the same column, which allows a single DB2 index to index multiple predicates.
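The indexing benefit can be illustrated by counting indexes, under the simplifying assumption that one index is defined per column and covers every predicate stored in that column. The function `indexes_needed` and both mappings are hypothetical; `spread` models a predicate that can land in more than one column, while `confined` models an optimized assignment in which non-coexisting predicates share a column.

```python
def indexes_needed(indexed_predicates, columns_of):
    """Count the column indexes required to cover the given predicates.

    columns_of maps each predicate to the set of columns it may occupy;
    one index per column is assumed to cover all predicates in that column.
    """
    covered = set()
    for p in indexed_predicates:
        covered |= columns_of[p]
    return len(covered)


# Hypothetical: age can appear in two columns, headquarters in a third.
spread = {"age": {0, 2}, "headquarters": {1}}
# Hypothetical optimized assignment: each predicate confined to one column,
# and the two non-coexisting predicates share column 0.
confined = {"age": {0}, "headquarters": {0}}

print(indexes_needed(["age", "headquarters"], spread))    # 3
print(indexes_needed(["age", "headquarters"], confined))  # 1
```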