Appendix – Application Development Environments
This appendix presents a summary of some application development environments for distribution of RDF data. More detailed information about each of the products can be obtained through the links supplied in each summary.
A.1 D2RQ
Large amounts of data are currently stored in relational databases, which feed many websites that display this data through web pages. The D2RQ platform [92] enables relational databases to be accessed as virtual RDF graphs, in read-only mode, without the need to replicate the database in a native RDF storage system.
The method uses a mapping language [93] to map the data contained in the tables of the relational database to a set of RDF triples. In Section 4.1, which introduced the concepts of RDF, the cells of a table were identified as triples, and each table as a set of triples about resources of a certain type. The language of the D2RQ platform applies this idea in a sophisticated manner, enabling the set of triples to be precisely defined.
To illustrate this mapping, let us take a very simple example with three tables (figures A.1 to A.3) from a database about a conference:
• Paper – information about published articles.
• Person – information about the people.
• Rel_Person_Paper – relations between articles and their authors.

PaperId | Title | ... |
---|---|---|
1 | Trusting Information Sources One Citizen at a Time | ... |
... | ... | ... |

PerId | Name | ... |
---|---|---|
1 | Yolanda Gil | ... |
2 | Varun Ratnakar | ... |
... | ... | ... |

PersonId | PaperId |
---|---|
1 | 1 |
2 | 1 |
... | ... |

Figure A.4 presents an excerpt of the mapping of the relational tables to RDF triples:
• map:Database1
Defines the connection with the database.
• map:PaperClassMap
Defines an RDF class and the URI pattern used to access resources of this type. In the example, the triples of the resource associated with the article that has the identifier “1” in the table “Paper” will be accessed by the URI “http://www.conference.org/conf2004/paper#Paper1”.
• map:paperTitle
Defines a property of a class, associating a column of a table with a particular property of a certain vocabulary. In the case of the example, the column "Title" of the table "Paper" is related to the property "title" of the Dublin Core vocabulary.
• map:authorName
Also defines a property of a class, but in this case the column is reached by joining tables. In the example, to obtain the names of the authors of an article, the relation table "Rel_Person_Paper" must be used to identify the value of the property. For example, article “1” has two authors, which would result in the creation of two triples with the property “dc:creator” from the Dublin Core vocabulary:
@prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .
@prefix map: <file:///Users/d2r/example.ttl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
# The default prefix used in ":Paper" is not shown in this excerpt.

# Connection to the relational database
map:Database1 a d2rq:Database ;
    d2rq:jdbcDSN "jdbc:mysql://localhost/iswc" ;
    d2rq:jdbcDriver "com.mysql.jdbc.Driver" ;
    d2rq:username "user" ;
    d2rq:password "password" .

# One resource per row of the table Paper
map:PaperClassMap a d2rq:ClassMap ;
    d2rq:uriPattern
        "http://www.conference.org/conf2004/paper#Paper@@Paper.PaperID@@" ;
    d2rq:class :Paper ;
    d2rq:dataStorage map:Database1 .

# Column Paper.Title becomes dc:title
map:paperTitle a d2rq:PropertyBridge ;
    d2rq:belongsToClassMap map:PaperClassMap ;
    d2rq:column "Paper.Title" ;
    d2rq:property dc:title .

# Author names are reached through the relation table and become dc:creator
map:authorName a d2rq:PropertyBridge ;
    d2rq:belongsToClassMap map:PaperClassMap ;
    d2rq:column "Person.Name" ;
    d2rq:join "Paper.PaperID <= Rel_Person_Paper.PaperID" ;
    d2rq:join "Rel_Person_Paper.PersonID => Person.PerID" ;
    d2rq:datatype xsd:string ;
    d2rq:property dc:creator .

@prefix dc: <http://purl.org/dc/elements/1.1/> .

<http://www.conference.org/conf2004/paper#Paper1>
    dc:title "Trusting Information Sources One Citizen at a Time" ;
    dc:creator "Yolanda Gil" ;
    dc:creator "Varun Ratnakar" .
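
The way a URI pattern and the property bridges turn rows into triples can be sketched in a few lines of plain Java. This is only an illustration of the idea, not how D2RQ itself is implemented: the class name `UriPatternSketch` and the helpers `expand` and `triple` are hypothetical, and the rows are hard-coded instead of being read from a database.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: expands a D2RQ-style URI pattern and prints the triples
// for one article. It does not use the D2RQ library.
class UriPatternSketch {
    // Matches placeholders of the form @@Table.Column@@
    private static final Pattern PLACEHOLDER = Pattern.compile("@@([^@]+)@@");

    // Replaces each placeholder with the value of that column in the given row.
    static String expand(String uriPattern, Map<String, String> row) {
        Matcher m = PLACEHOLDER.matcher(uriPattern);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = row.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for column " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Formats one triple with a plain literal as its object.
    static String triple(String subject, String predicate, String literal) {
        return "<" + subject + "> <" + predicate + "> \"" + literal + "\" .";
    }

    public static void main(String[] args) {
        String pattern = "http://www.conference.org/conf2004/paper#Paper@@Paper.PaperID@@";
        String dc = "http://purl.org/dc/elements/1.1/";

        // Row of the table Paper and the author names found through Rel_Person_Paper
        Map<String, String> paperRow = Map.of(
                "Paper.PaperID", "1",
                "Paper.Title", "Trusting Information Sources One Citizen at a Time");
        List<String> authorNames = List.of("Yolanda Gil", "Varun Ratnakar");

        String subject = expand(pattern, paperRow);
        System.out.println(triple(subject, dc + "title", paperRow.get("Paper.Title")));
        for (String name : authorNames) {
            System.out.println(triple(subject, dc + "creator", name));
        }
    }
}
```

Because the graph is virtual, a mapping engine would perform this kind of expansion on demand for each request, rather than materializing all triples in advance.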
In addition to access to resources through URI patterns defined in the relational database mapping, the D2RQ platform also provides a SPARQL endpoint where queries can be made against the RDF graph.
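
A query against such an endpoint is an ordinary HTTP request in which the SPARQL text travels in a `query` parameter, as defined by the SPARQL protocol. The following minimal sketch only builds the request and does not send it. The endpoint address `http://localhost:2020/sparql` and the class name `SparqlRequestSketch` are assumptions made for illustration; the actual address depends on how the server is installed.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: prepares a SPARQL protocol GET request (Java 11 or later).
class SparqlRequestSketch {
    // The query text is URL-encoded and passed in the "query" parameter.
    static String buildQueryUrl(String endpoint, String sparql) {
        return endpoint + "?query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
    }

    // Wraps the URL in an HTTP request that asks for JSON results; nothing is sent here.
    static HttpRequest buildRequest(String endpoint, String sparql) {
        return HttpRequest.newBuilder(URI.create(buildQueryUrl(endpoint, sparql)))
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        String endpoint = "http://localhost:2020/sparql"; // assumed endpoint address
        String query = "SELECT ?title WHERE { "
                + "<http://www.conference.org/conf2004/paper#Paper1> "
                + "<http://purl.org/dc/elements/1.1/title> ?title }";
        HttpRequest request = buildRequest(endpoint, query);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

The request could then be sent with `java.net.http.HttpClient`; the server would translate the SPARQL query into SQL against the mapped tables.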
A.2 Virtuoso
OpenLink Virtuoso [94] is a universal server, a combination of a web application server and an object-relational database management system (ORDBMS). Its architecture enables persistence of data in relational formats, RDF, XML, text, documents, Linked Data, etc. It also functions as a web application server, as well as a host for web services.
Through an ODBC or JDBC layer, Virtuoso connects to the most popular commercial relational database platforms, including Oracle, SQL Server, Progress, DB2, Sybase, CA-Ingres and Informix. One of the main features of Virtuoso is the Sponger, which transforms non-RDF data into RDF data at runtime, in a way similar to the virtual RDF graph generated by D2RQ. Its goal is to take non-RDF data sources from the Web as input (HTML pages, pages with embedded microformats, data from web services, web APIs, etc.) and create an RDF graph as output. This allows non-RDF data sources to be exposed as Linked Data on the Web. For each type of non-RDF data, it is possible to install a cartridge that extracts and transforms the data into RDF. The product comes with many built-in cartridges, and users can also build a cartridge for a specific data format.
Similar to D2RQ, Virtuoso has a language for mapping relational data to RDF [95]. The language is an extension of the SPARQL query language [96] and permits declarative mapping of tables, columns, rows, attributes, relations and instances to terms defined by RDF Schema or OWL ontologies. Virtuoso also accepts mappings of relational data to RDF [97] written in R2RML [98], a language recommended by W3C for expressing customized mappings from relational databases to RDF datasets. Furthermore, Virtuoso is a Linked Data server that provides a SPARQL endpoint and enables the configuration of URI schemes for access to resources, by mapping URIs to SPARQL queries against RDF graphs.
A.3 Sesame
Sesame [99] is an open source Java framework for storing, inferencing and querying RDF data. It is extensible and configurable with respect to storage mechanisms, inference engines, RDF file formats, query languages and query result formats. Sesame is accessed through an API that can be connected to all the main RDF storage solutions, and through a RESTful HTTP interface that supports the SPARQL protocol. Sesame can also be used with the Eclipse development environment [100] to develop web applications, with support libraries for manipulating RDF data.
Sesame has two main communication interfaces:
• Sail API (storage and inference layer)
This is a low-level system interface API for RDF triple stores and inference engines. It provides a way to abstract storage details and enables the use of different types of storage and inference. Thus, different open source and commercial triple stores that implement the Sail API can be coupled to Sesame.
• Repository API
This is a high-level API used in application code. It provides multiple methods for file uploads, queries, and data extraction and manipulation. The repositories can be local or remote. Local repositories run in the same Java virtual machine as the application. Remote repositories follow the client-server model, where the application communicates with a Sesame server. The same interface is used in either mode, so that applications can be developed transparently for the two types of repositories.
Figure A.6 presents an excerpt of Java code in which triples are added to a repository. URIs are created for two resources, with two triples generated for each.
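import org.openrdf.OpenRDFException;
import org.openrdf.model.Literal;
import org.openrdf.model.URI;
import org.openrdf.model.ValueFactory;
import org.openrdf.model.vocabulary.RDF;
import org.openrdf.model.vocabulary.RDFS;
import org.openrdf.model.vocabulary.FOAF;
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.memory.MemoryStore;
...
// Create and initialize a local, in-memory repository
Repository rep = new SailRepository(new MemoryStore());
rep.initialize();

// Create the URIs and literals that will form the triples
ValueFactory f = rep.getValueFactory();
URI alice = f.createURI("http://example.org/people/alice");
URI bob = f.createURI("http://example.org/people/bob");
Literal bobsName = f.createLiteral("Bob");
Literal alicesName = f.createLiteral("Alice");

try {
    RepositoryConnection con = rep.getConnection();
    try {
        // Two triples per resource: its type and its name
        con.add(alice, RDF.TYPE, FOAF.PERSON);
        con.add(alice, FOAF.NAME, alicesName);
        con.add(bob, RDF.TYPE, FOAF.PERSON);
        con.add(bob, FOAF.NAME, bobsName);
    }
    finally {
        con.close();
    }
}
catch (OpenRDFException e) {
    // handle exception
}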
A.4 Jena-Fuseki
Jena [101] is an open source Java framework for developing Semantic Web and Linked Data applications. It provides an API to extract and insert RDF graph data, represented as models that can be fed with data from files, databases, URLs, etc. These models can also be queried via SPARQL. Jena offers dedicated support for OWL ontologies, with multiple internal reasoners, and can also use the Pellet reasoner [102].
Figures A.7 and A.8 present excerpts of code for inserting a triple into a model and for loading a model from a file containing RDF data serialized, for example, as RDF/XML.
// Insert a triple into a model
static String personURI = "http://somewhere/JohnSmith";
static String fullName = "John Smith";

Model model = ModelFactory.createDefaultModel();
Resource johnSmith = model.createResource(personURI);
johnSmith.addProperty(VCARD.FN, fullName);

// Load a model from a file
Model model = ModelFactory.createDefaultModel();
InputStream in = FileManager.get().open( inputFileName );
if (in == null) {
    throw new IllegalArgumentException(
        "File: " + inputFileName + " not found");
}
// The second argument is the base URI; null means none is supplied
model.read(in, null);
Jena has a component called TDB (triple store) that is used for the persistence of RDF graphs, via a specific API. Access via SPARQL is provided by Fuseki, which is a server that supports the SPARQL protocol over HTTP.
A.5 PublishMyData
PublishMyData [103] is a hosting service for Linked Data provided by the company Swirrl [104], located in the United Kingdom. It offers a set of features and configurable methods for displaying data, intended to make consuming Linked Data more appealing. It also provides access interfaces for developers, to facilitate the creation of applications on top of the data.
The advantage of a service like this, as is generally the case with third-party services, is that there is no local installation, which avoids the costs of infrastructure and, especially, of the specialized personnel needed to configure and maintain the systems.
The service is used, for example, by the county of Hampshire in the United Kingdom. Figure A.9 presents the first page of the dataset catalog of the Hampshire Hub [105].
Figure A.10 provides a view of the “CO2 emissions reduction” dataset in the form of a spreadsheet. Information about the dataset is also presented (Figure A.11), such as the data publisher and license.
A.6 Epimorphics Linked Data Publishing Platform
The Epimorphics Linked Data Publishing Platform [106] includes a triple store and SPARQL endpoint, plus access to RDF resources through the Linked Data API. The platform provides a fully hosted and managed service (based on Amazon Web Services, for example) for publishing linked data; alternatively, it can be installed in a client's own infrastructure. Each instance of the platform is run on a cluster of dedicated machines for each client.
The platform provides:
• An interface along the lines of Linked Data API, providing access to data in different formats, whether for use by developers or by the general public, in the form of web pages.
• A triple store for storing RDF data.
• A SPARQL endpoint.
• An upload manager, to enable clients to load their own data.
The platform is used by various government websites in the United Kingdom, including environment.data.gov.uk, landregistry.data.gov.uk and others.
The following figures are taken from the environmental data website of the government of the United Kingdom. Figure A.12 presents the homepage of the website. Figure A.13 presents a dataset on water quality in different regions of the UK. This page lists various information, such as applications developed using the dataset (Figure A.14) and information about the API for direct access to the data (Figure A.15). The API documentation describes the URI patterns for access to the resources. Figure A.16 presents the triples of a resource retrieved through the URI http://environment.data.gov.uk/doc/bathing-water/ukl1702-36800, where ukl1702-36800 is the identifier of the district “Neath Port Talbot”, located in Wales. Selecting the link “Neath Port Talbot” retrieves information about the district (Figure A.17) that is stored in the database of the United Kingdom's mapping agency, Ordnance Survey [107].
A.7 Triple Stores
Triple stores are management systems for data modeled using RDF. Unlike relational database management systems, which store data in relations (or tables) and are queried using SQL, triple stores store RDF triples and are queried using SPARQL. In addition, a basic characteristic of many triple stores is the ability to make inferences. There are two main types of triple stores: those that store triples directly in a native store and those that store them in relational databases and provide an RDF management layer on top. Certain frameworks, such as Sesame, have an interface (the Sail API) that lets an installation configure different triple stores. The following is a list of some of the main triple stores at the moment:
• Virtuoso
It has a native triple store and is currently one of the most used Linked Data platforms; for example, it is used by DBpedia. It has two versions: a commercial one and an open source one that has a limited number of features and lower performance. It provides a limited set of inferences.
• Sesame
Java framework with a native triple store, which also enables configuring other triple stores that implement the Sail API. It does not natively provide a sophisticated level of inference.
• Jena TDB [108]
Java framework with a native triple store and several internal reasoners; external reasoners, such as Pellet, can also be configured.
• GraphDB [109]
Previously called OWLIM, it is currently one of the most used triple stores. It enables inferences over OWL ontologies and comes in three versions, one of which is free. GraphDB supports the Sail API of Sesame.
• 4store [110]
It is free software whose main strengths are performance, scalability and stability. It is licensed under the GNU General Public License. It does not provide inferences. 4store supports the Sail API of Sesame.
• Bigdata [111]
Graph database, written in Java, with high performance and scalability. The platform supports the RDF data model and SPARQL, including queries, updates and basic federated queries. It provides a limited set of inferences. It is available under two types of licenses: commercial and GNU GPLv2. Bigdata supports the Sail API of Sesame.
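
The contrast drawn at the start of this section, between rows in tables and triples matched by patterns, can be made concrete with a toy in-memory store. The sketch below is purely illustrative and is not how any of the products listed above is implemented: the class `TinyTripleStore` is hypothetical, terms are plain strings, there are no indexes and no inference, and `null` plays the role of a SPARQL variable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy triple store: a list of triples plus pattern matching.
class TinyTripleStore {
    static final class Triple {
        final String s, p, o;

        Triple(String s, String p, String o) {
            this.s = s;
            this.p = p;
            this.o = o;
        }

        @Override
        public String toString() {
            return s + " " + p + " " + o + " .";
        }
    }

    private final List<Triple> triples = new ArrayList<>();

    void add(String s, String p, String o) {
        triples.add(new Triple(s, p, o));
    }

    // Returns every triple that agrees with the pattern; a null term matches anything.
    List<Triple> match(String s, String p, String o) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : triples) {
            boolean subjectOk = s == null || s.equals(t.s);
            boolean predicateOk = p == null || p.equals(t.p);
            boolean objectOk = o == null || o.equals(t.o);
            if (subjectOk && predicateOk && objectOk) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TinyTripleStore store = new TinyTripleStore();
        store.add("ex:paper1", "dc:title", "\"Trusting Information Sources One Citizen at a Time\"");
        store.add("ex:paper1", "dc:creator", "\"Yolanda Gil\"");
        store.add("ex:paper1", "dc:creator", "\"Varun Ratnakar\"");

        // Roughly: SELECT ?o WHERE { ex:paper1 dc:creator ?o }
        for (Triple t : store.match("ex:paper1", "dc:creator", null)) {
            System.out.println(t.o);
        }
    }
}
```

A real triple store adds what this sketch leaves out: indexes over subject, predicate and object, persistence, a full SPARQL engine and, in many cases, a reasoner that derives additional triples.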
A.8 Libraries
This section lists some libraries that manipulate RDF data and SPARQL queries, for use in different programming language development environments:
• RDFLib [112]
Python library with different parsers and serialization formats, including Turtle, RDF/XML, RDFa, Microdata and JSON-LD. In-memory storage, and persistent storage using Oracle Berkeley DB. Supports SPARQL queries and updates.
• Redland [113]
C library with APIs for manipulating RDF graphs, triples, URIs and literals. In-memory storage, and persistent storage using Oracle Berkeley DB, MySQL 3-5, PostgreSQL, Virtuoso and SQLite, among others. Supports multiple syntaxes for reading and writing RDF, such as RDF/XML, N-Triples and Turtle, through the Raptor RDF library [114]. Querying with SPARQL and RDQL uses the Rasqal RDF query library [115].
• dotNetRDF [116]
Library for .NET with an API for RDF and SPARQL. Supports various backing stores and provides an API that allows third-party stores, such as Virtuoso, 4store and Jena-Fuseki, to be plugged in.
• EasyRDF [117]
PHP library with different parsers and serialization formats, including Turtle, RDF/JSON and N-Triples. It comes with a number of examples and provides support for viewing graphs using GraphViz [118]. Allows SPARQL queries.
• Perl RDF [119]
Perl library with an API for RDF storage, parsing and serialization. It provides a set of classes to represent basic RDF objects.
• rdfQuery [120]
JavaScript library that can be used to parse RDFa embedded within a page, query over the facts it contains, and reason to produce more facts. Depends on the jQuery library.