4. Semantic Web

In 2001, Tim Berners-Lee, James Hendler, and Ora Lassila published an article [9] in Scientific American that laid the foundations of the Semantic Web. In the previous sections of this guide, we saw how the Web of Documents evolved into a Programmable Web, with a growing number of applications that manipulate data published in many different forms in this data ecosystem. The idea of adding semantics to this data is to make it easier to understand and to improve interoperability in this universe of heterogeneous information, published in many different formats and with different access protocols.

The basic building blocks of the Semantic Web are:

• a standard data model;

• a set of reference vocabularies;

• a standard query protocol.

The Semantic Web seeks to facilitate communication between the different participants in this ecosystem by creating a common mental model that minimizes ambiguity and thereby reduces the work needed to develop applications that manipulate different data sources.

The following sections present an overview of the various technologies used in the Semantic Web environment, along with their main concepts and the role each plays in the ecosystem of data published on the Web. They do not describe each technology in detail; the technologies are formally described in well-defined specifications and in a number of other resources (referenced with links in this guide).

4.1 RDF

The Web, originally designed for human consumption, provides a simple and universal infrastructure for exchanging various types of information. However, it is very difficult for applications to process data displayed in documents unless specific information is added, encoded in an appropriate format. The solution is to use metadata to describe the data published on the Web. Metadata helps locate and process information by describing its structure, content, and other related aspects (license, intellectual property rights, provenance, etc.). This requires a common basis for processing metadata that allows these descriptions to be exchanged between different applications, supported by standards for the syntax and semantics of metadata, plus a set of common vocabularies and standardized forms of access.

1. Data Model

The Resource Description Framework (RDF) [10] is a framework for representing information on the Web. RDF allows assertions to be made about resources. A resource can be anything, concrete or abstract: a particular company, person, or web page is a resource, and so are feelings and colors.

The format of the assertions is simple. An RDF assertion consists of three elements (a triple) with the structure <subject> <predicate> <object>. An RDF assertion expresses a relationship between two resources. The subject and the object represent the two resources being related, and the predicate represents the nature of this relationship. The relationship is directional (from subject to object), and in RDF it is called a property. The object can also be a literal (such as a string or a number), in which case the triple describes a property of the subject.

To illustrate the idea of describing data as a set of triples, we will use a familiar data model: the relational model, with its sets of related tables [11]. Imagine a table that contains information about books (Figure 4.1). Each row of the table holds information about a particular book (Figure 4.2), and each of these books is a resource. Each column of the table defines a property of the book (Figure 4.3). Each cell of the table defines a triple (Figure 4.4).

The table contains information about the resources of the book type:

isbn          | title                    | author                     | publisher_id | pages
9788535912388 | Gabriela, Cravo e Canela | Jorge Amado                | 1243         | 424
...           | ...                      | ...                        | ...          | ...
9788501067340 | Vidas Secas              | Graciliano Ramos           | 3244         | 176
...           | ...                      | ...                        | ...          | ...
9788535921199 | Antologia Poética        | Carlos Drummond de Andrade | 1243         | 344
Figure 4.1 Information about books

The lines represent the resources:

isbn          | title                    | author                     | publisher_id | pages
9788535912388 | Gabriela, Cravo e Canela | Jorge Amado                | 1243         | 424
...           | ...                      | ...                        | ...          | ...
9788501067340 | Vidas Secas              | Graciliano Ramos           | 3244         | 176
...           | ...                      | ...                        | ...          | ...
9788535921199 | Antologia Poética        | Carlos Drummond de Andrade | 1243         | 344
Figure 4.2 Information about each resource

The columns represent the properties of the resources:

isbn          | title                    | author                     | publisher_id | pages
9788535912388 | Gabriela, Cravo e Canela | Jorge Amado                | 1243         | 424
...           | ...                      | ...                        | ...          | ...
9788501067340 | Vidas Secas              | Graciliano Ramos           | 3244         | 176
...           | ...                      | ...                        | ...          | ...
9788535921199 | Antologia Poética        | Carlos Drummond de Andrade | 1243         | 344
Figure 4.3 Properties of the resources

Each cell of the table defines a property (column) of a resource (row):

isbn          | title                    | author                     | publisher_id | pages
9788535912388 | Gabriela, Cravo e Canela | Jorge Amado                | 1243         | 424
...           | ...                      | ...                        | ...          | ...
9788501067340 | Vidas Secas              | Graciliano Ramos           | 3244         | 176
...           | ...                      | ...                        | ...          | ...
9788535921199 | Antologia Poética        | Carlos Drummond de Andrade | 1243         | 344
Figure 4.4 Triple of a resource
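
The mapping in Figures 4.1 to 4.4 (each row becomes a resource, each cell becomes a triple) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an RDF library. Triples are plain tuples, and the namespace `http://example.org/book/` used to build subjects from the ISBN is hypothetical. Column names stand in for the property URIs discussed later in this section.

```python
# Minimal sketch: each row of a relational table becomes a resource and
# each non-key cell becomes a (subject, predicate, object) triple.
# BASE is a hypothetical namespace used to mint subject identifiers.
BASE = "http://example.org/book/"

books = [
    {"isbn": "9788535912388", "title": "Gabriela, Cravo e Canela",
     "author": "Jorge Amado", "publisher_id": 1243, "pages": 424},
    {"isbn": "9788501067340", "title": "Vidas Secas",
     "author": "Graciliano Ramos", "publisher_id": 3244, "pages": 176},
    {"isbn": "9788535921199", "title": "Antologia Poética",
     "author": "Carlos Drummond de Andrade", "publisher_id": 1243, "pages": 344},
]


def row_to_triples(row, key="isbn"):
    """Return one triple per non-key column of a table row."""
    subject = BASE + row[key]
    return [(subject, column, value)
            for column, value in row.items() if column != key]


triples = [t for row in books for t in row_to_triples(row)]
print(len(triples))  # 12 (3 rows x 4 non-key columns)
print(triples[0])
```

The ISBN column is used only to build the subject. Every other cell becomes one triple, so three rows with four remaining columns yield twelve triples.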

A triple can be represented as a kind of directed graph (an RDF graph), from the subject to the object:

fig4_5_grafo_rdf_da_tripla_de_um_livro.png
Figure 4.5 RDF graph of a triple about a book

In general terms:

fig4_6_grafo_rdf_generico_de_uma_tripla.png
Figure 4.6 Generic RDF graph of a triple

Figure 4.8 presents an RDF graph of the triples corresponding to three properties of the same resource from the table in Figure 4.7:

isbn          | title                    | author                     | publisher_id | pages
9788535912388 | Gabriela, Cravo e Canela | Jorge Amado                | 1243         | 424
...           | ...                      | ...                        | ...          | ...
9788501067340 | Vidas Secas              | Graciliano Ramos           | 3244         | 176
...           | ...                      | ...                        | ...          | ...
9788535921199 | Antologia Poética        | Carlos Drummond de Andrade | 1243         | 344
Figure 4.7 Triples from a resource
fig4_8_grafo_rdf_com_tres_triplas_de_um_mesmo_recurso.png
Figure 4.8 RDF graph with three triples of the same resource

All the triples above have literal values as objects. However, a resource can also be related to another resource. In a relational schema, this relationship would be represented by another table referenced through a foreign key. In our example, we could have a table of publishing houses (publishers) (Figure 4.9).

id   | name
...  | ...
1243 | Companhia das Letras
...  | ...
3244 | Grupo Editorial Record
...  | ...
Figure 4.9 Information about publishing houses (publishers)

Figure 4.10 presents the RDF graph showing the relationship between a book and its publisher.

fig4_10_grafo_rdf_relacionando_recursos.png
Figure 4.10 RDF graph relating resources

Figure 4.11 presents an RDF graph based on two books from the same publisher.

fig4_11_grafo_rdf_de_varios_recursos.png
Figure 4.11 RDF graph of several resources
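
The step from Figure 4.8 to Figure 4.11 turns a foreign key into a link to another resource. It can be sketched as follows. The URI patterns under `http://example.org/` and the short property names `title`, `name` and `publisher` are hypothetical placeholders. Real data would use vocabulary URIs such as the Dublin Core properties used later in this section.

```python
# Minimal sketch: a foreign key (publisher_id) becomes a triple whose
# object is another resource, not a literal. URI patterns are hypothetical.
BOOK = "http://example.org/book/"
PUBLISHER = "http://example.org/publisher/"

publishers = {1243: "Companhia das Letras", 3244: "Grupo Editorial Record"}
books = [
    ("9788535912388", "Gabriela, Cravo e Canela", 1243),
    ("9788501067340", "Vidas Secas", 3244),
    ("9788535921199", "Antologia Poética", 1243),
]

graph = set()
for pub_id, name in publishers.items():
    graph.add((PUBLISHER + str(pub_id), "name", name))        # literal object
for isbn, title, pub_id in books:
    book = BOOK + isbn
    graph.add((book, "title", title))                          # literal object
    graph.add((book, "publisher", PUBLISHER + str(pub_id)))    # resource object


def subjects_with(graph, predicate, obj):
    """Follow an edge backwards: all subjects linked to obj by predicate."""
    return sorted(s for s, p, o in graph if p == predicate and o == obj)


companhia = PUBLISHER + "1243"
print(subjects_with(graph, "publisher", companhia))
```

The publisher is itself a node in the graph. Finding every book from the same publisher therefore means following the `publisher` edges backwards.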

2. URIs

To complete this data model, we need a way to identify each resource and each property uniquely and universally. The goal is global rather than local semantics: something that can be understood not only within a specific company or organization, but by all companies and organizations. The creator of a table within a given organization names properties in a particular way. The "title" property, for example, could have been named "title," "name," "name of the work," "ttl," etc. In theory, everyone in the organization understands the meaning of the property through an internal document describing the naming standards, or through the organization's conventions. However, this often fails, and the meaning may be ambiguous. Likewise, resources are often identified by unique identifiers (primary keys), but their scope is limited to the table in which they are inserted and the database in which they are stored. The same ID, for example "1243," can identify a publisher in one table and a bookstore in another. These are not unique identifiers in universal terms; they are local identifiers within the tables of a particular database. Even when identifiers are unique across all the tables of an organization, universal identification requires them to be unique with respect to any resource defined by any other entity, not only within a particular company.

In RDF, the way to identify resources and properties in a unique and universal manner is through the use of URIs, more specifically, HTTP URIs [12]. URIs are more comprehensive than URLs, since they are not necessarily linked to the location of the resource. They have the same format as URLs, but are used to identify things, whereas URLs identify an address for retrieving information or a document. A person, for example, can be identified by a URI (Figure 4.12).

fig4_12_uso_de_uris.png
Figure 4.12 Use of URIs
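
The collision described above can be shown in a minimal sketch. The same local key 1243 names a publisher in one table and a bookstore in another. The two become distinct once each key is placed inside an HTTP URI under its own namespace. The namespaces below are hypothetical, and only Python's standard library is used.

```python
from urllib.parse import quote

# Hypothetical namespaces; in practice each organization mints URIs
# under an HTTP domain it controls.
NAMESPACES = {
    "publisher": "http://example.org/publisher/",
    "bookstore": "http://example.org/bookstore/",
}


def mint_uri(kind, local_id):
    """Turn a table-local primary key into a globally unique HTTP URI."""
    # quote() percent-encodes characters that are not safe in a path segment.
    return NAMESPACES[kind] + quote(str(local_id), safe="")


a = mint_uri("publisher", 1243)
b = mint_uri("bookstore", 1243)
print(a)
print(b)
print(a != b)
```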

To give properties more widely understood semantics, we can use reference vocabularies that define properties for specific domains. We will look at some of these vocabularies in Section 6. For example, a person's name is usually described with the "name" property of the FOAF vocabulary [13], which defines properties about people. The URI “http://xmlns.com/foaf/0.1/name” identifies this property (Figure 4.13).

fig4_13_exemplo_de_uso_foaf_name.png
Figure 4.13 Example of the use of foaf:name

A very important issue when publishing data is the resource identification scheme, that is, the URI scheme. Researchers and data publishers on the Semantic Web are still discussing this issue, and a variety of articles offer guidance on appropriate schemes for defining URIs. Some argue that URIs should be completely opaque. An opaque URI contains no information that could be interpreted in relation to the resource it identifies, much like the numeric primary key of a relational table. The person example in Figure 4.12 illustrates this idea to a certain extent. Others argue that a scheme should encode some information in the URIs, so that users can deduce related URIs without resorting to another kind of search. For example, a URI for information about the federal budget may include the year of the information being sought: http://orcamento.dados.gov.br/doc/2013/ItemDespesa. To obtain information about another year, the user would simply change the year in the URI.
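
The two positions can be contrasted in a short sketch. The year-based function generalizes the single budget URI quoted above. Whether other years actually follow that pattern is an assumption made for illustration. The opaque variant uses a hypothetical namespace and a simple counter, much like a numeric primary key.

```python
import itertools


# Transparent scheme: the year is part of the URI, so a user can derive
# the URI for another year by changing one path segment. This is a
# hypothetical generalization of the example URI in the text.
def budget_uri(year):
    return f"http://orcamento.dados.gov.br/doc/{year}/ItemDespesa"


# Opaque scheme: identifiers carry no information about the resource.
# The namespace is hypothetical.
_counter = itertools.count(1)


def opaque_uri():
    return f"http://example.org/id/{next(_counter):08d}"


print(budget_uri(2013))
print(budget_uri(2014))
print(opaque_uri())
```

The counter only guarantees uniqueness within one running process. A real publisher would need persistent allocation, which ties into the persistence concerns discussed next.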

When defining the URI naming scheme, it is important to ensure that URIs are persistent or, at least, last as long as possible. If a URI becomes obsolete, there should be a way to inform users of that fact and explain how to retrieve the information, which may be available at another location identified by another URI. Several initiatives seek to ensure the persistence of URIs, such as the DOI System [14] and the PersId Initiative [15].

This topic is quite extensive, and addressing it fully is beyond the scope of this guide. For further information, the following articles, among others, provide useful input for choosing a URI definition scheme: "Cool URIs for the Semantic Web" [16] and "Cool URIs don't change" [17].

3. Serializations

RDF is an abstract data model and can be represented in any manner provided the representation complies with its abstract properties. There are various syntactic representations for the RDF model, some of which are more suitable for machine processing, and others more readable for people.

The most widely used notations include RDF/XML [18], Turtle [19], N-Triples [20] and JSON-LD [21]. The first notation, standardized by the W3C, was RDF/XML. Its initial advantage was that programming languages offered better support for XML at the time. XML namespaces can also be used to avoid writing full URIs, which shortens the text. However, RDF/XML is quite difficult for humans to read.

To illustrate serializations, this section gives an overview of Turtle notation (Terse RDF Triple Language), which is easier for people to read. Once Turtle is understood, the general idea of serialization becomes clear. The details of each notation's syntax are described in its specification. Understanding other notations only requires learning their specific syntax, since they all share the same abstract concepts: those of the RDF data model. We will also give an overview of JSON-LD, which has been widely adopted by the software developer community.

The Turtle example uses the RDF graph from Figure 4.8. In addition to the FOAF vocabulary mentioned earlier, we will use three properties from the Dublin Core vocabulary [22]. These properties describe the ISBN (International Standard Book Number) and the creator of the work, and link the work to its publisher.

Turtle syntax is made up of statements containing the three elements of an RDF assertion, and each statement ends with a period ("."). Figure 4.14 presents the three triples about the novel Gabriela, Cravo e Canela in Turtle.

<http://example.org/#gabriela-cravo-canela>
<http://purl.org/dc/elements/1.1/identifier>
"9788535912388" .

<http://example.org/#gabriela-cravo-canela>
<http://xmlns.com/foaf/0.1/name>
"Gabriela, Cravo e Canela" .

<http://example.org/#gabriela-cravo-canela>
<http://purl.org/dc/elements/1.1/creator>
"Jorge Amado" .
Figure 4.14 Triples in Turtle from the novel Gabriela, Cravo e Canela
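
One complete triple per line, each ending with a period, is essentially the N-Triples format [20], which is also valid Turtle. The Python sketch below produces that form from the triples of Figure 4.14. It treats any value beginning with `http://` or `https://` as a URI and everything else as a plain literal. It also escapes only backslashes and double quotes. Both choices simplify the full rules in the specification, which also cover line breaks inside literals, for example.

```python
# Minimal sketch: serialize (subject, predicate, object) tuples as
# N-Triples-style lines. Strings starting with http:// or https:// are
# treated as IRIs, everything else as plain literals (an assumption).
def term(value):
    if value.startswith("http://") or value.startswith("https://"):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def serialize(triples):
    """One triple per line, each terminated by ' .'."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


book = "http://example.org/#gabriela-cravo-canela"
triples = [
    (book, "http://purl.org/dc/elements/1.1/identifier", "9788535912388"),
    (book, "http://xmlns.com/foaf/0.1/name", "Gabriela, Cravo e Canela"),
    (book, "http://purl.org/dc/elements/1.1/creator", "Jorge Amado"),
]
print(serialize(triples))
```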

To shorten a Turtle description, the properties of the same resource can be grouped together. The resource's URI is written once, and the property definitions are separated by ";" (Figure 4.15).

<http://example.org/#gabriela-cravo-canela>
<http://purl.org/dc/elements/1.1/identifier>
"9788535912388";
<http://xmlns.com/foaf/0.1/name>
"Gabriela, Cravo e Canela";
<http://purl.org/dc/elements/1.1/creator>
"Jorge Amado".
Figure 4.15 Abbreviated syntax of multiple properties of the same resource
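
The predicate-list abbreviation of Figure 4.15 can be sketched as a grouping step. The sketch collects the predicate-object pairs of each subject and joins them with ";". It assumes every object is a plain string literal with no characters that need escaping, and it does not use prefixes.

```python
from collections import defaultdict


def abbreviate(triples):
    """Group triples by subject and join predicate-object pairs with ';'."""
    by_subject = defaultdict(list)  # keeps subjects in first-seen order
    for s, p, o in triples:
        by_subject[s].append((p, o))
    blocks = []
    for s, pairs in by_subject.items():
        body = " ;\n".join(f'    <{p}> "{o}"' for p, o in pairs)
        blocks.append(f"<{s}>\n{body} .")
    return "\n\n".join(blocks)


book = "http://example.org/#gabriela-cravo-canela"
triples = [
    (book, "http://purl.org/dc/elements/1.1/identifier", "9788535912388"),
    (book, "http://xmlns.com/foaf/0.1/name", "Gabriela, Cravo e Canela"),
    (book, "http://purl.org/dc/elements/1.1/creator", "Jorge Amado"),
]
print(abbreviate(triples))
```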

To make the definition of the triples even more readable, a set of prefixes can be defined, corresponding to the namespaces of the vocabularies used (Figure 4.16).

@base <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .

<#gabriela-cravo-canela>
   dc:identifier "9788535912388";
   foaf:name "Gabriela, Cravo e Canela";
   dc:creator "Jorge Amado".
Figure 4.16 Use of namespaces in Turtle
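
Prefix declarations let a serializer replace a namespace URI with a short name. The Python sketch below shows that compaction, assuming that only the three namespaces of Figure 4.16 are known. It does not check whether the remaining local part is a legal Turtle local name, which a real serializer would need to do.

```python
# Minimal sketch: compact full IRIs into prefixed names using the
# namespaces declared in Figure 4.16. Unknown IRIs stay in angle brackets.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def compact(iri):
    # Try the longest namespace first in case namespaces overlap.
    for prefix, ns in sorted(PREFIXES.items(), key=lambda kv: -len(kv[1])):
        if iri.startswith(ns) and len(iri) > len(ns):
            return f"{prefix}:{iri[len(ns):]}"
    return f"<{iri}>"


def prefix_header():
    """Emit the @prefix lines that make the compact names meaningful."""
    return "\n".join(f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items())


print(prefix_header())
print(compact("http://purl.org/dc/elements/1.1/creator"))
print(compact("http://example.org/#gabriela-cravo-canela"))
```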

Figure 4.17 presents the serialization in Turtle of the graph from Figure 4.11.

@base <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .

<#gabriela-cravo-canela>
   dc:identifier "9788535912388";
   foaf:name "Gabriela, Cravo e Canela";
   dc:creator "Jorge Amado";
   dc:publisher <#companhia-das-letras>.

<#antologia-poetica>
   dc:identifier "9788535921199";
   foaf:name "Antologia Poética";
   dc:creator "Carlos Drummond de Andrade";
   dc:publisher <#companhia-das-letras>.

<#companhia-das-letras>
   foaf:name "Companhia das Letras".
Figure 4.17 Serialization in Turtle of the graph from Figure 4.11.

JSON-LD (JavaScript Object Notation for Linked Data) is an RDF serialization that is increasingly used on the Semantic Web. It is intended to be more readable for people, and it uses the JSON model of attribute-value pairs to represent a set of triples.

One of the core elements of JSON-LD is the idea of context. When two people communicate in a shared environment, a context of mutual understanding allows them to use abbreviated terms from a shared vocabulary. They can then communicate more quickly without losing accuracy. A JSON-LD context works the same way: it allows two applications to use abbreviated, local terms to communicate more efficiently without losing accuracy. The developers of the JSON-LD specification (a W3C Recommendation) identified this need because the long URIs that identify RDF resources are hard to read. In a way, it is a return to a simpler, local vocabulary shared by the community of a particular application domain.

Figures 4.18 to 4.21 illustrate a progression in two stages. It starts from three JSON attribute-value pairs with no universal semantic reference. It ends with JSON-LD code in which the same three pairs are associated with a context that defines the universal semantics of the terms.

Figure 4.18 presents three properties of a person, in JSON format:

• “name” – name (literal)

• “homepage” – personal page (URL)

• “image” – image (URL)

{
  "name": "Manu Sporny",
  "homepage": "http://manu.sporny.org/",
  "image": "http://manu.sporny.org/images/manu.png"
}
Figure 4.18 Definition of data in JSON

Figure 4.19 replaces the terms (from the community's local vocabulary) with the URIs of properties from reference vocabularies. For example, "name" is replaced by the URI "http://schema.org/name". The figure also uses @id to mark a property value that is another resource rather than a literal. Introducing the URIs makes the text less legible.

{
  "http://schema.org/name": "Manu Sporny",
  "http://schema.org/url": { "@id": "http://manu.sporny.org/" },
  "http://schema.org/image": {"@id": "http://manu.sporny.org/images/manu.png"}
}
Figure 4.19 Inclusion of property URIs in JSON code

Figure 4.20 presents a context that associates the abbreviated names from Figure 4.18 with the URIs introduced in Figure 4.19. This association between abbreviations and universal semantic definitions (the context) is kept separate from the attribute-value pairs, so the abbreviations can be reused. The context serves as a mapping between the local vocabulary and the reference vocabulary.

{
  "@context": {
    "name": "http://schema.org/name",
    "image": {
      "@id": "http://schema.org/image",
      "@type": "@id"
    },
    "homepage": {
      "@id": "http://schema.org/url",
      "@type": "@id"
    }
  },
  "name": "Manu Sporny",
  "homepage": "http://manu.sporny.org/",
  "image": "http://manu.sporny.org/images/manu.png"
}
Figure 4.20 Definition of a context in JSON
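
The effect of the context in Figure 4.20 can be approximated in Python. Each short term is mapped to its IRI. Terms whose definition has "@type": "@id" have their values turned into node references. This is a deliberately simplified sketch, not the JSON-LD expansion algorithm defined by the W3C. The real algorithm also wraps values in arrays and handles nested objects, languages and datatypes, among other things. For this document, the sketch's output has the same structure as Figure 4.19.

```python
import json

# Simplified sketch of what a JSON-LD context does; NOT the full JSON-LD
# expansion algorithm. Short terms are mapped to IRIs, and values of terms
# declared with "@type": "@id" become node references.
context = {
    "name": "http://schema.org/name",
    "image": {"@id": "http://schema.org/image", "@type": "@id"},
    "homepage": {"@id": "http://schema.org/url", "@type": "@id"},
}


def expand_terms(doc, ctx):
    expanded = {}
    for key, value in doc.items():
        if key == "@context":
            continue
        definition = ctx.get(key)
        if definition is None:
            expanded[key] = value          # kept as-is here; real expansion drops it
        elif isinstance(definition, str):
            expanded[definition] = value   # simple term -> IRI mapping
        else:
            iri = definition["@id"]
            if definition.get("@type") == "@id":
                expanded[iri] = {"@id": value}  # the value is a resource
            else:
                expanded[iri] = value
    return expanded


doc = {
    "@context": context,
    "name": "Manu Sporny",
    "homepage": "http://manu.sporny.org/",
    "image": "http://manu.sporny.org/images/manu.png",
}
print(json.dumps(expand_terms(doc, doc["@context"]), indent=2))
```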

A context can be embedded alongside the attribute-value pairs or referenced by a URI. A well-known context shared by a community can therefore be referenced at the beginning of the data, and the abbreviated vocabulary can be used from then on. The context of Figure 4.20 could be considered the context for describing a person. Figure 4.21 shows data that uses the abbreviations together with a reference to an externally defined context.

{
  "@context": "http://json-ld.org/contexts/person.jsonld",
  "name": "Manu Sporny",
  "homepage": "http://manu.sporny.org/",
  "image": "http://manu.sporny.org/images/manu.png"
}
Figure 4.21 Definition of data in JSON-LD

4.2 RDFS

The RDF data model provides a way to make assertions about resources, but it makes no assumptions about the semantics of the resources identified by the URIs. RDF has no construct that specifies that a resource is a property, a book, a person, etc. In practice, RDF is typically used in combination with vocabularies that provide semantic information about these resources.

RDF Schema (RDFS) [23] is a vocabulary that extends RDF with a layer that adds semantics to data defined in RDF. RDFS has constructs that specify, for example, that certain URIs identify properties, or that certain resources identified by URIs belong to a particular class. Using the example from Figure 4.13, RDFS can state that the resource identified by the URI "http://xmlns.com/foaf/0.1/name" is a property. It can also state that the resource identified by the URI "http://example.org/#gabriela-cravo-canela" belongs to the class "Book."

RDFS uses the notion of class to specify categories that can be used to classify resources. The RDF property rdf:type indicates the relation between an instance and its class. With RDFS, class and subclass hierarchies can be created, as well as property and subproperty hierarchies. Type restrictions can be imposed on the subjects and objects of triples by specifying the domain (rdfs:domain) and codomain (rdfs:range) of each property. In the book example, we could state that the domain of the relation "has_publisher" is the class "Book" and its codomain is the class "Publisher."

Note that types, domains, and codomains of properties do not mean the same thing as in the object-oriented model. In the object-oriented model, a class abstracts the properties and methods of a particular set of instances. If an object is declared to be of a particular type (a class), it is assumed to have the set of properties associated with that type. In RDFS, declaring a property does not restrict which resources it can be used with, whatever their type. Properties are declared separately from, and independently of, classes. Declaring a domain and a codomain only allows the type of a resource to be inferred, even when that type is not explicitly stated in a triple. Humans do this constantly: they observe certain properties of objects, people, and so on, and infer the types to which these things may belong.

Going back to RDFS and the book example, suppose that the classes "Book" and "Publisher" are declared. Suppose also that the domain of the relation "has_publisher" is the class "Book," and that its codomain is the class "Publisher." No triple may explicitly state that the resource "http://example.org/#gabriela-cravo-canela" belongs to the class "Book," or that the resource "http://example.org/#companhia-das-letras" belongs to the class "Publisher." Even so, once a "has_publisher" relation between these two resources is stated, their classes can be inferred from the domain and codomain of the property. Thus more triples can be inferred than those actually stated.
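
The inference just described corresponds to two of the RDFS entailment rules in the RDF Semantics specification: rdfs2 for domains and rdfs3 for ranges. The Python sketch below applies them once to a tiny graph. The `ex:` prefix and the other short strings used as names are hypothetical shorthand for full URIs.

```python
# Minimal sketch of the RDFS domain and range entailment rules:
#   (p rdfs:domain C) and (s p o)  =>  (s rdf:type C)   [rdfs2]
#   (p rdfs:range  C) and (s p o)  =>  (o rdf:type C)   [rdfs3]
# Names are prefixed strings; "ex:" is a hypothetical namespace.
TYPE, DOMAIN, RANGE = "rdf:type", "rdfs:domain", "rdfs:range"

graph = {
    ("ex:has_publisher", DOMAIN, "ex:Book"),
    ("ex:has_publisher", RANGE, "ex:Publisher"),
    ("ex:gabriela-cravo-canela", "ex:has_publisher", "ex:companhia-das-letras"),
}


def infer_domain_range(graph):
    """Apply the two rules once and return only the newly inferred triples."""
    domains = {(s, o) for s, p, o in graph if p == DOMAIN}
    ranges = {(s, o) for s, p, o in graph if p == RANGE}
    inferred = set()
    for s, p, o in graph:
        for prop, cls in domains:
            if p == prop:
                inferred.add((s, TYPE, cls))
        for prop, cls in ranges:
            if p == prop:
                inferred.add((o, TYPE, cls))
    return inferred - graph


for triple in sorted(infer_domain_range(graph)):
    print(triple)
```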

Inference is one of the features introduced by Semantic Web technologies. Data sets can be more concise when inferences are used. In a set of triples about people, for example, "uncle" relations need not be stated explicitly, since they can be inferred from other relations. RDFS supports only a few kinds of inference; OWL expands them considerably, as will be seen in Section 4.3.

RDFS also supports inference based on subclasses and subproperties. If a resource is an instance of a particular subclass, it is inferred that the resource is also an instance of the corresponding superclass. The same applies to subproperties: if two resources are related by a subproperty, they are also related by the corresponding superproperty.
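
Subclass inference can be sketched in the same style. The sketch computes the transitive closure of rdfs:subClassOf (rule rdfs11) and propagates rdf:type along it (rule rdfs9), repeating until no new triples appear. The class ex:Novel and the superclass ex:Document are hypothetical additions, made only to show a two-level hierarchy.

```python
# Minimal sketch of RDFS subclass reasoning:
#   rdfs11: (A subClassOf B) and (B subClassOf C) => (A subClassOf C)
#   rdfs9:  (A subClassOf B) and (x type A)       => (x type B)
# Class names are hypothetical examples.
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

graph = {
    ("ex:Novel", SUBCLASS, "ex:Book"),
    ("ex:Book", SUBCLASS, "ex:Document"),
    ("ex:gabriela-cravo-canela", TYPE, "ex:Novel"),
}


def rdfs_subclass_closure(graph):
    g = set(graph)
    while True:
        sub = {(s, o) for s, p, o in g if p == SUBCLASS}
        new = set()
        for a, b in sub:                  # rdfs11: transitivity
            for c, d in sub:
                if b == c:
                    new.add((a, SUBCLASS, d))
        for s, p, o in g:                 # rdfs9: type propagation
            if p == TYPE:
                for a, b in sub:
                    if o == a:
                        new.add((s, TYPE, b))
        if new <= g:                      # fixpoint: nothing new was derived
            return g
        g |= new


closed = rdfs_subclass_closure(graph)
print(sorted(t for t in closed if t[1] == TYPE))
```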

Figure 4.22 adds information about the types (classes) of resources to the example from Figure 4.17. To define the class "Book," we use the bibo ontology [24], which describes bibliographic information. The keyword "a" is an abbreviation for the predicate "rdf:type," which indicates the class of a resource.

@base <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix bibo: <http://purl.org/ontology/bibo/> .

<#gabriela-cravo-canela>
   a bibo:Book ;
   dc:identifier "9788535912388";
   foaf:name "Gabriela, Cravo e Canela";
   dc:creator "Jorge Amado";
   dc:publisher <#companhia-das-letras>.

<#vidas-secas>
   a bibo:Book;
   dc:identifier "9788501067340";
   foaf:name "Vidas Secas";
   dc:creator "Graciliano Ramos";
   dc:publisher <#grupo-record>.

<#antologia-poetica>
   a bibo:Book;
   dc:identifier "9788535921199";
   foaf:name "Antologia Poética";
   dc:creator "Carlos Drummond de Andrade";
   dc:publisher <#companhia-das-letras>.

<#companhia-das-letras>
   a foaf:Organization;
   foaf:name "Companhia das Letras".

<#grupo-record>
   a foaf:Organization;
   foaf:name "Grupo Editorial Record".
Figure 4.22 Inclusion of information on types of resources (classes)

4.3 OWL

RDF specifies a data model that represents information as a set of triples defining the properties of resources and the relations among them. RDFS extends RDF with class hierarchies, property hierarchies, and domains and codomains for properties. These provide a first set of restrictions on the defined triples, as well as inferences that derive triples not explicitly stated.

Why define restrictions? One way to understand semantics is this: semantics is what leads different people to draw the same meaning from a given content, because it restricts the number of possible interpretations of that content. Ideally, only one interpretation is possible.

The Web Ontology Language (OWL) [25] extends RDF and RDFS and offers a much broader range of restrictions on the set of defined triples. It also provides constructs for building complex classes from other class definitions and for chaining properties, among other things. One of the main foundations of OWL is Description Logics (DLs) [26], a family of knowledge representation languages widely used in ontology modeling. An ontology is a specification of a conceptualization of a particular domain of interest. As their name suggests, DLs are logics and have formal semantics: a precise specification of meaning. These formal semantics allow humans and computer systems to exchange DL ontologies without ambiguity about their meaning. They also make it possible to use logical deduction to infer additional information from the facts explicitly stated in an ontology.

Ontologies provide the means to model the entities of a domain of interest and the relations between them. In OWL, there are three types of entities:

• Instances – They represent resources (they are also called individuals).

• Classes – They define sets of instances or individuals.

• Properties – They represent binary relations between two instances (object property) or between an instance and a literal (datatype property).

An OWL ontology consists of a set of assertions (axioms), separated into three groups: TBox (terminological), ABox (assertional) and RBox (role).

The TBox describes relations between classes. For example, the assertion that the "Person" class is equivalent to the "Human" class means that both classes have the same set of individuals. The ABox captures knowledge about individuals, i.e., the classes to which they belong and how they are related to each other. For example, we can assert that the individual "Mary" belongs to the "Person" class, i.e., "Mary" is an instance of "Person." The TBox example states that "Person" is equivalent to "Human." Combining the two, it can be inferred that "Mary" is also an instance of the "Human" class:

:Person owl:equivalentClass :Human .
:Mary rdf:type :Person .
:Mary rdf:type :Human .   # inferred

The ABox also defines the relations between individuals. For example, we can state that "Mary" is the wife of "John."

:John :hasWife :Mary .
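
The Mary example combines a TBox axiom with ABox facts, and this can be sketched as a small fixpoint computation. Equivalence is treated as a subclass relation in both directions, mirroring the OWL 2 RL rules cax-eqc1 and cax-eqc2. The names are plain strings standing in for full URIs. A real system would use an OWL reasoner rather than code like this.

```python
# Minimal sketch: infer class memberships from owl:equivalentClass (TBox)
# and rdf:type facts (ABox). Names are plain strings standing in for URIs.
TYPE, EQUIV = "rdf:type", "owl:equivalentClass"

tbox = {(":Person", EQUIV, ":Human")}
abox = {(":Mary", TYPE, ":Person"), (":John", ":hasWife", ":Mary")}


def infer_equivalent_types(tbox, abox):
    pairs = {(s, o) for s, p, o in tbox if p == EQUIV}
    pairs |= {(o, s) for s, o in pairs}   # equivalence works in both directions
    facts = set(abox)
    changed = True
    while changed:                        # repeat until no new triple appears
        changed = False
        for s, p, o in list(facts):
            if p != TYPE:
                continue
            for c1, c2 in pairs:
                if o == c1 and (s, TYPE, c2) not in facts:
                    facts.add((s, TYPE, c2))
                    changed = True
    return facts - set(abox)


print(infer_equivalent_types(tbox, abox))
```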

The RBox contains assertions about properties, i.e., metaproperties such as transitivity and symmetry. These assertions allow inferences to be drawn from the explicitly stated triples:

:hasAncestor rdf:type owl:TransitiveProperty .
:John :hasAncestor :Phil .
:Phil :hasAncestor :Peter .
:John :hasAncestor :Peter .   # inferred
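
The transitivity inference in the RBox example can be sketched as a repeated join of a transitive property with itself, stopping when nothing new is derived. As before, names are plain strings standing in for URIs.

```python
# Minimal sketch: if p is an owl:TransitiveProperty, then (x p y) and
# (y p z) entail (x p z). Joins are repeated until nothing new appears.
TYPE, TRANSITIVE = "rdf:type", "owl:TransitiveProperty"

graph = {
    (":hasAncestor", TYPE, TRANSITIVE),
    (":John", ":hasAncestor", ":Phil"),
    (":Phil", ":hasAncestor", ":Peter"),
}


def transitive_closure(graph):
    transitive = {s for s, p, o in graph if p == TYPE and o == TRANSITIVE}
    g = set(graph)
    while True:
        new = {(x, p, z)
               for x, p, y in g if p in transitive
               for y2, p2, z in g if p2 == p and y2 == y}
        if new <= g:
            return g
        g |= new


closed = transitive_closure(graph)
print(closed - graph)
```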

The following points summarize the various constructs in the OWL language specification:

• Classes are declared, as in RDFS, with the rdf:type property:

:Person rdf:type rdfs:Class .
:Woman rdf:type rdfs:Class .
:Woman rdfs:subClassOf :Person .

• The class of a particular instance can be stated explicitly, or it can be inferred, for example, from the definition of subclasses or domain and codomain:

:Mary rdf:type owl:NamedIndividual .
:Mary rdf:type :Woman .
:Mary rdf:type :Person .   # inferred

• Two names can be declared to refer to the same individual. The RDF graph of that individual is then the union of the assertions made about each name:

:Mary owl:sameAs otherOntology:MaryBrown .

• Equivalent classes have the same set of instances:

:Person owl:equivalentClass :Human .
:Mary rdf:type :Person .
:Mary rdf:type :Human .   # inferred

• Disjoint classes have no instances in common: an instance that belongs to one cannot belong to the other.

[] rdf:type owl:AllDisjointClasses ;
owl:members ( :Woman :Man ) .

• A property can have a literal value (datatype property) or establish a relationship between two instances (object property).

:hasAge rdf:type owl:DatatypeProperty .
:John :hasAge 52 .
:hasWife rdf:type owl:ObjectProperty .
:John :hasWife :Mary .

• Complex classes can be built from other classes, through intersection, union, complement and enumeration:

:Mother owl:equivalentClass [
  rdf:type owl:Class ;
  owl:intersectionOf ( :Woman :Parent )
] .
:Parent owl:equivalentClass [
  rdf:type owl:Class ;
  owl:unionOf ( :Mother :Father )
] .
:ChildlessPerson owl:equivalentClass [
  rdf:type owl:Class ;
  owl:intersectionOf ( :Person [ owl:complementOf :Parent ] )
] .
:Beatles owl:equivalentClass [
  rdf:type owl:Class ;
  owl:oneOf ( :George :Ringo :John :Paul )
] .

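• Cardinality restrictions state how many values (maximum, minimum, or exact) a property may have for an instance. The three restrictions below are independent examples. Asserted together about the same individual, they would make the ontology inconsistent, since an exact cardinality of 5 exceeds the maximum of 4: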

:John rdf:type [
  rdf:type owl:Restriction ;
  owl:maxCardinality "4"^^xsd:nonNegativeInteger ;
  owl:onProperty :hasChild
] .
:John rdf:type [
  rdf:type owl:Restriction ;
  owl:minCardinality "2"^^xsd:nonNegativeInteger ;
  owl:onProperty :hasChild
] .
:John rdf:type [
  rdf:type owl:Restriction ;
  owl:cardinality "5"^^xsd:nonNegativeInteger ;
  owl:onProperty :hasChild
] .

• Properties can be transitive:

:hasAncestor rdf:type owl:TransitiveProperty .
:John :hasAncestor :Phil .
:Phil :hasAncestor :Peter .
:John :hasAncestor :Peter .   # inferred

• Properties can be inverse:

:hasParent owl:inverseOf :hasChild .
:John :hasChild :Paul .
:Paul :hasParent :John .   # inferred

• Properties can be symmetric (the inverse of the property is the property itself, for example, spouse):

:hasSpouse rdf:type owl:SymmetricProperty .
:John :hasSpouse :Mary .
:Mary :hasSpouse :John .   # inferred

• Properties can be asymmetric (if the property relates A to B, it cannot relate B to A; for example, a person cannot be the child of his or her own child):

:hasChild rdf:type owl:AsymmetricProperty .

• Properties can be defined as chains of other properties (a grandparent is a parent's parent):

:hasGrandparent owl:propertyChainAxiom ( :hasParent :hasParent ) .
:John :hasParent :Phil .
:Phil :hasParent :Peter .
:Peter :hasParent :Paul .
:John :hasGrandparent :Peter .   # inferred
:Phil :hasGrandparent :Paul .   # inferred

• Properties can be reflexive (every individual is related to itself, for example, a person is his or her own relative):

:hasRelative rdf:type owl:ReflexiveProperty .

• Properties can be irreflexive (no individual is related to itself, for example, a person cannot be his or her own parent):

:parentOf rdf:type owl:IrreflexiveProperty .

• Properties can be functional (each individual can have at most one value for the property; for example, a person has only one mother):

:hasMother rdf:type owl:FunctionalProperty .

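• Properties can be inversely functional (each value can be related to at most one individual; for example, since a child has only one mother, the property :isMotherOf is inversely functional):

:isMotherOf rdf:type owl:InverseFunctionalProperty .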
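
The inferences shown for the transitive, inverse and property chain examples above can be reproduced by applying each rule repeatedly until no new triple appears (forward chaining). The following Python sketch is only an illustration, not an OWL reasoner: it assumes triples are plain (subject, predicate, object) tuples of strings, supports only transitive, symmetric and inverse properties and two-step property chains, and the function name infer is hypothetical.

```python
from itertools import product
from typing import Collection, Iterable, Optional

Triple = tuple[str, str, str]


def infer(triples: Iterable[Triple], *,
          transitive: Collection[str] = (),
          symmetric: Collection[str] = (),
          inverse: Optional[dict[str, str]] = None,
          chains: Optional[dict[str, tuple[str, str]]] = None) -> set[Triple]:
    """Apply a few OWL-style rules repeatedly until no new triple is produced.

    inverse maps q -> p for each "q owl:inverseOf p" axiom;
    chains maps r -> (p1, p2) for each "r owl:propertyChainAxiom (p1 p2)" axiom.
    """
    inverse = inverse or {}
    chains = chains or {}
    closure = set(triples)
    while True:
        new: set[Triple] = set()
        for s, p, o in closure:
            if p in symmetric:                      # s p o  =>  o p s
                new.add((o, p, s))
            for q, target in inverse.items():
                if p == target:                     # s target o  =>  o q s
                    new.add((o, q, s))
                elif p == q:                        # s q o  =>  o target s
                    new.add((o, target, s))
        # Transitivity and chains join two triples through a shared middle node
        for (s1, p1, m1), (m2, p2, o2) in product(closure, repeat=2):
            if m1 != m2:
                continue
            if p1 == p2 and p1 in transitive:       # s p m, m p o  =>  s p o
                new.add((s1, p1, o2))
            for chained, (first, second) in chains.items():
                if (p1, p2) == (first, second):     # s first m, m second o  =>  s chained o
                    new.add((s1, chained, o2))
        if new <= closure:                          # nothing new: fixpoint reached
            return closure
        closure |= new


if __name__ == "__main__":
    family = {
        (":John", ":hasParent", ":Phil"),
        (":Phil", ":hasParent", ":Peter"),
        (":Peter", ":hasParent", ":Paul"),
    }
    inferred = infer(family, chains={":hasGrandparent": (":hasParent", ":hasParent")})
    for triple in sorted(inferred - family):
        print(triple)
    # (':John', ':hasGrandparent', ':Peter')
    # (':Phil', ':hasGrandparent', ':Paul')
```

Comparing every pair of triples on each pass is quadratic, which is acceptable for a toy example; real reasoners rely on indexes and more refined rule engines.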

4.4 SPARQL

In the Semantic Web, data is represented with the RDF conceptual data model, together with the RDFS and OWL extensions. This data can be stored in a triple store, in a relational database with a mapping to RDF, or in other ways.

SPARQL (SPARQL Protocol and RDF Query Language) [27] is the query language of the Semantic Web. It plays a role analogous to that of SQL for relational databases, with a syntax suited to querying data represented as sets of RDF triples.

For example, consider a triple store with the following content:

<http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> "RDF Tutorial" .
<http://example.org/book/book2> <http://purl.org/dc/elements/1.1/title> "SPARQL Tutorial" .

The query

SELECT ?title
WHERE
{
  <http://example.org/book/book2>
  <http://purl.org/dc/elements/1.1/title>
  ?title .
}

would return:

title
"SPARQL Tutorial"

SPARQL variables start with a “?” and can appear in any of the three positions of a triple (subject, predicate, object). The triple patterns in the WHERE clause have the same form as ordinary triples, except that any of their three parts can be replaced by a variable. The query returns a table with one column for each variable listed in the SELECT clause and one row for each combination of values that satisfies the patterns.
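
Matching a single triple pattern against stored triples can be sketched in a few lines of Python. This is a simplified illustration, not how any particular triple store works: it assumes IRIs and literals are plain strings and treats any term starting with “?” as a variable (so a literal beginning with “?” would be misread). The helper names are hypothetical.

```python
from typing import Optional

Triple = tuple[str, str, str]
Binding = dict[str, str]


def match(pattern: Triple, triple: Triple) -> Optional[Binding]:
    """Return the variable bindings that make pattern equal to triple, or None."""
    binding: Binding = {}
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):              # a variable: bind it
            # A variable used twice in one pattern must take the same value
            if binding.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:                  # a fixed term that differs
            return None
    return binding


DC_TITLE = "http://purl.org/dc/elements/1.1/title"
store: list[Triple] = [
    ("http://example.org/book/book1", DC_TITLE, "RDF Tutorial"),
    ("http://example.org/book/book2", DC_TITLE, "SPARQL Tutorial"),
]

pattern = ("http://example.org/book/book2", DC_TITLE, "?title")
solutions = []
for triple in store:
    binding = match(pattern, triple)
    if binding is not None:
        solutions.append(binding)
print(solutions)  # [{'?title': 'SPARQL Tutorial'}]
```

Because a variable may appear in any position, a pattern such as (“?s”, “?p”, “RDF Tutorial”) also matches the first triple, binding both the subject and the predicate.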

In another example, let us consider a triple store with the following content:

@base <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .

<#gabriela-cravo-canela>
  dc:identifier "9788535912388" ;
  foaf:name "Gabriela, Cravo e Canela" ;
  dc:creator "Jorge Amado" ;
  dc:publisher <#companhia-das-letras> .

<#vidas-secas>
  dc:identifier "9788501067340" ;
  foaf:name "Vidas Secas" ;
  dc:creator "Graciliano Ramos" ;
  dc:publisher <#grupo-record> .

<#antologia-poetica>
  dc:identifier "9788535921199" ;
  foaf:name "Antologia Poética" ;
  dc:creator "Carlos Drummond de Andrade" ;
  dc:publisher <#companhia-das-letras> .

<#companhia-das-letras>
  foaf:name "Companhia das Letras".

<#grupo-record>
  foaf:name "Grupo Editorial Record" .

The query:

PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ex: <http://example.org/#>
SELECT ?name ?creator
WHERE
{ ?book dc:publisher ex:companhia-das-letras .
  ?book foaf:name ?name .
  ?book dc:creator ?creator .
}

would return:

name creator
"Gabriela, Cravo e Canela" "Jorge Amado"
"Antologia Poética" "Carlos Drummond de Andrade"
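
The three patterns in this query are joined through the shared variable ?book: each pattern extends (or discards) the partial solutions produced by the previous one. The Python sketch below evaluates such a basic graph pattern naively, one pattern at a time, over a subset of the triples above (the dc:identifier triples and the publishers' own foaf:name triples are omitted). It assumes the relative IRIs resolve against http://example.org/, represents terms as plain strings and uses hypothetical helper names; real triple stores use indexes and join reordering rather than scanning every triple.

```python
from typing import Optional

Triple = tuple[str, str, str]
Binding = dict[str, str]


def extend(binding: Binding, pattern: Triple, triple: Triple) -> Optional[Binding]:
    """Extend one solution so that pattern matches triple, or return None."""
    result = dict(binding)                      # never modify the input solution
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            if result.setdefault(p_term, t_term) != t_term:
                return None                     # clashes with an earlier binding
        elif p_term != t_term:
            return None
    return result


def evaluate(patterns: list[Triple], store: list[Triple]) -> list[Binding]:
    """Naively evaluate a basic graph pattern, joining one pattern at a time."""
    solutions: list[Binding] = [{}]             # start from one empty solution
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in store:
                extended = extend(binding, pattern, triple)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


EX = "http://example.org/#"
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"

store: list[Triple] = []
for book, name, creator, publisher in [
    ("gabriela-cravo-canela", "Gabriela, Cravo e Canela", "Jorge Amado", "companhia-das-letras"),
    ("vidas-secas", "Vidas Secas", "Graciliano Ramos", "grupo-record"),
    ("antologia-poetica", "Antologia Poética", "Carlos Drummond de Andrade", "companhia-das-letras"),
]:
    store += [(EX + book, FOAF + "name", name),
              (EX + book, DC + "creator", creator),
              (EX + book, DC + "publisher", EX + publisher)]

query = [
    ("?book", DC + "publisher", EX + "companhia-das-letras"),
    ("?book", FOAF + "name", "?name"),
    ("?book", DC + "creator", "?creator"),
]
for row in evaluate(query, store):
    print(row["?name"], "|", row["?creator"])
# Gabriela, Cravo e Canela | Jorge Amado
# Antologia Poética | Carlos Drummond de Andrade
```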

SPARQL provides a number of filters and operators for building complex queries over the stored triples. Many triple stores offer access points on the Web (URLs) that accept queries through the SPARQL protocol; these access points are called SPARQL endpoints. A SPARQL endpoint accepts queries and returns the results via HTTP. Generic endpoints execute queries against any RDF dataset accessible on the Web, specified as a parameter of the request. Specific endpoints execute queries only against particular datasets, defined by the service. Lists of SPARQL endpoints on the Web are available at SPARQLES [28], W3C SPARQL endpoints [29] and Mondeca [30]. A generic endpoint is provided by OpenLink [31], where a query can specify the location of the RDF graph that contains the triples to be inspected.
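
Under the SPARQL 1.1 Protocol, a query can be sent to an endpoint as an HTTP GET request whose “query” parameter carries the URL-encoded query text; the optional “default-graph-uri” parameter names the RDF graph to query, which is one way a generic endpoint can be pointed at a remote dataset when the service supports it. The Python sketch below only builds such a request with the standard library and does not send it. The endpoint URL is hypothetical, the function name is an assumption, and the Accept header asks for the SPARQL JSON results format.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def sparql_get_request(endpoint: str, query: str,
                       default_graph: Optional[str] = None) -> Request:
    """Build (but do not send) a SPARQL protocol query request using HTTP GET."""
    params = {"query": query}
    if default_graph is not None:
        # Names the RDF graph the query should run against
        params["default-graph-uri"] = default_graph
    # Content negotiation: ask for the SPARQL JSON results format
    return Request(f"{endpoint}?{urlencode(params)}",
                   headers={"Accept": "application/sparql-results+json"})


QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?homepage WHERE { ?person foaf:homepage ?homepage }"""

request = sparql_get_request(
    "http://example.org/sparql",                # hypothetical endpoint
    QUERY,
    default_graph="http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf",
)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it and return the HTTP response
```

Long queries are often sent with POST instead, since URLs have practical length limits; the protocol allows both.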

A query can specify the RDF graph to be queried with the FROM keyword. The following example (Figures 4.23 and 4.24) returns the homepages of the people known (“foaf:knows”) by Tim Berners-Lee, according to the RDF graph located at http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX card: <http://www.w3.org/People/Berners-Lee/card#>
SELECT ?homepage
FROM <http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf>
WHERE {
  card:i foaf:knows ?known .
  ?known foaf:homepage ?homepage . }
Figure 4.23 Use of FROM in SPARQL

homepage
http://purl.org/net/eric/
http://www.johnseelybrown.com/
...
Figure 4.24 Results of the use of FROM in SPARQL

SELECT queries do not return sets of triples. The result is a set of tuples (like the rows of a table), which can be serialized in different formats, such as HTML, XML, JSON or CSV. The format is specified when the query is submitted, for example as a request parameter. Figure 4.25 presents the results of the previous query in JSON format:

{ "head": { "link": [], "vars": ["homepage"] },
  "results": { "distinct": false, "ordered": true, "bindings": [
  { "homepage": { "type": "uri", "value": "http://purl.org/net/eric/" }},
  { "homepage": { "type": "uri", "value": "http://www.johnseelybrown.com/" }},
... ] } }
Figure 4.25 SPARQL query results in JSON format
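
The JSON results format has a fixed structure: “head.vars” lists the variables (the columns), and “results.bindings” holds one object per row, mapping each bound variable to its “type” (such as “uri” or “literal”) and “value”. The Python sketch below flattens it into simple rows; the sample keeps only the two rows shown in Figure 4.25 (the “...” is dropped so that it is valid JSON), and the function name is hypothetical.

```python
import json


def rows_from_sparql_json(text: str) -> list[dict[str, str]]:
    """Turn a SPARQL JSON results document into one dict per row."""
    document = json.loads(text)
    variables = document["head"]["vars"]          # column names, in order
    rows = []
    for binding in document["results"]["bindings"]:
        # A variable left unbound in a solution is simply absent from its binding
        rows.append({var: binding[var]["value"] for var in variables if var in binding})
    return rows


SAMPLE = """
{ "head": { "link": [], "vars": ["homepage"] },
  "results": { "distinct": false, "ordered": true, "bindings": [
    { "homepage": { "type": "uri", "value": "http://purl.org/net/eric/" }},
    { "homepage": { "type": "uri", "value": "http://www.johnseelybrown.com/" }}
  ] } }
"""

for row in rows_from_sparql_json(SAMPLE):
    print(row["homepage"])
```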

In addition to SELECT, SPARQL provides the DESCRIBE, CONSTRUCT and ASK query forms (ASK simply returns true or false, depending on whether the query pattern has a solution). DESCRIBE returns a single RDF graph containing RDF data about the resources specified in the query. The data returned is determined by the SPARQL query processor, not by the client that made the query: depending on the implementation, the result may contain the triples in which the resource appears as subject, predicate or object; only those in which it appears as subject or object; or only those in which it appears as subject.
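
The three possibilities described above differ only in which positions of a triple are compared with the described resource. The following Python sketch illustrates them over a few triples taken from Figure 4.27; it is not how DBpedia or any specific SPARQL processor builds its DESCRIBE output, and the strategy names are hypothetical labels.

```python
Triple = tuple[str, str, str]

# Triple positions inspected by each strategy (0 = subject, 1 = predicate, 2 = object)
POSITIONS = {
    "subject": (0,),
    "subject-or-object": (0, 2),
    "any": (0, 1, 2),
}


def describe(resource: str, store: list[Triple],
             strategy: str = "subject-or-object") -> list[Triple]:
    """Return the triples a DESCRIBE-like service might report for resource."""
    positions = POSITIONS[strategy]
    return [t for t in store if any(t[i] == resource for i in positions)]


TIMBL = ":Tim_Berners-Lee"
store: list[Triple] = [
    (TIMBL, "foaf:name", '"Tim Berners-Lee"'),
    (TIMBL, "foaf:surname", '"Berners-Lee"'),
    (TIMBL, "dbpedia2:nationality", '"British"'),
    (":ENQUIRE", "dbpedia2:inventor", TIMBL),
    (":Libwww", "dbpedia2:author", TIMBL),
]
print(len(describe(TIMBL, store, "subject")))            # 3
print(len(describe(TIMBL, store, "subject-or-object")))  # 5
```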

The triples of a resource can also be obtained by dereferencing its URI (requesting it over HTTP), which is one of the Linked Data principles (Section 5.1). Like a DESCRIBE query, dereferencing returns a set of triples about the resource. DBpedia contains triples extracted from the infoboxes of Wikipedia articles. The result of dereferencing the DBpedia URI that identifies Tim Berners-Lee can be viewed at http://dbpedia.org/page/Tim_Berners-Lee [32]. This page is returned in HTML, but the same data can be obtained in other formats, such as JSON [33].

Figure 4.26 shows an example of the DESCRIBE command in a query against the DBpedia endpoint [34]. Figure 4.27 presents part of the results. It can be seen that triples are listed where the resource (“:Tim_Berners-Lee”) appears as subject and object.

PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
DESCRIBE ?timbl WHERE {
  ?timbl dbpedia-owl:alias "TimBL".
}
Figure 4.26 SPARQL DESCRIBE command

s p o
:Tim_Berners-Lee dc:description "British computer scientist, best know as the inventor of the World Wide Web"
:Tim_Berners-Lee dbpedia2:ocupation :Computer_scientist
:Tim_Berners-Lee foaf:name "Sir Tim Berners-Lee"
:Tim_Berners-Lee foaf:name "Tim Berners-Lee"
:Tim_Berners-Lee foaf:name "Berners-Lee, Tim"
:Tim_Berners-Lee foaf:surname "Berners-Lee"
:Tim_Berners-Lee rdfs:label "Tim Berners-Lee"
:Tim_Berners-Lee dbpedia:ontology/birthYear "1955+02:00"
:Tim_Berners-Lee dbpedia2:birthPlace United Kingdom
:Tim_Berners-Lee dbpedia2:nationality "British"
:Tim_Berners-Lee dbpedia:ontology/employer :Massachusetts_Institute_of_Technology
:Tim_Berners-Lee dbpedia:ontology/award :Royal_Society
:Tim_Berners-Lee owl:sameAs <http://eo.dbpedia.org/resource/Tim_Berners-Lee>
...
:Conway_Berners-Lee dbpedia2:children :Tim_Berners-Lee
:ENQUIRE dbpedia2:inventor :Tim_Berners-Lee
:Libwww dbpedia2:author :Tim_Berners-Lee
...
Figure 4.27 Results of the SPARQL DESCRIBE command

As in SQL, the set of results returned by SPARQL can be manipulated:

• LIMIT – limits the number of rows

• DISTINCT – removes duplicate rows

• ORDER BY – orders the result

• OFFSET – skips a number of rows (used with LIMIT for paging)

• FILTER – restricts the solutions to those whose values satisfy a condition

SPARQL also offers a number of built-in functions and operators for use in queries, including logical operators (“!”, “&&”, “||”), arithmetic operators (“+”, “-”, “*”, “/”), comparison operators (“=”, “>”, “<”, ...), test functions (“isLiteral”, “isURI”, ...) and string functions (“STRLEN”, “SUBSTR”, “UCASE”, “CONCAT”, ...).
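
The SPARQL 1.1 specification applies these modifiers in a fixed order once the WHERE clause (including its filters) has been evaluated: ORDER BY, then projection of the selected variables, then DISTINCT, then OFFSET and LIMIT. The Python sketch below mimics that order over result rows represented as dictionaries, using the book data from the earlier example. The function and parameter names are hypothetical, and Python's string ordering is only an approximation of SPARQL's ordering rules.

```python
from typing import Callable, Optional

Row = dict[str, str]


def apply_modifiers(rows: list[Row], *,
                    filter_fn: Optional[Callable[[Row], bool]] = None,
                    order_by: Optional[str] = None,
                    project: Optional[list[str]] = None,
                    distinct: bool = False,
                    offset: int = 0,
                    limit: Optional[int] = None) -> list[Row]:
    """Apply a filter and then the solution modifiers, in SPARQL 1.1 order."""
    if filter_fn is not None:                   # FILTER belongs to the WHERE clause
        rows = [r for r in rows if filter_fn(r)]
    if order_by is not None:                    # ORDER BY
        rows = sorted(rows, key=lambda r: r.get(order_by, ""))
    if project is not None:                     # keep only the selected variables
        rows = [{v: r[v] for v in project if v in r} for r in rows]
    if distinct:                                # DISTINCT keeps first occurrences
        seen: set[tuple[tuple[str, str], ...]] = set()
        unique = []
        for r in rows:
            key = tuple(sorted(r.items()))
            if key not in seen:
                seen.add(key)
                unique.append(r)
        rows = unique
    end = None if limit is None else offset + limit
    return rows[offset:end]                     # OFFSET, then LIMIT


books = [
    {"name": "Gabriela, Cravo e Canela", "creator": "Jorge Amado",
     "publisher": "Companhia das Letras"},
    {"name": "Vidas Secas", "creator": "Graciliano Ramos",
     "publisher": "Grupo Editorial Record"},
    {"name": "Antologia Poética", "creator": "Carlos Drummond de Andrade",
     "publisher": "Companhia das Letras"},
]

print(apply_modifiers(books, order_by="name", project=["name"], offset=1, limit=2))
# [{'name': 'Gabriela, Cravo e Canela'}, {'name': 'Vidas Secas'}]
print(apply_modifiers(books, order_by="publisher", project=["publisher"], distinct=True))
# [{'publisher': 'Companhia das Letras'}, {'publisher': 'Grupo Editorial Record'}]
print(apply_modifiers(books, filter_fn=lambda r: len(r["name"]) > 12, project=["name"]))
# [{'name': 'Gabriela, Cravo e Canela'}, {'name': 'Antologia Poética'}]
```

The three calls correspond roughly to adding “ORDER BY ?name OFFSET 1 LIMIT 2”, “SELECT DISTINCT ?publisher ... ORDER BY ?publisher” and “FILTER (STRLEN(?name) > 12)” to a query over these books.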

CONSTRUCT is a SPARQL query form that builds a new set of triples from the query results, following a template given in the query. Figure 4.28 presents an example of the CONSTRUCT command in a query against the DBpedia endpoint, and Figure 4.29 shows the results.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
CONSTRUCT { ?timbl foaf:name ?name }
WHERE {
  ?timbl dbpedia-owl:alias "TimBL" .
  ?timbl foaf:name ?name .
}
Figure 4.28 SPARQL CONSTRUCT command

s p o
:Tim_Berners-Lee foaf:name "Sir Tim Berners-Lee"
:Tim_Berners-Lee foaf:name "Tim Berners-Lee"
:Tim_Berners-Lee foaf:name "Berners-Lee, Tim"
Figure 4.29 SPARQL CONSTRUCT command results
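
A CONSTRUCT query can be thought of as a SELECT followed by a template step: each solution of the WHERE clause is substituted into the template, and the resulting triples are collected into a set, so duplicates collapse into one. The Python sketch below shows that step for the query in Figure 4.28, taking its three solutions as given; the function name is hypothetical. Following the SPARQL specification, a template triple that would contain an unbound variable is left out.

```python
Triple = tuple[str, str, str]


def instantiate(template: list[Triple], solutions: list[dict[str, str]]) -> set[Triple]:
    """Substitute each solution into the CONSTRUCT template; the output is a set."""
    graph: set[Triple] = set()
    for binding in solutions:
        for pattern in template:
            variables = [t for t in pattern if t.startswith("?")]
            if all(v in binding for v in variables):   # skip unbound variables
                s, p, o = (binding.get(t, t) for t in pattern)
                graph.add((s, p, o))
    return graph


template = [("?timbl", "foaf:name", "?name")]
solutions = [
    {"?timbl": ":Tim_Berners-Lee", "?name": '"Sir Tim Berners-Lee"'},
    {"?timbl": ":Tim_Berners-Lee", "?name": '"Tim Berners-Lee"'},
    {"?timbl": ":Tim_Berners-Lee", "?name": '"Berners-Lee, Tim"'},
]
for s, p, o in sorted(instantiate(template, solutions)):
    print(s, p, o)
```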

In addition to the query language itself, W3C defines a number of other SPARQL-related standards, such as a language for updating RDF graphs (SPARQL 1.1 Update [35]) and federated queries that combine data from different SPARQL endpoints (SPARQL 1.1 Federated Query [36]), among others.

4.5 Metadata Embedded in Pages

In the previous sections we presented the RDF conceptual data model, with its triples of resources identified by URIs, along with the RDFS and OWL extensions, which enable the construction of more complex ontologies with more restrictions and inference possibilities. We also introduced the SPARQL query language for RDF graphs.

These technologies are oriented mainly toward graphs stored on web servers. However, a large amount of semantically described data is dispersed across HTML pages that mix content meant for people with content meant for applications. Web page designers increasingly include semantic data in their pages to meet the needs of applications. In an online store, for example, semantic data can help the Google search engine display the search results for a particular product more accurately.

Section 2.5, on metadata, discussed how metadata was first added to web pages through the HTML <meta> tag. This metadata refers to the web page as a whole: it can provide, for example, the author of the page, keywords, a description, the publication date, etc. The following sections present three formats for including metadata in web pages that can also describe the individual data items contained in the pages.

The intent of these sections is to briefly describe these three techniques for embedding data in web pages. It is important to understand that this is one of the ways metadata is defined for building a more semantic Web of Data, as a complement to what was presented in the previous sections.

4.5.1 Microformat

Microformats [37] were the first initiative to add extra information to HTML code, enabling specific types of data (entities), such as people, products and events, to be identified. Each of these types has its own set of properties and a specific syntax. For example, a person can have the properties name, address, company, position, email, etc.

When a browser displays a page, it ignores markup it does not recognize, and class values have no meaning for it beyond styling. In general, microformats use class attributes in HTML tags (often <span> or <div>) to assign names to entities and their properties.

The example in Figures 4.30 and 4.31 shows how information used by the browser to display the page can be mixed with information used by applications to understand its content.

<div>
   <img src="www.example.com/bobsmith.jpg" />
   <strong>Bob Smith</strong>
   Senior editor at ACME Reviews
   200 Main St
   Desertville, AZ 12345
</div>
Figure 4.30 HTML code without microformat information

<div class="vcard">
   <img class="photo" src="www.example.com/bobsmith.jpg" />
   <strong class="fn">Bob Smith</strong>
   <span class="title">Senior editor</span> at
   <span class="org">ACME Reviews</span>
   <span class="adr">
     <span class="street-address">200 Main St</span>
     <span class="locality">Desertville</span>,
     <span class="region">AZ</span>
     <span class="postal-code">12345</span>
   </span>
</div>
Figure 4.31 HTML code mixed with microformat information
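
An application that understands these class names can pull the person's data back out of Figure 4.31. The Python sketch below uses the standard library's html.parser to collect the text of every class-named element inside a “vcard” element, plus the src of the “photo” image. It is a simplified illustration, not a conforming microformats parser: it ignores nested cards, the microformats2 class prefixes and value-class rules, and the class name VCardParser is hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional

VOID_TAGS = {"img", "br", "hr", "input", "link", "meta"}


class VCardParser(HTMLParser):
    """Collect class-named properties found inside class="vcard" elements."""

    def __init__(self) -> None:
        super().__init__()
        self.cards: list[dict[str, str]] = []
        # One entry per open element: (class names, is it a vcard root?, text seen)
        self._stack: list[tuple[list[str], bool, list[str]]] = []

    def _inside_card(self) -> bool:
        return any(is_root for _, is_root, _ in self._stack)

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag in VOID_TAGS:
            # Void elements have no text; "photo" takes its value from src
            if "photo" in classes and self._inside_card():
                self.cards[-1]["photo"] = attributes.get("src") or ""
            return
        is_root = "vcard" in classes
        if is_root:
            self.cards.append({})
        self._stack.append((classes, is_root, []))

    def handle_data(self, data: str) -> None:
        # Text belongs to every open element, so nested text also reaches "adr"
        for _, _, parts in self._stack:
            parts.append(data)

    def handle_endtag(self, tag: str) -> None:
        if tag in VOID_TAGS or not self._stack:
            return
        classes, is_root, parts = self._stack.pop()
        if is_root or not self._inside_card():
            return
        text = " ".join("".join(parts).split())   # collapse whitespace
        for name in classes:
            self.cards[-1].setdefault(name, text)


HTML = """
<div class="vcard">
   <img class="photo" src="www.example.com/bobsmith.jpg" />
   <strong class="fn">Bob Smith</strong>
   <span class="title">Senior editor</span> at
   <span class="org">ACME Reviews</span>
   <span class="adr">
     <span class="street-address">200 Main St</span>
     <span class="locality">Desertville</span>,
     <span class="region">AZ</span>
     <span class="postal-code">12345</span>
   </span>
</div>
"""

parser = VCardParser()
parser.feed(HTML)
parser.close()
print(parser.cards[0]["fn"], "-", parser.cards[0]["org"])  # Bob Smith - ACME Reviews
```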

Following are some of the properties that can be defined for a “person” entity type:

• fn – full name (required)

• n – structured name:

   • given-name – first name

   • additional-name – middle name

   • family-name – surname

• title – position

• org – company, organization

• photo – photo, icon, avatar

• email – email

• tel – telephone

• adr – structured address:

   • street-address – street

   • locality – city

   • region – state

   • postal-code – postal code

   • country-name – country

Each microformat has its own set of properties (vocabulary). Microformats are easy to use because of their simplicity, but they offer little extensibility, since they have no standard way of representing vocabularies. Furthermore, metadata can be specified for only a small set of data types. Microformats2 [38] seeks to evolve the concept of microformats, with a common syntax independent of vocabularies and more standardized naming of entities and properties.

4.5.2 RDFa

Like microformats, RDFa [39] enables metadata to be embedded in web pages, but it uses a generic syntax for specifying RDF triples, regardless of the type of the resource and of the vocabulary used. The information in the triples is specified as attributes within HTML tags:

• vocab – URI of the vocabulary (namespace)

• about, src, href and resource – URI of the resource

• typeof – type of resource

• rel – relation to another resource

• rev – inverse relation to another resource

• property – name of the property (the value of the property is the text that appears between the tags where the attribute is defined)

• content – replaces the value of the property that appears between the tags where the attribute is defined

• datatype – property data type

To illustrate some of these attributes, two examples taken from the W3C RDFa specification [40] follow. In Figure 4.32, triples are defined about two resources. Figures 4.33 and 4.34 present the RDF graphs of these triples and help show how they are defined by the RDFa embedded in the HTML code.

<body vocab="http://purl.org/dc/terms/">
  ...
  <div resource="/alice/posts/trouble_with_bob">
   <h2 property="title">The trouble with Bob</h2>
   <p>Date: <span property="created">2011-09-10</span></p>
   <h3 property="creator">Alice</h3>
   ...
  </div>
  ...
  <div resource="/alice/posts/jos_barbecue">
   <h2 property="title">Jo’s Barbecue</h2>
   <p>Date: <span property="created">2011-09-14</span></p>
   <h3 property="creator">Eve</h3>
   ...
  </div>
  ...
</body>
Figure 4.32 RDFa embedded in web pages

Figure 4.33 Triples embedded in web pages (1)
Figure 4.34 Triples embedded in web pages (2)
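
The triples in Figures 4.33 and 4.34 can be obtained from the markup in Figure 4.32 with a very small subset of the RDFa processing rules: “vocab” is inherited by descendants, “resource” sets the subject for the elements inside it, and “property” produces a triple whose object is the element's text (or its “content” attribute). The Python sketch below implements only those three rules and is not a conforming RDFa processor (it ignores “about”, “typeof”, “rel”, “rev”, prefixes, datatypes and a “resource” used together with “property”). The base URL http://example.com/, used to resolve the relative resource paths, and the class name are assumptions.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin

VOID_TAGS = {"img", "br", "hr", "input", "link", "meta"}


class RDFaSubsetParser(HTMLParser):
    """Extract literal-valued triples using only vocab, resource and property."""

    def __init__(self, base: str) -> None:
        super().__init__()
        self.base = base
        self.triples: list[tuple[str, str, str]] = []
        # Per open element: (vocab, subject, predicate or None, content or None, text)
        self._stack: list[tuple[str, str, Optional[str], Optional[str], list[str]]] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        if tag in VOID_TAGS:
            return                                   # not handled in this sketch
        a = dict(attrs)
        vocab, subject = (self._stack[-1][0], self._stack[-1][1]) if self._stack else ("", self.base)
        vocab = a.get("vocab") or vocab              # vocab is inherited
        predicate = None
        if a.get("property"):
            name = a["property"]
            predicate = name if ":" in name else vocab + name
        elif a.get("resource"):
            # resource (without property) becomes the subject of this subtree
            subject = urljoin(self.base, a["resource"])
        self._stack.append((vocab, subject, predicate, a.get("content"), []))

    def handle_data(self, data: str) -> None:
        for frame in self._stack:
            frame[4].append(data)

    def handle_endtag(self, tag: str) -> None:
        if tag in VOID_TAGS or not self._stack:
            return
        _, subject, predicate, content, parts = self._stack.pop()
        if predicate is not None:
            value = content if content is not None else " ".join("".join(parts).split())
            self.triples.append((subject, predicate, value))


PAGE = """
<body vocab="http://purl.org/dc/terms/">
  <div resource="/alice/posts/trouble_with_bob">
    <h2 property="title">The trouble with Bob</h2>
    <p>Date: <span property="created">2011-09-10</span></p>
    <h3 property="creator">Alice</h3>
  </div>
  <div resource="/alice/posts/jos_barbecue">
    <h2 property="title">Jo’s Barbecue</h2>
    <p>Date: <span property="created">2011-09-14</span></p>
    <h3 property="creator">Eve</h3>
  </div>
</body>
"""

parser = RDFaSubsetParser(base="http://example.com/")   # assumed page URL
parser.feed(PAGE)
parser.close()
for s, p, o in parser.triples:
    print(f"<{s}> <{p}> {o!r}")
```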

The example in Figure 4.35 illustrates the use of the “rel” attribute, which defines a relation between two resources, in this case “foaf:knows” (someone known by a certain person). In the example, the resources that are objects of these triples are of the “foaf:Person” type, and a “foaf:name” and a “foaf:homepage” are defined for each of them. The browser displays only the links to the homepages of Bob, Eve and Ana.

<div vocab="http://xmlns.com/foaf/0.1/" resource="#me">
  <ul rel="knows">
   <li resource="http://example.com/bob/#me" typeof="Person">
    <a property="homepage" href="http://example.com/bob/">
     <span property="name">Bob</span>
    </a>
   </li>
   <li resource="http://example.com/eve/#me" typeof="Person">
    <a property="homepage" href="http://example.com/eve/">
     <span property="name">Eve</span>
    </a>
   </li>
   <li resource="http://example.com/ana/#me" typeof="Person">
    <a property="homepage" href="http://example.com/ana/">
     <span property="name">Ana</span>
    </a>
   </li>
  </ul>
</div>
Figure 4.35 Example of the relation between resources coded in RDFa

4.5.3 Microdata

Microdata [41] is a more recent way to embed metadata in web pages. Its use has been growing, and it is one of the formats preferred by search engines such as Google. Although any vocabulary can be used to describe resources, web page designers who use microdata rely heavily on the vocabularies defined by Schema.org (Section 6.4).

Like RDFa, microdata syntax does not depend on the types of resources defined or on the vocabulary used to describe them, and information is likewise embedded through attributes of HTML tags. The microdata model consists of groups of name-value pairs, known as items. Each item can have a type, a global identifier and a list of name-value pairs. Each name in a name-value pair is called a property, and each property has one or more values. Each value is either an item or a literal. The basic attributes of microdata are:

• itemscope – defines the resource

• itemid – defines a URI for the resource

• itemtype – defines the type of resource (vocabulary)

• itemprop – defines a property of the resource

Figure 4.36 presents an example of microdata for a person, using the Person type [42] from Schema.org, which has a series of defined properties, such as “name”, “image”, “jobTitle”, “telephone”, “email” and “colleague”, each with well-defined semantics. In the example, the “address” property points to another item (“itemscope”) with a different type, “PostalAddress”.

As with RDF, the microdata model itself says nothing about the meaning of any particular vocabulary. The syntax is the same whatever vocabulary is used, and choosing a vocabulary and using it and its properties correctly requires studying it specifically.

The semantics added to the data are materialized by using the syntaxes presented (RDF, RDFS, OWL, microformats, RDFa, microdata) together with ontologies developed for specific cases and with reference vocabularies established through common use. Some of the most widely used vocabularies are presented in Section 6.

<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <img src="janedoe.jpg" itemprop="image" alt="Photo of Jane Doe"/>
  <span itemprop="jobTitle">Professor</span>
  <div itemprop="address"
   itemscope itemtype="http://schema.org/PostalAddress">
   <span itemprop="streetAddress">
    20341 Whitworth Institute
    405 N. Whitworth
   </span>
   <span itemprop="addressLocality">Seattle</span>,
   <span itemprop="addressRegion">WA</span>
   <span itemprop="postalCode">98052</span>
  </div>
  <span itemprop="telephone">(425) 123-4567</span>
  <a href="mailto:jane-doe@xyz.edu" itemprop="email">
   jane-doe@xyz.edu</a>
  Jane’s home page:
  <a href="http://www.janedoe.com" itemprop="url">janedoe.com</a>
  Graduate students:
  <a href="http://www.xyz.edu/students/alicejones.html"
   itemprop="colleague">Alice Jones</a>
  <a href="http://www.xyz.edu/students/bobsmith.html"
   itemprop="colleague">Bob Smith</a>
</div>
Figure 4.36 Microdata embedded in web pages
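
Reading the microdata in Figure 4.36 back as items follows the model described above: each “itemscope” starts an item, and each “itemprop” adds a value to the nearest enclosing item, where the value is a nested item, an attribute (such as src for img or href for a) or the element's text. The Python sketch below implements that subset with html.parser on a trimmed copy of the figure. It is a simplified illustration that ignores “itemref” and most of the HTML specification's per-element value rules, and all names in it are hypothetical.

```python
from html.parser import HTMLParser
from typing import Any, Optional

VOID_TAGS = {"img", "br", "hr", "input", "link", "meta"}


def attribute_value(tag: str, attrs: dict[str, Optional[str]]) -> Optional[str]:
    """Value taken from an attribute instead of text (a subset of the HTML rules)."""
    if tag == "meta":
        return attrs.get("content") or ""
    if tag == "img":
        return attrs.get("src") or ""
    if tag in ("a", "link"):
        return attrs.get("href") or ""
    return None


class MicrodataParser(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.items: list[dict[str, Any]] = []      # top-level items
        self._stack: list[dict[str, Any]] = []     # one frame per open element

    def _current_item(self) -> Optional[dict[str, Any]]:
        for frame in reversed(self._stack):
            if frame["item"] is not None:
                return frame["item"]
        return None

    @staticmethod
    def _add(owner: dict[str, Any], names: list[str], value: Any) -> None:
        for name in names:
            owner["properties"].setdefault(name, []).append(value)

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        a = dict(attrs)
        owner = self._current_item()               # item enclosing this element
        names = (a.get("itemprop") or "").split()
        if tag in VOID_TAGS:
            if names and owner is not None:
                self._add(owner, names, attribute_value(tag, a) or "")
            return
        item = None
        if "itemscope" in a:
            item = {"type": a.get("itemtype"), "id": a.get("itemid"), "properties": {}}
            if not names:
                self.items.append(item)            # not a property value: top level
        self._stack.append({"item": item, "names": names, "owner": owner,
                            "attr": attribute_value(tag, a), "text": []})

    def handle_data(self, data: str) -> None:
        for frame in self._stack:
            frame["text"].append(data)

    def handle_endtag(self, tag: str) -> None:
        if tag in VOID_TAGS or not self._stack:
            return
        frame = self._stack.pop()
        if not frame["names"] or frame["owner"] is None:
            return
        if frame["item"] is not None:
            value = frame["item"]                  # nested item, e.g. PostalAddress
        elif frame["attr"] is not None:
            value = frame["attr"]
        else:
            value = " ".join("".join(frame["text"]).split())
        self._add(frame["owner"], frame["names"], value)


PAGE = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <img src="janedoe.jpg" itemprop="image" alt="Photo of Jane Doe"/>
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">
      20341 Whitworth Institute
      405 N. Whitworth
    </span>
    <span itemprop="addressLocality">Seattle</span>,
  </div>
  <a href="mailto:jane-doe@xyz.edu" itemprop="email">jane-doe@xyz.edu</a>
  <a href="http://www.xyz.edu/students/alicejones.html" itemprop="colleague">Alice Jones</a>
  <a href="http://www.xyz.edu/students/bobsmith.html" itemprop="colleague">Bob Smith</a>
</div>
"""

parser = MicrodataParser()
parser.feed(PAGE)
parser.close()
person = parser.items[0]
print(person["type"], person["properties"]["name"])
```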