TU Wien:Einführung in Semantic Systems VU (Tjoa)/Possible Exam Questions
This collection of questions aims to support students in preparing for the exam. Students are expected to provide answers based on the material discussed during the lectures and presented in the lecture slides. Additional information collected from third-party materials is encouraged but not mandatory. Please note that questions listed here might appear in a slightly modified form in the exam. Good luck with the preparation!
What is the “Semantic Web”? Refer to the definition given by Tim Berners-Lee in the Scientific American journal.
According to Tim Berners-Lee, the Semantic Web is an extension of the current web (1) in which information is given well-defined meaning (2), better enabling computers and people to work in cooperation (3).
- (What) The Semantic Web will gradually evolve out of the existing Web; it does not compete with the current WWW.
- (How) Represent Web content in a form that is more easily machine-processable.
- (Why) An open platform allowing information to be shared and processed.
[SemSys1, 19/20, p.26]
Provide two of the definitions of a knowledge graph discussed during the lecture.
- A knowledge graph mainly describes real world entities and their interrelations, organized in a graph, defines possible classes and relations of entities in a schema, allows for potentially interrelating arbitrary entities with each other and covers various topical domains.
- Knowledge graphs are large networks of entities, their semantic types, properties, and relationships between entities.
- Knowledge graphs could be envisaged as a network of all kinds of things that are relevant to a specific domain or an organization. They are not limited to abstract concepts and relations but can also contain instances of things like documents and datasets.
- We define a Knowledge Graph as an RDF graph. An RDF graph consists of a set of RDF triples where each RDF triple $(s, p, o)$ is an ordered set of the following RDF terms: a subject $s \in U \cup B$, a predicate $p \in U$, and an object $o \in U \cup B \cup L$. An RDF term is either a URI $u \in U$, a blank node $b \in B$, or a literal $l \in L$.
- [...] systems exist, [...], which use a variety of techniques to extract new knowledge, in the form of facts, from the web. These facts are interrelated, and hence, recently this extracted knowledge has been referred to as a knowledge graph.
- A knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new knowledge.
[SemSys1, 19/20, p.15]
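The RDF-based definition restricts which kind of term may appear in each position of a triple. Here is a minimal Python sketch. It assumes plain dataclasses for the three term sets U, B and L. The `IRI`, `BNode` and `Literal` classes, the helper functions and the `example.org` IRIs are all hypothetical and do not come from any RDF library.

```python
from dataclasses import dataclass

# Hypothetical term types standing for the sets U (URIs), B (blank nodes), L (literals).
@dataclass(frozen=True)
class IRI:
    value: str

@dataclass(frozen=True)
class BNode:
    label: str

@dataclass(frozen=True)
class Literal:
    value: str

def is_valid_triple(s, p, o) -> bool:
    """Check s ∈ U ∪ B, p ∈ U and o ∈ U ∪ B ∪ L."""
    return (isinstance(s, (IRI, BNode))
            and isinstance(p, IRI)
            and isinstance(o, (IRI, BNode, Literal)))

def make_graph(*triples):
    """An RDF graph is a *set* of valid triples, so duplicates collapse."""
    for triple in triples:
        if not is_valid_triple(*triple):
            raise ValueError(f"not an RDF triple: {triple}")
    return frozenset(triples)

EX = "http://example.org/"  # hypothetical namespace
uni = IRI(EX + "TU_Wien")
g = make_graph(
    (uni, IRI(EX + "name"), Literal("TU Wien")),
    (uni, IRI(EX + "locatedIn"), IRI(EX + "Vienna")),
    (uni, IRI(EX + "name"), Literal("TU Wien")),  # duplicate, ignored by the set
)
print(len(g))  # 2
```

A literal in subject position, or a blank node in predicate position, is rejected. This mirrors the position constraints in the definition.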
Give examples of two companies that use knowledge graphs and explain how they make use of these structures as part of their use cases.
- Google uses knowledge graphs to augment search results with structured information (e.g. if you search for TU Wien, you will be given a description as well as facts like the address, the phone number, etc.).
- Springer uses knowledge graphs to build SciGraph, a citations and references platform that links articles, persons, organizations, conferences, etc. together.
- Thomson Reuters' financial knowledge graph
- BabelNet: lexical, multilingual knowledge graph
[SemSys1, 19/20, p.6ff]
What are the key enabling technologies for the Semantic Web?
- One or more standard vocabularies (ontologies) that capture semantics, so that search engines, producers and consumers all speak the same language.
- A standard syntax (XML, RDF, RDF Schema, OWL) so that meta-data can be recognized as such.
- Lots of resources with attached meta-data.
[SemSys1, 19/20, p.28]
Define an ontology (use the definition of Studer from 1998).
Studer (1998): "Formal, explicit specification of a shared conceptualization".
- Formal = machine readable
- explicit specification = Concepts, properties, functions, axioms are explicitly defined
- shared = shared by a community
- conceptualization = abstract model of some phenomena in the world
[SemSysL2, 19/20, p.8]
What are the three main categories of ontologies based on their expressivity?
- Lightweight Ontologies = Controlled Vocabulary (list of terms), Glossary, Thesauri; e.g. "Friend of a Friend"
- Taxonomies = Ontologies with an informal or formal IS-A hierarchy, formal instances
- Heavyweight Ontologies = Ontologies with value restrictions, general logic constraints, etc.
[SemSysL2, 19/20, p.9]
What is a taxonomy? Provide a definition and an example.
A taxonomy is a controlled vocabulary organized into a hierarchical structure.
Example:
- A CONCEPT/CLASS (drawn as a box, e.g. [Animal]) represents the set of all entities of that type.
- A SUBSUMPTION relation (drawn as --->) points from more specific to more generic concepts.
- [Animal] ---> [LivingThing], [Plant] ---> [LivingThing]
- [Carnivore] ---> [Animal], [Herbivore] ---> [Animal], [Omnivore] ---> [Animal]
- [Tree] ---> [Plant], [Grass] ---> [Plant]
- SIBLINGS share a parent concept: Carnivore, Herbivore and Omnivore are siblings under Animal.
[SemSysL2, 19/20, p.13]
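Subsumption is transitive. Every instance of Carnivore is therefore also an instance of Animal and of LivingThing. Here is a minimal Python sketch of the example taxonomy. It assumes that each concept has exactly one parent, so the taxonomy is a tree; with multiple inheritance, each concept would need a set of parents. `SUBCLASS_OF`, `superclasses` and `siblings` are illustrative names.

```python
# Subsumption edges of the example taxonomy: specific concept -> generic concept.
SUBCLASS_OF = {
    "Animal": "LivingThing",
    "Plant": "LivingThing",
    "Carnivore": "Animal",
    "Herbivore": "Animal",
    "Omnivore": "Animal",
    "Tree": "Plant",
    "Grass": "Plant",
}

def superclasses(concept: str) -> list:
    """Walk the subsumption edges upwards; instances of `concept` belong to all of these."""
    result = []
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        result.append(concept)
    return result

def siblings(concept: str) -> list:
    """Concepts sharing the same parent concept (excluding the concept itself)."""
    parent = SUBCLASS_OF.get(concept)
    return sorted(c for c, p in SUBCLASS_OF.items() if p == parent and c != concept)

print(superclasses("Carnivore"))  # ['Animal', 'LivingThing']
print(siblings("Carnivore"))      # ['Herbivore', 'Omnivore']
```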
What are the main elements of an ontology? Give a few examples for a domain of your choice.
An ontology is a taxonomy extended with other relations and further constraints.
- Concepts: denote the main concepts of the domain.
- Carnivore
- Animal
- Pump
- Engine
- Concept hierarchy: denotes specialization/generalization
- Carnivore is a kind of Animal
- Relations between classes
- Carnivore eats Animal
- Restrictions on relations (Type, cardinality)
- Any Animal has at least one BodyPart
- Instances: denote concrete entities in the domain
- Tom, Jerry
[SemSysL2, 19/20, p.20]
What are the main stages of the ontology engineering methodology proposed by Noy & McGuinness? Enumerate the stages and explain each of them briefly.
1. Determine Scope
- What is included in the ontology should be determined by the uses of the ontology and future extensions that are already anticipated. Basic questions to be answered at this stage:
- What is the domain that the ontology will cover? For what are we going to use the ontology? For what types of questions should the ontology provide answers? Who will use and maintain the ontology?
2. Consider Reuse
- Reuse existing ontologies to save engineering effort and to interact with tools that already use them. Further, this makes it possible to build on ontologies that have been validated through use in applications.
3. Enumerate Terms
- Write down a list of all the relevant terms that are expected to appear in the ontology. What are the terms we need to talk about? Nouns form the basis for class names. What are the properties of these terms? Verbs (or verb phrases) form the basis for property names.
4. Define Classes and Taxonomy
- A class is a concept in the domain (a class of wines, a class of wineries, a class of red wines). A class is a collection of elements with similar properties. Instances of classes: a glass of California wine you'll have for lunch.
Relevant terms must be organized in a taxonomic hierarchy. If A is a subclass of B, then every instance of A must also be an instance of B. Example: RedWine is a subclass of Wine.
5. Define Properties
- Properties in a class definition:
- describe attributes of instances of the class (each wine has color, sugar content, producer, etc.)
- describe relations to other classes (Wine has maker Winery).
- If A is a subclass of B, every property statement that holds for instances of B must also apply to instances of A. Properties shared in a hierarchy should be specified in the highest possible class.
6. Define Constraints
- Define cardinalities ("each wine is made from at least one grape"), required values (universal restriction, existential restriction), property characteristics (symmetry, transitivity, inverse properties, functional values).
7. Create Instances
- Create an instance of a class. The class becomes a direct type of the instance. Any superclass of the direct type is a type of the instance. Assign property values for the instance frame. Property values should conform to the constraints. Number of instances >> number of classes.
8. Check for Anomalies
- Validate ontology models (consistency checking, satisfiability) and derive additional knowledge.
[SemSysL2, 19/20, p.32ff]
What is the “AAA Slogan”? Which two important assumptions for the Semantic Web stem from this slogan? Explain each briefly.
On the Web “Anyone can say Anything about Any topic”: AAA Slogan
- Non-unique Naming Assumption: The same entity could be known by more than one name. E.g. PersonA can be the same instance as PersonB. That two names denote different entities (and, for classes, disjointness) needs to be specified explicitly.
- Open World Assumption: "Missing information is not evaluated as negative information!" E.g. likes(PersonA, DrinkB) -> PersonA may also like other drinks.
[SemSysL2, 19/20, p.29]
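The two assumptions matter most when a system is asked about something it has not been told. Here is a toy Python sketch that stores facts as plain tuples. `FACTS`, `DIFFERENT` and the helper functions are hypothetical and do not belong to any reasoner. The sketch contrasts a closed-world answer with an open-world one and shows the effect of having no unique name assumption.

```python
from typing import Optional, Tuple

Fact = Tuple[str, str, str]

# Everything we have been told: likes(PersonA, DrinkB).
FACTS = {("likes", "PersonA", "DrinkB")}
# Explicit "these two names denote different individuals" statements (none yet).
DIFFERENT = set()

def ask_closed_world(fact: Fact) -> bool:
    """Closed world: whatever is not stated is false."""
    return fact in FACTS

def ask_open_world(fact: Fact) -> Optional[bool]:
    """Open world: whatever is not stated is unknown (None), not false."""
    return True if fact in FACTS else None

def may_be_same(a: str, b: str) -> bool:
    """Non-unique naming: two names may denote one individual unless stated otherwise."""
    return (a, b) not in DIFFERENT and (b, a) not in DIFFERENT

query = ("likes", "PersonA", "DrinkC")
print(ask_closed_world(query))             # False
print(ask_open_world(query))               # None (unknown)
print(may_be_same("PersonA", "PersonB"))   # True
```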
Define and give an example for a universal restriction.
Universal restriction (all values from): If a property is declared for an instance, all its values must be of a certain type.
Example: Wines can only be made in Wineries.
[SemSysL2, 19/20, p.42]
Define and give an example for an existential restriction.
Existential restriction (some values from): For every instance of the class, there must be at least one value of the property that belongs to a certain type.
Example: Wines are located in regions (every wine is located in at least one region).
[SemSysL2, 19/20, p.42]
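Both restrictions can be illustrated by evaluating them over a fixed snapshot of data, using the wine examples. This Python sketch is a closed-world check, of the kind a data validation tool would run. An OWL reasoner works differently under the open world assumption. It would not flag a wine that has no known region. For a `madeIn` value, it would infer that the value is a Winery instead of rejecting it, unless that contradicts other axioms. All instance names and helpers here are hypothetical.

```python
# Closed snapshot of hypothetical data.
TYPES = {"ChateauX": {"Winery"}, "Bordeaux": {"Region"}, "Garage42": {"Garage"}}
EDGES = {
    ("WineA", "madeIn", "ChateauX"),
    ("WineA", "locatedIn", "Bordeaux"),
    ("WineB", "madeIn", "Garage42"),
}

def values(subject, prop):
    """All known values of `prop` for `subject`."""
    return {o for s, p, o in EDGES if s == subject and p == prop}

def has_type(individual, cls):
    return cls in TYPES.get(individual, set())

def all_values_from(subject, prop, cls):
    """Universal restriction: every value of prop is a cls (vacuously true if there is none)."""
    return all(has_type(v, cls) for v in values(subject, prop))

def some_values_from(subject, prop, cls):
    """Existential restriction: at least one value of prop is a cls."""
    return any(has_type(v, cls) for v in values(subject, prop))

print(all_values_from("WineA", "madeIn", "Winery"))      # True
print(all_values_from("WineB", "madeIn", "Winery"))      # False
print(some_values_from("WineB", "locatedIn", "Region"))  # False (no known region)
print(all_values_from("WineC", "madeIn", "Winery"))      # True (no values at all)
```

The last line shows an important difference between the two restrictions. An individual with no values satisfies a universal restriction but not an existential one.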
Name and give examples of at least three property characteristics that can be specified in ontologies.
- Symmetry: P(x, y) => P(y, x). Example: marriedTo, friendOf, adjacentRegion
- Transitivity: P(x, y) and P(y, z) => P(x, z). Example: locatedIn, hasAncestor
- Inverse properties: P(x, y) => InvP(y, x). Example: hasParent(John, Bob) => hasChild(Bob, John).
- Functional values: Can have only one (unique) value y for each instance x. I.e. there cannot be two distinct values y1 and y2 such that the pairs (x, y1) and (x, y2) are both instances of this property. E.g. hasHusband.
[SemSysL2, 19/20, p.43]
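Each characteristic corresponds to a simple operation on the set of (x, y) pairs of a property. Here is a minimal Python sketch that uses hypothetical example pairs. Note that in OWL, two values for a functional property are not a contradiction by themselves. Because there is no unique name assumption, a reasoner concludes that both values denote the same individual, unless they are declared different.

```python
def symmetric_closure(pairs):
    """Symmetry: P(x, y) => P(y, x)."""
    return set(pairs) | {(y, x) for x, y in pairs}

def transitive_closure(pairs):
    """Transitivity: P(x, y) and P(y, z) => P(x, z), applied until nothing new appears."""
    closure = set(pairs)
    while True:
        new = {(x, z) for x, y in closure for y2, z in closure if y == y2} - closure
        if not new:
            return closure
        closure |= new

def inverse(pairs):
    """Inverse: P(x, y) => InvP(y, x), e.g. hasParent -> hasChild."""
    return {(y, x) for x, y in pairs}

def functional_conflicts(pairs):
    """Subjects with more than one value for a functional property.
    An OWL reasoner would infer these values are the same individual
    (or report an inconsistency if they are declared different)."""
    values = {}
    for x, y in pairs:
        values.setdefault(x, set()).add(y)
    return {x for x, ys in values.items() if len(ys) > 1}

print(sorted(symmetric_closure({("Ann", "Bob")})))  # [('Ann', 'Bob'), ('Bob', 'Ann')]
print(("Vienna", "Europe") in transitive_closure({("Vienna", "Austria"), ("Austria", "Europe")}))  # True
print(inverse({("John", "Bob")}))                   # {('Bob', 'John')}
print(functional_conflicts({("Ann", "Bob"), ("Ann", "Carl")}))  # {'Ann'}
```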
What are three types of reasoning tasks that can be performed on ontologies?
- Consistency checking: Checking the correctness of the ontology model (e.g. that the model described by the ontology is free of logical contradictions)
- Class inferences: Discovering relations between classes that were not explicitly declared.
- Instance inferences: Discovering class membership of instances.
[SemSysL2, 19/20, pp.49 - 55]
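The three tasks can be illustrated on the animal taxonomy from the taxonomy question. Here is a minimal Python sketch that supports only subclass and class-disjointness axioms. The disjointness of Animal and Plant is an assumed axiom. The individual Audrey is hypothetical: it is typed as both Carnivore and Plant so that the consistency check has something to find.

```python
SUBCLASS = {("Carnivore", "Animal"), ("Animal", "LivingThing"), ("Plant", "LivingThing")}
DISJOINT = {frozenset({"Animal", "Plant"})}  # assumed axiom: nothing is both
ASSERTED_TYPES = {"Tom": {"Carnivore"}, "Audrey": {"Carnivore", "Plant"}}

def class_inferences(subclass):
    """Class inference: derive implicit subclass relations (transitivity of subClassOf)."""
    closure = set(subclass)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new

def instance_inferences(types, subclass):
    """Instance inference: add every superclass of each asserted type."""
    closure = class_inferences(subclass)
    return {ind: cls | {sup for sub, sup in closure if sub in cls}
            for ind, cls in types.items()}

def inconsistent_individuals(types, subclass, disjoint):
    """Consistency check: individuals that end up in two disjoint classes."""
    inferred = instance_inferences(types, subclass)
    return {ind for ind, cls in inferred.items() if any(pair <= cls for pair in disjoint)}

print(sorted(instance_inferences(ASSERTED_TYPES, SUBCLASS)["Tom"]))  # ['Animal', 'Carnivore', 'LivingThing']
print(inconsistent_individuals(ASSERTED_TYPES, SUBCLASS, DISJOINT))  # {'Audrey'}
```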
Define the notion of Blank Nodes in RDF and give one modelling example where they are useful.
Blank nodes represent unnamed resources. They have no URI, only a blank node identifier, and that identifier is significant only within the triples of a single graph.
Used widely, with varying intentions:
- Make statements about resources that do not have URIs (but that are described in terms of relationships with other resources that do)
- Group related information
- Represent n-ary relationships, e.g. an address as a structure consisting of separate street, city, state, and postal code values.
[SemSysL3, 19/20, p.31ff]
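The sketch below covers both the address example and the local scope of blank node identifiers. It is a minimal Python sketch with three assumptions: triples are plain string tuples, blank nodes are strings starting with `_:`, and literals are quoted strings. The `ex:` terms and all helper functions are hypothetical. When two separately parsed graphs are merged, their blank nodes must be renamed apart. Otherwise two unrelated resources that happen to share the label `_:b0` would be conflated.

```python
from itertools import count

_ids = count()

def fresh_bnode() -> str:
    """Mint a new blank node identifier; it only has meaning inside one graph."""
    return f"_:b{next(_ids)}"

def address_graph(person, street, city, postal_code):
    """Group the parts of an address under one unnamed (blank) node."""
    addr = fresh_bnode()
    return {
        (person, "ex:address", addr),
        (addr, "ex:street", street),
        (addr, "ex:city", city),
        (addr, "ex:postalCode", postal_code),
    }

def rename_apart(graph, suffix):
    """Give every blank node in `graph` a graph-specific identifier."""
    def rename(term):
        return term + suffix if term.startswith("_:") else term
    return {(rename(s), p, rename(o)) for s, p, o in graph}

# Two graphs parsed separately may both use "_:b0" for *different* resources.
g1 = {("ex:alice", "ex:address", "_:b0"), ("_:b0", "ex:city", '"Vienna"')}
g2 = {("ex:bob", "ex:address", "_:b0"), ("_:b0", "ex:city", '"Graz"')}
naive = g1 | g2                                              # wrongly merges both addresses
merged = rename_apart(g1, "_g1") | rename_apart(g2, "_g2")   # keeps them apart
print(sorted(o for s, p, o in naive if s == "_:b0"))         # ['"Graz"', '"Vienna"']
```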
What does the notion of Reification mean? How can reification be realized in RDF? Give an example.
Used for making statements about statements to:
- model data provenance
- formalize statements about reliability and trust
- define metadata about statements
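In RDF, the statement is described as a resource of type rdf:Statement, using the reification vocabulary:
- rdf:subject: the described resource
- rdf:predicate: the original property
- rdf:object: the value of the property
Example: Sherlock Holmes supposes that the gardener has killed the butler.
Diagram: a node of type rdf:Statement representing "the gardener has killed the butler", connected via rdf:subject, rdf:predicate and rdf:object to the subject, predicate and object of that statement. Sherlock Holmes's supposition is then expressed as a triple whose object is this rdf:Statement node.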
[SemSysL3, 19/20, p.40ff]
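The reification example can be written out as plain triples. Here is a minimal Python sketch. Only the rdf: terms are the real RDF vocabulary. The `http://example.org/` namespace, the `hasKilled` and `supposes` properties and the `reify` helper are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical namespace

def reify(node, s, p, o):
    """Describe the statement (s, p, o) as a resource of type rdf:Statement."""
    return {
        (node, RDF + "type", RDF + "Statement"),
        (node, RDF + "subject", s),
        (node, RDF + "predicate", p),
        (node, RDF + "object", o),
    }

stmt = EX + "statement1"
graph = reify(stmt, EX + "gardener", EX + "hasKilled", EX + "butler")
# The statement about the statement: Sherlock Holmes supposes it.
graph.add((EX + "SherlockHolmes", EX + "supposes", stmt))

# The original triple itself is NOT asserted by reification.
print((EX + "gardener", EX + "hasKilled", EX + "butler") in graph)  # False
print(len(graph))  # 5
```

Reification only *describes* the triple. The graph does not claim that the gardener killed the butler, only that Sherlock Holmes supposes it.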
What are the three main knowledge representation languages used in the Semantic Web? Discuss how they differ in terms of their knowledge representation capabilities.
[SemSysL3, 19/20, p.xx] (Unverified: the answer below was taken from https://stexx.files.wordpress.com/2010/07/termpaper.pdf rather than the slides. Perhaps the different OWL versions are meant?)
Alternative answer: the three main knowledge representation languages are OWL, RDFS and SKOS, all based on the yellow brick of this picture https://www.researchgate.net/profile/Bo_Ferri/publication/215576487/figure/fig3/AS:639876361879562@1529569821653/The-common-layered-Semantic-Web-technology-stack-a-modification-of-Now09-see-also.png, which is also shown in SemSysL3.
XML
- used to add arbitrary structure to documents: everyone can create their own tags that surround a portion of the content, but XML says nothing about what the structure means
- XML is not a knowledge representation language, but its syntax is used in many knowledge representation languages, e.g. RDF and OWL
RDF
- was developed to provide web metadata and open information models, to derive new information by combining data from several applications, and to enable automated processing of web information
- foundation layer of the Semantic Web
- semantics are encoded in sets of triples, where each triple consists of a subject, a predicate (property) and an object
OWL
- is a knowledge representation language
- OWL provides reasoning methods and additional vocabulary: relations between classes, cardinality, equality, richer typing of properties, characteristics of properties, and enumerated classes.
Describe the Semantic Web Technology Stack by (1) identifying the main concepts and abstractions covered by the stack as well as (2) naming the concrete specifications that correspond to each of these abstractions. (It is sufficient to focus on those abstractions and solutions that were covered in the lecture).
Concepts & Abstractions | Specifications |
---|---|
The web platform | URI/IRI, HTTP, TCP/IP |
Formats | XML, Turtle |
Information Exchange | RDF |
Models | OWL, RDFS |
Query | SPARQL |
[SemSysL3, 19/20, p.4ff]
Describe the typical stages followed with the (Google) OpenRefine tool for creating RDF based data from tabular data. Name and briefly describe each stage.
1. Import messy input data, transform it into a table, and clean it:
- explore data, delete/split columns, modify items with GREL (General Refine Expression Language, a powerful expression language).
2. Apply entity reconciliation to interlink with existing data sets.
- Typical steps:
- Define a reconciliation service.
- Select specific types to reconcile against.
- Start reconciling a column against the service
3. Define the structure of the RDF output.
- An RDF skeleton defines the structure of the RDF triples that are exported
4. Export the data into some RDF syntax.
[SemSysL4, 19/20, p.12ff]
Describe the process of transforming a database into an RDF dataset by using R2RML. Additionally, describe the mandatory and optional components of an R2RML mapping definition.
Process:
- Import ontology and database.
- Write R2RML mapping file: It describes how the data should be transformed to RDF.
- Use a mapping engine to generate the RDF file.
Direct mapping as R2RML:
- Declare a TriplesMap that has the following parts:
- logicalTable: what is being mapped (mandatory); corresponds to the table name.
- subjectMap: how to generate the subject URI (mandatory), e.g. using a template.
- predicateObjectMap: how to generate the predicates and objects (optional; zero or more).
[SemSysL4, 19/20, p.22-24ff]
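A mapping engine processes each row of the logical table according to the TriplesMap parts above, and Python can mimic that behaviour. This is a minimal sketch, not an R2RML engine. The `TriplesMap` dataclass, the `run_mapping` function and the in-memory table are hypothetical. The `EMP` table and the `http://data.example.com/employee/{EMPNO}` template are loosely modelled on the examples in the R2RML specification. Real engines also percent-encode template values and produce typed literals.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.com/ns#"  # hypothetical vocabulary namespace

@dataclass
class TriplesMap:
    """Hypothetical in-memory stand-in for an rr:TriplesMap (not an R2RML engine)."""
    logical_table: str                    # rr:logicalTable: which table is mapped (mandatory)
    subject_template: str                 # rr:subjectMap with rr:template (mandatory)
    subject_class: Optional[str] = None   # rr:class on the subject map (optional)
    predicate_object_maps: Dict[str, str] = field(default_factory=dict)  # predicate -> column (optional)

def run_mapping(tm: TriplesMap, database: Dict[str, List[dict]]):
    """Turn every row of the logical table into triples."""
    triples = set()
    for row in database[tm.logical_table]:
        # R2RML templates use {COLUMN} placeholders, which str.format also understands.
        subject = tm.subject_template.format(**row)
        if tm.subject_class:
            triples.add((subject, RDF_TYPE, tm.subject_class))
        for predicate, column in tm.predicate_object_maps.items():
            if row.get(column) is not None:  # NULL values produce no triple
                triples.add((subject, predicate, row[column]))
    return triples

DB = {"EMP": [{"EMPNO": 7369, "ENAME": "SMITH", "JOB": None}]}
mapping = TriplesMap(
    logical_table="EMP",
    subject_template="http://data.example.com/employee/{EMPNO}",
    subject_class=EX + "Employee",
    predicate_object_maps={EX + "name": "ENAME", EX + "job": "JOB"},
)
for triple in sorted(run_mapping(mapping, DB), key=str):
    print(triple)
```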
Define the term “Semantic Annotation”. Provide also the formal model of a semantic annotation.
Semantic annotation is the process of attaching additional information to various concepts (e.g. people, things, places, organizations, etc.) in a given text or any other content. It establishes a (typed) relation between the annotated data and the annotating data, e.g.:
@prefix dbr: <http://dbpedia.org/resource/> .
@prefix dbo: <http://dbpedia.org/ontology/> .
:9787532717071 dbo:author dbr:Stephen_King .
Formal Model:
- An annotation A is a tuple (as, ap, ao, ac), where
- as is the subject of the annotation (the annotated data), e.g. :9787532717071
- ao is the object of the annotation (the annotating data), e.g. dbr:Stephen_King
- ap is the predicate (the annotation relation) that defines the type of relationship between as and ao, e.g. dbo:author
- ac is the context in which the annotation is made, e.g. when and by whom it was created.
[SemSysL4, 19/20, p.32ff]
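The formal model maps directly onto a small record type. Here is a minimal Python sketch that assumes string identifiers and a dictionary for the context. The field names follow the tuple (as, ap, ao, ac); `as_` carries a trailing underscore because `as` is a Python keyword. The `Annotation` class and the context values are hypothetical.

```python
from typing import Dict, NamedTuple, Tuple

class Annotation(NamedTuple):
    """A = (as, ap, ao, ac); `as_` avoids the Python keyword `as`."""
    as_: str             # subject: the annotated data
    ap: str              # predicate: the annotation relation
    ao: str              # object: the annotating data
    ac: Dict[str, str]   # context in which the annotation was made

    def as_triple(self) -> Tuple[str, str, str]:
        """Drop the context to obtain a plain RDF-style triple."""
        return (self.as_, self.ap, self.ao)

a = Annotation(
    as_=":9787532717071",
    ap="dbo:author",
    ao="dbr:Stephen_King",
    ac={"createdBy": "annotator-1", "createdOn": "2020-01-15"},  # hypothetical context
)
print(a.as_triple())  # (':9787532717071', 'dbo:author', 'dbr:Stephen_King')
```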
What is the typical structure of a SPARQL query? Name and briefly describe the role of each part of a SPARQL query.
Prefix declarations, for abbreviating URIs
- PREFIX dbpedia: <http://dbpedia.org/resource/>
Result clause, identifying what information to return from the query
- SELECT/ASK/CONSTRUCT/DESCRIBE ...
Dataset definitions, stating what RDF graph(s) are being queried
- FROM ... / FROM NAMED ...
Query pattern, specifying what to query for in the underlying dataset
- WHERE {...}
Query modifiers, slicing, ordering, rearranging query results
- ORDER BY ..., LIMIT ..., OFFSET ...
[SemSysL5, 19/20, p.10ff]
What are the four query forms in SPARQL and what is their result type?
1. SELECT
- returns variables and their bindings directly
2. ASK
- tests whether or not a query pattern has a solution
- returns a boolean (yes/no)
3. DESCRIBE
- returns a single RDF graph containing RDF data about the resource(s)
4. CONSTRUCT
- returns a single RDF graph specified by a graph template
[SemSysL5, 19/20, p.18ff]
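A tiny in-memory matcher for basic graph patterns is enough to illustrate the four result types. This minimal Python sketch is not a SPARQL engine. Triples and patterns are plain string tuples, terms starting with `?` are variables, and the function names are hypothetical. SPARQL leaves the exact content of a DESCRIBE result to the query service. This sketch assumes that the result is every triple mentioning the resource.

```python
def _unify(pattern, triple, binding):
    """Extend `binding` so that one triple pattern matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                        # variable
            if extended.setdefault(term, value) != value:
                return None                             # already bound to something else
        elif term != value:                             # constant that does not match
            return None
    return extended

def solutions(bgp, graph):
    """All variable bindings for a basic graph pattern (a list of triple patterns)."""
    results = [{}]
    for pattern in bgp:
        next_results = []
        for binding in results:
            for triple in graph:
                extended = _unify(pattern, triple, binding)
                if extended is not None:
                    next_results.append(extended)
        results = next_results
    return results

def select(variables, bgp, graph):
    """SELECT: a table (list) of variable bindings."""
    return [{v: b[v] for v in variables if v in b} for b in solutions(bgp, graph)]

def ask(bgp, graph):
    """ASK: a boolean telling whether the pattern has at least one solution."""
    return len(solutions(bgp, graph)) > 0

def construct(template, bgp, graph):
    """CONSTRUCT: a new RDF graph obtained by filling the template with each solution."""
    result = set()
    for b in solutions(bgp, graph):
        for pattern in template:
            triple = tuple(b.get(term, term) for term in pattern)
            if not any(term.startswith("?") for term in triple):  # skip unbound variables
                result.add(triple)
    return result

def describe(resource, graph):
    """DESCRIBE: an RDF graph about `resource`; here, by assumption, every triple mentioning it."""
    return {t for t in graph if resource in (t[0], t[2])}

G = {
    ("dbr:The_Beatles", "rdf:type", "mo:MusicGroup"),
    ("dbr:Yesterday", "mo:singer", "dbr:Paul_McCartney"),
}
print(select(["?x"], [("?x", "rdf:type", "mo:MusicGroup")], G))     # [{'?x': 'dbr:The_Beatles'}]
print(ask([("dbr:The_Beatles", "rdf:type", "mo:MusicArtist")], G))  # False
print(construct([("?s", "mo:performer", "?o")], [("?s", "mo:singer", "?o")], G))
# {('dbr:Yesterday', 'mo:performer', 'dbr:Paul_McCartney')}
```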
For two RDF entailment patterns of your choice (1) describe the pattern and (2) exemplify how querying is affected when reasoning in terms of that pattern in comparison to the case when no reasoning is enabled.
1. rdfs:subClassOf (rule rdfs9):
xxx rdfs:subClassOf yyy . + zzz rdf:type xxx . => zzz rdf:type yyy .
Example:
Schema and data | Query |
---|---|
mo:MusicGroup rdfs:subClassOf mo:MusicArtist . dbr:The_Beatles rdf:type mo:MusicGroup . | SELECT ?x WHERE {?x a mo:MusicArtist.} |
Result set with inference (?x) | Result set without inference (?x) |
---|---|
dbr:The_Beatles | (empty) |
Without reasoning, no triple states directly that dbr:The_Beatles is a mo:MusicArtist, so the query returns no results.
2. rdfs:subPropertyOf (rule rdfs7): aaa rdfs:subPropertyOf bbb . + yyy aaa zzz . => yyy bbb zzz .
Example:
Schema and data | Query |
---|---|
mo:singer rdfs:subPropertyOf mo:performer . dbr:Yesterday mo:singer dbr:Paul_McCartney . dbr:Yesterday mo:performer dbr:John_Lennon . dbr:Yesterday mo:performer dbr:Ringo_Starr . dbr:Yesterday mo:performer dbr:George_Harrison . | SELECT ?x WHERE {dbr:Yesterday mo:performer ?x.} |
Result set with inference (?x) | Result set without inference (?x) |
---|---|
dbr:Paul_McCartney, dbr:John_Lennon, dbr:Ringo_Starr, dbr:George_Harrison | dbr:John_Lennon, dbr:Ringo_Starr, dbr:George_Harrison |
Without reasoning, dbr:Paul_McCartney is missing because he is only linked to dbr:Yesterday via mo:singer.
[SemSysL5, 19/20, p.49,52]
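A small forward-chaining loop reproduces the effect of both patterns on the query results of the Beatles example. This minimal Python sketch keeps triples as plain string tuples with prefixed names. `rdfs_closure` and the two query helpers are hypothetical. The closure implements only rules rdfs7 and rdfs9, not full RDFS entailment.

```python
G = {
    # Schema
    ("mo:MusicGroup", "rdfs:subClassOf", "mo:MusicArtist"),
    ("mo:singer", "rdfs:subPropertyOf", "mo:performer"),
    # Data
    ("dbr:The_Beatles", "rdf:type", "mo:MusicGroup"),
    ("dbr:Yesterday", "mo:singer", "dbr:Paul_McCartney"),
    ("dbr:Yesterday", "mo:performer", "dbr:John_Lennon"),
    ("dbr:Yesterday", "mo:performer", "dbr:Ringo_Starr"),
    ("dbr:Yesterday", "mo:performer", "dbr:George_Harrison"),
}

def rdfs_closure(graph):
    """Apply rdfs9 and rdfs7 repeatedly until no new triple is derived."""
    closed = set(graph)
    while True:
        new = set()
        for x, p, y in closed:
            if p == "rdfs:subClassOf":       # rdfs9: z rdf:type x  =>  z rdf:type y
                new |= {(z, "rdf:type", y) for z, p2, c in closed if p2 == "rdf:type" and c == x}
            elif p == "rdfs:subPropertyOf":  # rdfs7: s x o  =>  s y o
                new |= {(s, y, o) for s, p2, o in closed if p2 == x}
        new -= closed
        if not new:
            return closed
        closed |= new

def music_artists(graph):
    """SELECT ?x WHERE { ?x a mo:MusicArtist . }"""
    return {s for s, p, o in graph if p == "rdf:type" and o == "mo:MusicArtist"}

def performers_of_yesterday(graph):
    """SELECT ?x WHERE { dbr:Yesterday mo:performer ?x . }"""
    return {o for s, p, o in graph if s == "dbr:Yesterday" and p == "mo:performer"}

inferred = rdfs_closure(G)
print(music_artists(G), music_artists(inferred))  # set() {'dbr:The_Beatles'}
print(len(performers_of_yesterday(G)), len(performers_of_yesterday(inferred)))  # 3 4
```

Here the inferred triples are materialized before querying. A triple store may instead apply the rules at query time. Either way the query sees the same results.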
Describe at least two architectural patterns for KG applications.
[SemSysL6, 19/20, p.5]
Crawling
Crawl or load data in advance
+ : Data is managed in one triple store -> efficient access
- : Data may not be up to date
On-the-Fly Dereferencing
URIs are dereferenced at the moment the app requires the data (as opposed to crawling, where the data is loaded in advance)
+ : Retrieves up-to-date data
- : Performance is affected when the app must dereference many URIs; URIs that cannot be dereferenced at that moment yield no data
(Federated) Query
Submit complex queries to a fixed set of data sources
+ : Enables applications to work with current data directly retrieved from the sources
- : Finding optimal query execution plans over a large number of sources is a complex problem
What is a re-publication component in the KG architecture? Why is it needed and what are the available options?
[SemSysL6, 19/20, p.9]
Purpose: Exposes the integrated dataset as Linked Data, so that it can in turn be consumed by other applications. It is a component within the Data Tier (see p. 4 for an overview of components).
Options:
- Via SPARQL endpoints
(e.g., GraphDB, RDF4J Endpoint, OpenLink Virtuoso, Apache Jena Fuseki etc.)
- As RDF dumps
- As dereferenceable Linked Data
- As Linked Data Fragments
- Via APIs
- With built-in mechanisms of CMSs
How to perform data integration at schema and instance level in a KG application?
[SemSysL6, 19/20, p.7]
Consolidates data retrieved from heterogeneous sources
Schema level
- Performs vocabulary mappings in order to translate data into a single unified schema
- Links correspond to RDFS properties or OWL property and class axioms
Instance level
- Performs entity resolution via owl:sameAs links
- Tools like Silk or OpenRefine can be used to integrate the data in case the data sources do not provide the links
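Both levels can be sketched on plain string triples. Schema-level integration rewrites predicates into a target vocabulary. Instance-level integration merges the identifiers that are connected by owl:sameAs. This minimal Python sketch assumes a hypothetical property mapping and hypothetical `ex1:`/`ex2:` identifiers. It computes the sameAs equivalence classes with union-find, which is one possible technique and not necessarily what Silk or OpenRefine do internally.

```python
# Hypothetical schema-level mapping into one target vocabulary.
PROPERTY_MAP = {"foaf:name": "schema:name", "dbo:birthPlace": "schema:birthPlace"}

def map_schema(graph):
    """Schema level: rewrite source properties into the unified schema."""
    return {(s, PROPERTY_MAP.get(p, p), o) for s, p, o in graph}

def canonicalizer(same_as_pairs):
    """Instance level: union-find over owl:sameAs links; returns a term -> representative function."""
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for a, b in same_as_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # smallest identifier wins (deterministic)
    return find

def integrate(graph):
    """Map the schema, then replace every sameAs-linked identifier by its representative."""
    mapped = map_schema(graph)
    links = {(s, o) for s, p, o in mapped if p == "owl:sameAs"}
    canon = canonicalizer(links)
    return {(canon(s), p, canon(o)) for s, p, o in mapped if p != "owl:sameAs"}

SOURCE_A = {("ex1:TBL", "foaf:name", '"Tim Berners-Lee"')}
SOURCE_B = {("ex2:person42", "dbo:birthPlace", "ex2:London"),
            ("ex2:person42", "owl:sameAs", "ex1:TBL")}
for triple in sorted(integrate(SOURCE_A | SOURCE_B)):
    print(triple)
```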
What are conceptual differences between RDF4J and Apache Jena? In what circumstances would you use each of them?
[SemSysL6, 19/20, p.41ff]
Conceptual differences:
- Actions against repositories (RDF4J) vs. handling models directly (Jena)
- RDF4J completely abstracts the storage layer; Jena requires somewhat different models for SDB, TDB, ...
Why Jena:
- has built-in support for OWL, OWL Mini, OWL Micro
- has a flexible subsystem to plug in a range of reasoners
- good Serialization/Deserialization for RDF I/O
- documentation is a little more comprehensive
Why RDF4J:
- supports RDFS reasoning
- good Serialization/Deserialization for RDF I/O
- nice data management tools (console, workbench)
- supports SeRQL
What are the differences/mismatches between Object-Oriented Programming and Ontology Modeling?
[SemSysL6, 19/20, p.43]
Impedance mismatch:
- Object oriented (OO): Design decisions based on the operational properties of a class [i.e. the methods, "what can it do?"]
- RDFS / OWL: Decisions based on the structural properties of a class
OWL is property-focused, whereas OOP (e.g. Java) is class-focused.
A class structure and relations among classes in an ontology are different from the structure for a similar domain in an OO program.
1:1 class mapping between objects (from OOP) and OWL is problematic, e.g. due to multiple inheritance.
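One consequence of the property-focused view concerns class membership. In OWL, membership can follow from the properties an individual has (e.g. via rdfs:domain), and one individual can belong to several classes at once. In OOP, by contrast, an object's class is fixed when the object is created. This minimal Python sketch uses hypothetical domain axioms and facts; `DOMAIN`, `FACTS` and `inferred_types` are illustrative names only.

```python
class Person:
    def __init__(self, name: str):
        self.name = name

# OOP: the class is chosen when the object is created and does not change.
bob = Person("Bob")

# Ontology style (hypothetical rdfs:domain axioms): using a property implies a class.
DOMAIN = {"hasParent": "Person", "teaches": "Teacher"}
FACTS = {("bob", "hasParent", "alice"), ("bob", "teaches", "semsys")}

def inferred_types(individual: str) -> set:
    """Classes entailed for `individual` by the domains of the properties it uses."""
    return {DOMAIN[p] for s, p, o in FACTS if s == individual and p in DOMAIN}

print(isinstance(bob, Person))        # True
print(sorted(inferred_types("bob")))  # ['Person', 'Teacher']
```

To get a Python object that is both a Person and a Teacher, a combined class would have to be declared in advance. In the ontology, stating one more property is enough.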