TU Wien:Semistrukturierte Daten VU (Woltran)/Summary

From VoWi

Semi-structured data

  • Semi-structured data (SSD) can be represented as a labeled tree.
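
The labeled-tree view can be sketched in Python with the standard library. The helper name `to_labeled_tree` and the sample document are assumptions made for illustration; text between child elements (tail text) is ignored to keep the sketch short.

```python
import xml.etree.ElementTree as ET

def to_labeled_tree(element):
    """Return (label, children); element text becomes a leaf node labeled with the text."""
    children = [to_labeled_tree(child) for child in element]
    text = (element.text or "").strip()
    if text:
        # The text content is modeled as a leaf whose label is the text itself.
        children.insert(0, (text, []))
    return (element.tag, children)

# Hypothetical sample document.
doc = ET.fromstring("<person><name>Ada</name><tel>123</tel></person>")
tree = to_labeled_tree(doc)
```

Inner nodes carry element names as labels, leaves carry the text values.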

XML

  • An element consists of:
    • start tag: <element-name>
    • content
      • empty
      • simple content — text
      • element content — one or more elements
      • mixed content — text and elements
    • end tag: </element-name>

Empty elements can be abbreviated as <element-name/>.
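
As a rough illustration, the four kinds of content can be told apart with Python's `xml.etree.ElementTree`. The function `content_kind` and the sample document are hypothetical; whitespace-only text is treated as no text, which is a simplification.

```python
import xml.etree.ElementTree as ET

def content_kind(element):
    """Classify an element as 'empty', 'simple', 'element' or 'mixed' content."""
    has_children = len(element) > 0
    # Text can appear before the first child (.text) or after any child (.tail).
    texts = [element.text] + [child.tail for child in element]
    has_text = any(t and t.strip() for t in texts)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "element"
    if has_text:
        return "simple"
    return "empty"

root = ET.fromstring(
    "<doc><a/><b></b><c>text</c><d><x/><y/></d><e>text <x/> more</e></doc>"
)
kinds = {child.tag: content_kind(child) for child in root}
```

Note that `<a/>` and `<b></b>` are classified identically: the abbreviated form is only a shorthand.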

  • XML is case-sensitive.
  • XML predefines five entity references: &lt; &amp; &gt; &quot; &apos;
  • <!-- this is a comment -->
  • <?target this is a processing instruction?> (the target name must follow <? directly)
  • The default namespace is declared as xmlns="namespace-URI".
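
A small sketch of how these rules show up when parsing with Python's standard library. The namespace URI `http://example.com/ns`, the sample document, and the helper `is_well_formed` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Entity references are resolved by the parser; the default namespace
# becomes part of every unprefixed element name.
xml_text = '<note xmlns="http://example.com/ns"><msg>1 &lt; 2 &amp; 3</msg></note>'
root = ET.fromstring(xml_text)
msg = root.find("{http://example.com/ns}msg")

# Going the other way: escape() replaces &, < and > with entity references.
escaped = escape("a < b & c")
```

Because XML is case-sensitive, `is_well_formed("<a></A>")` reports a mismatched tag.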

Document Type Definition (DTD)

A DTD lists all the elements and attributes the document uses. The order of declarations is not significant.

<!ELEMENT person (name, tel, fax, email+)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT tel (#PCDATA)>
<!ELEMENT fax (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ATTLIST person id_number ID #REQUIRED>

A document that conforms to the DTD is valid; otherwise it is invalid. Applications may choose to ignore validation errors.
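
Python's standard library does not validate against DTDs, so the following is only a hand-rolled sketch of the idea: the content model (name, tel, fax, email+) is a regular expression over the names of the child elements. The names `PERSON_MODEL` and `is_valid_person` and the sample documents are hypothetical, and the uniqueness of ID values is not checked.

```python
import re
import xml.etree.ElementTree as ET

# (name, tel, fax, email+) rewritten as a regex over comma-terminated child names.
PERSON_MODEL = re.compile(r"name,tel,fax,(email,)+")

def is_valid_person(person):
    """Check the child sequence and the #REQUIRED attribute of a person element."""
    child_names = "".join(child.tag + "," for child in person)
    children_ok = PERSON_MODEL.fullmatch(child_names) is not None
    attribute_ok = "id_number" in person.attrib
    return children_ok and attribute_ok

valid = ET.fromstring(
    "<person id_number='p1'><name>A</name><tel>1</tel><fax>2</fax>"
    "<email>a@x</email><email>b@x</email></person>"
)
missing_children = ET.fromstring(
    "<person id_number='p2'><name>A</name><email>a@x</email></person>"
)
missing_attribute = ET.fromstring(
    "<person><name>A</name><tel>1</tel><fax>2</fax><email>a@x</email></person>"
)
```

A real validating parser would derive these checks from the DTD itself.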

Document Type Declaration:

<!DOCTYPE person SYSTEM "http://www.example.com/dtds/person.dtd">

The location can also be relative.

XML Schema Definition (XSD)

Simple elements
Contain only text. We can add restrictions.

Built-in types:

  • xsd:boolean, xsd:string, xsd:decimal, xsd:integer, xsd:date, xsd:time, etc.
Restrictions on Values
  • xsd:minInclusive
  • xsd:maxInclusive
  • xsd:minExclusive
  • xsd:maxExclusive
  • xsd:enumeration
  • xsd:pattern (regex)
  • xsd:whiteSpace
  • xsd:length
  • xsd:minLength
  • xsd:maxLength
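
What some of these restrictions mean can be sketched as a plain Python check. The function `check_facets` and its parameter names are assumptions for illustration, not an XSD API; XSD patterns always match the whole value, so `re.fullmatch` is used, and XSD regex syntax differs from Python's in some details.

```python
import re
from decimal import Decimal

def check_facets(value, min_inclusive=None, max_inclusive=None,
                 enumeration=None, pattern=None, max_length=None):
    """Return True if the lexical value satisfies all given restrictions."""
    if min_inclusive is not None and Decimal(value) < Decimal(min_inclusive):
        return False
    if max_inclusive is not None and Decimal(value) > Decimal(max_inclusive):
        return False
    if enumeration is not None and value not in enumeration:
        return False
    if pattern is not None and re.fullmatch(pattern, value) is None:
        return False
    if max_length is not None and len(value) > max_length:
        return False
    return True

age_ok = check_facets("42", min_inclusive="0", max_inclusive="100")
age_too_high = check_facets("101", min_inclusive="0", max_inclusive="100")
code_ok = check_facets("A-123", pattern=r"[A-Z]-\d{3}")
```

The exclusive bounds, xsd:length and xsd:minLength would follow the same scheme.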
Complex elements
Contain other elements and/or attributes.

Order Indicators

  • all
  • choice
  • sequence

Occurrence Indicators

  • minOccurs
  • maxOccurs
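
The interplay of order and occurrence indicators can be sketched in Python. All function names and the sample particles are hypothetical. The sequence check matches greedily, which assumes that neighbouring particles have different names; the all and choice checks assume the defaults minOccurs = maxOccurs = 1.

```python
def matches_sequence(child_names, particles):
    """particles: list of (name, min_occurs, max_occurs); max_occurs None = unbounded."""
    position = 0
    for name, min_occurs, max_occurs in particles:
        count = 0
        while position < len(child_names) and child_names[position] == name:
            if max_occurs is not None and count == max_occurs:
                break
            count += 1
            position += 1
        if count < min_occurs:
            return False
    # Every child must have been consumed by some particle.
    return position == len(child_names)

def matches_all(child_names, names):
    """all: every element exactly once, in any order."""
    return sorted(child_names) == sorted(names)

def matches_choice(child_names, names):
    """choice: exactly one of the alternatives."""
    return len(child_names) == 1 and child_names[0] in names

# name (exactly once), tel (optional), email (one or more)
particles = [("name", 1, 1), ("tel", 0, 1), ("email", 1, None)]
```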

Keys and References

XPath

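XPath 1.0 defines 13 axes: ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self.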
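
A few axes can be tried out with Python's `xml.etree.ElementTree`, which supports only a small subset of XPath in its abbreviated syntax. The sample document is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<library>"
    "<book id='b1'><title>A</title><author>X</author></book>"
    "<book id='b2'><title>B</title></book>"
    "</library>"
)

# child axis: book children of the root, then their title children
child_titles = [t.text for t in root.findall("book/title")]

# descendant axis (abbreviated as //): titles at any depth
all_titles = [t.text for t in root.findall(".//title")]

# parent axis (abbreviated as ..): elements that have an author child
books_with_author = [b.get("id") for b in root.findall(".//author/..")]

# attribute axis (abbreviated as @) inside a predicate
title_of_b2 = root.find("book[@id='b2']/title").text
```

Axes such as following or preceding are not available in this subset and need a full XPath processor.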

Extensible Stylesheet Language Transformations (XSLT)

Templates match the input document, and define the output.

The templates for a node's subtree are applied only if the current template calls

<xsl:apply-templates/>

XSLT has the following default templates:

  • for the root and elements: apply templates to the child nodes
  • for text nodes: copy the content to the output
  • for attributes: copy the value to the output

For each node, exactly one template is executed. If several templates match, the one with the more specific match pattern takes priority.
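
The processing model can be imitated in a few lines of Python. This is a sketch, not an XSLT processor: templates are hypothetical Python functions looked up by element name only, so match patterns and priorities are not modeled.

```python
import xml.etree.ElementTree as ET

def apply_templates(element, templates):
    """Like <xsl:apply-templates/>: process all child nodes of the element."""
    output = [element.text or ""]           # default rule for text: copy it
    for child in element:
        output.append(process(child, templates))
        output.append(child.tail or "")     # text following the child element
    return "".join(output)

def process(element, templates):
    """Run exactly one template: a user-defined one, else the default rule."""
    template = templates.get(element.tag)
    if template is not None:
        return template(element, templates)
    return apply_templates(element, templates)  # default rule for elements

def title_template(element, templates):
    return "<h1>" + apply_templates(element, templates) + "</h1>"

def secret_template(element, templates):
    # No call to apply_templates: the subtree is never visited.
    return ""

templates = {"title": title_template, "secret": secret_template}
doc = ET.fromstring(
    "<doc><title>Hello</title><p>World</p><secret><p>hidden</p></secret></doc>"
)
result = process(doc, templates)
```

The text of `p` reaches the output only through the default rules, while everything below `secret` is dropped because its template does not recurse.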

XQuery

FLWOR
for ... let ... where ... order by ... return ...
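
As a rough Python analogue (not XQuery itself), each clause maps to a familiar construct. The function `flwor`, the sample data, and the XQuery fragments in the comments are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<books>"
    "<book><title>C</title><price>30</price></book>"
    "<book><title>A</title><price>10</price></book>"
    "<book><title>B</title><price>50</price></book>"
    "</books>"
)

def flwor(books, max_price):
    rows = []
    for book in books.findall("book"):          # for $book in /books/book
        price = float(book.findtext("price"))   # let $price := $book/price
        if price <= max_price:                  # where $price <= max_price
            rows.append((book.findtext("title"), price))
    rows.sort(key=lambda row: row[0])           # order by $book/title
    return [title for title, _ in rows]         # return $book/title

cheap_titles = flwor(root, 40)
```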

Parsing

  • event-based: SAX (Simple API for XML)
  • tree-based: DOM (Document Object Model)
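
Both styles are available in Python's standard library, which allows a short side-by-side sketch. The sample document and the class name `PersonCounter` are assumptions for illustration.

```python
import xml.sax
from xml.dom.minidom import parseString

XML_TEXT = "<people><person>Ada</person><person>Alan</person></people>"

class PersonCounter(xml.sax.ContentHandler):
    """SAX: the parser calls back for each event; no tree is kept in memory."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "person":
            self.count += 1

handler = PersonCounter()
xml.sax.parseString(XML_TEXT.encode("utf-8"), handler)
sax_count = handler.count

# DOM: the whole document is loaded into a tree that can be navigated freely.
dom = parseString(XML_TEXT)
dom_names = [node.firstChild.data for node in dom.getElementsByTagName("person")]
```

SAX suits large documents processed in one pass; DOM is more convenient when the document has to be traversed repeatedly or modified.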