TU Wien:Business Intelligence VU (Tjoa)/Exam 2021-01-25

The exam consisted of two parts and was conducted in TUWEL.

To attend the exam, you had to join a Zoom meeting with two devices, one filming from the front and one from the side/back. Students were let into the meeting one at a time. Between the parts, there was a short break (bathroom allowed).

The exact grading scheme for the exam and scaling were unknown at the time of taking the exam. Results were expected at the end of January/beginning of February.

Part 1: SC (True-False)

20 true/false questions from a question pool covering all slides. Work time: 20 minutes. Correct answer: +2 points, wrong answer: -1 point, no answer: 0 points.

  • K-means is extremely robust against outliers in the data
  • Carefully pruned decision trees usually show higher precision on the training data than un-pruned decision trees
  • Lazy Learning is not recommended when there is high drift in data space, leading to changing decision boundaries
  • The k-NN classifier using Euclidean distance is computationally more expensive at the model building stage than a Decision Tree using simple error counts as splitting criterion.
  • Ordinal data allows distances to be computed between data points
  • Random sampling of time series data for classifier training may lead to an overestimation of model performance
  • 1-to-N coding (one-hot encoding) reduces the dimensionality of the feature space
  • CRISP-DM: Business Success Criteria are ideally specified as subjective measures and Data Mining Success Criteria should be specified as objective measures
  • Zero-mean unit variance normalization is highly sensitive to outliers in the data
  • Lazy learning is more time-efficient at classification stage
  • According to the Data Warehousing Institute (TDWI) working definition, Business Intelligence encompasses analytic tools
  • In Hadoop, applications are typically written in high-level code such as Java
  • In Hadoop processing is coordinated through MapReduce.
  • In context of the DWH reference architecture, the Metadata Component stores operational metadata, extraction and transformation metadata, and end-user metadata.
  • The Staging Area in the DWH reference architecture is a database that stores a single data extract of a source database.
  • [DWH] In a typical Lambda architecture, queries can be answered by merging results from batch and real-time views.
  • Data silos hold data for individual sets of applications or organizational units.
  • Big advantages of a snowflake schema include that the schema becomes more intuitive and browsing through the content is easy
  • In DWH, the concept "warehouse" supports bi-directional data flows between related data sources.
  • In context of DWH analytics, predictive analytics focus on investigating past effects to capture relevant information.
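The scoring rule stated at the start of Part 1 can be written out as a short calculation. This is only a sketch of the stated per-question points; the function names are hypothetical, and since the exact grading scheme and scaling were unknown, it assumes the raw sum is used as is (for example, no flooring of negative totals at zero).

```python
def score_part1(correct: int, wrong: int, total: int = 20) -> int:
    """Raw score under the stated scheme: +2 per correct answer,
    -1 per wrong answer, 0 per unanswered question.

    Assumption: the raw sum is not scaled or floored afterwards.
    """
    if correct < 0 or wrong < 0 or correct + wrong > total:
        raise ValueError("correct + wrong must lie between 0 and total")
    return 2 * correct - wrong


def expected_guess_value(p_correct: float) -> float:
    """Expected points from answering one question when the chance of
    being right is p_correct (leaving it blank is worth 0 points)."""
    return 2 * p_correct - (1 - p_correct)


if __name__ == "__main__":
    # 14 correct, 4 wrong, 2 left blank
    print(score_part1(correct=14, wrong=4))
    # A pure coin-flip guess
    print(expected_guess_value(0.5))
```

Under these assumptions, answering has a positive expected value whenever the chance of being right exceeds 1/3, so even a coin-flip guess is worth more on average than leaving the question blank.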

Part 2: Open Questions

Work time: 25 minutes.

The questions were selected from a question pool: two each from the Data Warehousing and Data Mining parts, for 4 questions in total. That leaves an average of 6.25 minutes per question, which is extremely short even for a TU exam.

Example questions (paraphrased):

Data Warehousing

  • Name three requirements of operational systems using OLTP and describe each of them briefly
  • Describe the 3 steps of the MapReduce process flow

Data Mining

  • Name 4 types of attributes used in data mining, give an example of each, and describe their characteristics and the allowed mathematical operations
  • Describe Single Linkage and Complete Linkage: how do the algorithms work and what are their characteristics?