TU Wien:Experiment Design for Data Science VU (Knees)/Exam 23.01.2019 - Group C

23.01.2020 - Group C - 60 minutes

Q1. Algorithms
- What are the most prominent advantages of automated algorithms making decisions, and why?
- What are the most prominent disadvantages of automated algorithms making decisions, and why?
- What are the challenges in making algorithmic decisions transparent?
- What are the disadvantages of making algorithmic decisions transparent?
Q2. Data Citation
- What are the two challenges of data citation? List the (unsuccessful) approaches to overcome them.
- Describe the approach of the RDA Data Citation WG to resolve this issue.
Q3. Statistical testing
- Given an experimental setup with 10,000 instances, 20-fold cross-validation (k-fold with k=20) and accuracy as the performance measure:
  - List two approaches for statistical significance testing
  - What is the sample size, i.e. what is N in this case?
- What errors can be made in statistical hypothesis testing? Explain them briefly. How can the probability of making them be reduced?
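
The arithmetic behind the cross-validation setup in Q3 can be sketched as follows. This is only an illustration, not a model answer: the helper names `fold_sizes` and `paired_t_statistic` are hypothetical, and the paired t-test over per-fold accuracies is just one possible choice of test. Note that fold results in k-fold cross-validation are not fully independent, because the training sets overlap.

```python
import math
import statistics


def fold_sizes(n_instances, k):
    """Sizes of the k test folds when instances are split as evenly as possible."""
    base, extra = divmod(n_instances, k)
    return [base + 1 if i < extra else base for i in range(k)]


def paired_t_statistic(acc_a, acc_b):
    """Return (t, degrees of freedom) for a paired t-test on per-fold accuracies.

    acc_a and acc_b hold the accuracy of two systems on the same folds,
    so the number of paired observations equals the number of folds.
    """
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Setup from the question: 10,000 instances, k = 20.
sizes = fold_sizes(10_000, 20)
print(len(sizes), sizes[0])  # number of folds and instances per test fold
```

Under this fold-level view, a comparison of two systems works with 20 paired accuracy values, each computed on 500 test instances. A test that operates on individual predictions would instead work with all 10,000 instances.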
Q4. Experimental Setup
- Given a social media classification system and 1 million posts, each with a unique ID, some text and a timestamp. You have two experimental setups, one with a time-based split and one with a cross-validation split.
  - Describe both, e.g. with a sketch.
  - Which approach would you suggest? What are the advantages and disadvantages of both?
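
The two setups in Q4 can also be sketched in code. This is a minimal sketch under assumptions: the `Post` type, the 80/20 train fraction, the choice of k = 5 and the function names are all hypothetical and not part of the exam question.

```python
import random
from typing import NamedTuple


class Post(NamedTuple):
    post_id: int
    text: str
    timestamp: int  # assumed to be comparable, e.g. Unix seconds


def time_based_split(posts, train_fraction=0.8):
    """Train on the oldest posts and test on the newest ones."""
    ordered = sorted(posts, key=lambda p: p.timestamp)
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def k_fold_splits(posts, k=5, seed=0):
    """Yield (train, test) pairs; each post is in exactly one test fold."""
    shuffled = list(posts)
    random.Random(seed).shuffle(shuffled)  # timestamps are ignored here
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train, folds[i]


# Tiny synthetic example instead of the 1 million posts.
posts = [Post(i, f"post {i}", 1_000 + i) for i in range(10)]
train, test = time_based_split(posts)
cv_splits = list(k_fold_splits(posts, k=5))
```

In the time-based split every training post is older than every test post, so the evaluation resembles predicting future posts. In the cross-validation split, posts from all time periods are mixed across training and test sets, which uses all data for testing but may let the model see posts newer than those it is evaluated on.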
