TU Wien:Experiment Design for Data Science VU (Knees)/Exam 23.01.2019 - Group C
23.01.2020 - Group C - 60 minutes
- Q1. Algorithms
- What are the most prominent advantages of automated algorithms making decisions, and why?
- What are the most prominent disadvantages of automated algorithms making decisions, and why?
- What are the challenges in making algorithmic decisions transparent?
- What are the disadvantages of making algorithmic decisions transparent?
- Q2. Data Citation
- What are the two challenges of data citation? List the (unsuccessful) approaches to overcoming them.
- Describe the approach of the RDA Data Citation WG to resolve this issue.
- Q3. Statistical testing
- Given an experimental setup with 10,000 instances, 20-fold cross-validation (k-fold with k = 20), and accuracy as the performance measure:
- List two approaches to statistical significance testing.
- What is the sample size, i.e. what is N in this case?
- What errors can be made in statistical hypothesis testing? Explain them briefly. How can the probability of making them be reduced?
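To make the numbers in Q3 concrete, the following is a minimal sketch and not a model answer. It assumes the significance test is run on per-fold accuracies of two hypothetical classifiers A and B, in which case there are 20 paired observations, each computed on 500 test instances; a test on per-instance predictions would instead use all 10,000 instances. The accuracy values are synthetic, and `fold_sizes` and `paired_t_statistic` are illustrative helper names.

```python
import math
import random
import statistics

N_INSTANCES = 10_000  # from the question
K = 20                # from the question


def fold_sizes(n, k):
    """Sizes of the k test folds when n instances are split as evenly as possible."""
    base, extra = divmod(n, k)
    return [base + 1 if i < extra else base for i in range(k)]


def paired_t_statistic(a, b):
    """Return the paired t statistic and its degrees of freedom."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# Synthetic per-fold accuracies for two hypothetical classifiers (assumption).
rng = random.Random(0)
acc_a = [0.80 + rng.uniform(-0.02, 0.02) for _ in range(K)]
acc_b = [0.78 + rng.uniform(-0.02, 0.02) for _ in range(K)]

sizes = fold_sizes(N_INSTANCES, K)
t_value, df = paired_t_statistic(acc_a, acc_b)
print(f"folds: {len(sizes)}, test instances per fold: {sizes[0]}")
print(f"paired t = {t_value:.3f} with {df} degrees of freedom")
```

Note that the training sets of the folds overlap, so the per-fold accuracies are not independent and a plain paired t-test over folds tends to be optimistic.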
- Q4. Experimental Setup
- Given a social media classification system and 1 million posts, each with a unique ID, some text, and a timestamp. You have two experimental setups: one with a time-based split and one with a cross-validation split.
- Describe both, e.g. with a sketch.
- Which approach would you suggest? What are the advantages and disadvantages of both?
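The two setups in Q4 can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference solution: the posts are a tiny synthetic stand-in for the 1 million posts, the field names `id`, `text` and `timestamp` mirror the question, and the 80/20 ratio, `k = 5`, and the helper names `time_based_split` and `k_fold_splits` are arbitrary choices.

```python
import random


def time_based_split(posts, train_fraction=0.8):
    """Train on the oldest posts and test on the newest ones."""
    ordered = sorted(posts, key=lambda p: p["timestamp"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def k_fold_splits(posts, k=5, seed=0):
    """Yield (train, test) pairs; each post lands in exactly one test fold."""
    shuffled = posts[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train, folds[i]


# Synthetic posts (assumption): increasing timestamps, placeholder text.
posts = [{"id": i, "text": f"post {i}", "timestamp": 1_000 + i} for i in range(20)]

train, test = time_based_split(posts)
newest_train = max(p["timestamp"] for p in train)
oldest_test = min(p["timestamp"] for p in test)
print("time-based split keeps the test set in the future:", newest_train < oldest_test)

for fold_no, (cv_train, cv_test) in enumerate(k_fold_splits(posts, k=5)):
    # With shuffled folds, some training posts can be newer than test posts.
    leaks = max(p["timestamp"] for p in cv_train) > min(p["timestamp"] for p in cv_test)
    print(f"fold {fold_no}: train={len(cv_train)} test={len(cv_test)} "
          f"training data newer than test data: {leaks}")
```

The time-based split yields a single estimate that respects the order in which posts arrive, while the shuffled cross-validation split uses every post for testing once but lets the model train on posts that are newer than the ones it is evaluated on.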