TU Wien:Experiment Design for Data Science VU (Knees)/Exam 23.01.2019 - Group C

Revision as of 20:02, 23 January 2020 by Ttravnicek (talk | contribs)

23.01.2020 - Group C - 60 minutes

  • Q1. Algorithms
    • What are the most prominent advantages of automated algorithms making decisions, and why?
    • What are the most prominent disadvantages of automated algorithms making decisions, and why?
    • What are the challenges in making algorithmic decisions transparent?
    • What are the disadvantages of making algorithmic decisions transparent?
  • Q2. Data Citation
    • What are the two challenges with data citation? List the (unsuccessful) approaches to overcome them.
    • Describe the approach of the RDA Data Citation WG to resolve this issue.
  • Q3. Statistical testing
    • Given an experimental setup with 10,000 instances, 20-fold cross-validation (k-fold with k=20), and accuracy as the performance measure:
      • List two approaches for statistical significance testing.
      • What is the sample size, i.e. what is N in this case?
    • What errors can be made in statistical hypothesis testing? Explain them briefly. How can the probability of making them be reduced?
  • Q4. Experimental Setup
    • Given a social media classification system and 1 million posts, each including a unique ID, some text, and a timestamp. You have two experimental setups, one with a time-based split and one with a cross-validation split.
      • Describe both, e.g. with a sketch.
      • Which approach would you suggest? What are the advantages and disadvantages of both?
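
The two split strategies from Q4 (and the 20-fold setup from Q3) can be sketched roughly as follows. This is only an illustrative sketch and not an official solution: the helper names `time_based_split` and `k_fold_splits`, the 80/20 train fraction, and the toy posts are assumptions made for the example and do not come from the exam.

```python
import random
from datetime import datetime, timedelta


def time_based_split(posts, train_fraction=0.8):
    """Train on the oldest posts, test on the newest ones (hypothetical helper)."""
    ordered = sorted(posts, key=lambda p: p["timestamp"])
    cut = int(len(ordered) * train_fraction)  # assumed 80/20 cut point
    return ordered[:cut], ordered[cut:]


def k_fold_splits(n_items, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each item is tested exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # shuffling ignores time order
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Hypothetical toy data: 1,000 posts with a unique id, some text, a timestamp.
start = datetime(2020, 1, 1)
posts = [
    {"id": i, "text": f"post {i}", "timestamp": start + timedelta(minutes=i)}
    for i in range(1000)
]

# Time-based split: every training post is older than every test post.
train, test = time_based_split(posts)
latest_train = max(p["timestamp"] for p in train)
earliest_test = min(p["timestamp"] for p in test)

# 20-fold cross-validation over 10,000 instances, as in Q3:
# 20 folds, each with its own test set, so one accuracy value per fold.
splits = list(k_fold_splits(10_000, k=20))
fold_sizes = [len(test_idx) for _, test_idx in splits]
```

In this sketch the time-based split never lets the model see posts newer than the ones it is tested on, while the shuffled k-fold split mixes old and new posts in both training and test sets. With k=20 and 10,000 instances, each fold holds 500 test instances and the procedure yields 20 per-fold accuracy values.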