TU Wien:Experiment Design for Data Science VU (Knees)

From VoWi

Data

Lecturers Peter Knees, Alexander Schindler, Allan Hanbury, Andreas Rauber
ECTS 3
Alias Experiment Design for Data Science (en)
Department Information Systems Engineering
When winter semester
Last iteration 2020WS
Language English
Links tiss:188992
Curriculum assignments
Master Data Science, elective module FDS/FD - Fundamentals of Data Science - Foundations
Master Data Science, elective module Free electives
Master's programme Computational Science and Engineering, elective module Data Management
Master Business Informatics, elective module DA/EXT - Data Analytics Extension
Catalogue of free electives - Business Informatics, elective module Free electives
Catalogue of free electives - Computer Science, elective module Free electives

Mattermost: Channel "experiment-design-for-data-science"

Content

Data privacy and ethics (Hanbury), statistical testing (Knees), and productivity and data management (Rauber). Essentially one of the "bread and butter" courses of data science.

Course structure

Different lecturers cover different parts of the material. Lectures do not take place every week. Attendance is not mandatory (and not checked). There are two practical assignments to be done during the semester. The first assignment is individual and requires essentially no coding beyond very basic dataset exploration and visualisation. The second assignment is group work and is far more demanding in terms of coding and time.

Required/recommended prior knowledge

Basic knowledge of statistical testing is certainly helpful, as is basic knowledge of how to set up a machine learning experiment (this is also briefly presented in the lectures). For reproducing the chosen paper's experiments, some programming skills come in handy. Some knowledge of Python (scikit-learn, pandas, numpy; matplotlib or seaborn for visualization) is necessary, possibly also some R if you prefer it (this may depend on the paper chosen for reproduction, see assignments). Weka may also be needed.
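For readers new to setting up a machine learning experiment, here is a minimal sketch of one basic building block, k-fold cross-validation. It is not taken from the course material; it uses only the Python standard library, and the function name `k_fold_indices` and the default seed are hypothetical. In practice, scikit-learn's `KFold` (with `shuffle=True` and a fixed `random_state`) does the same job.

```python
import random


def k_fold_indices(n_samples, k, seed=42):
    """Split range(n_samples) into k (train, test) index pairs.

    The indices are shuffled once with a fixed seed (hypothetical default),
    so the same seed always produces the same folds.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # local RNG, global state untouched
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


if __name__ == "__main__":
    for train, test in k_fold_indices(10, 3):
        print(len(train), len(test))
```

Every sample lands in exactly one test fold, and fixing the seed means the folds (and therefore the per-fold scores) can be regenerated exactly later on.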

Lectures

  • A. Hanbury: clear, concise and on-topic presentation of relevant thoughts and questions surrounding data science.
  • P. Knees: lectures about experiment design, forming hypotheses and statistical evaluation.
  • A. Rauber: lectures about reproducibility of papers and experiments, problems with code, libraries, etc.
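A typical statistical evaluation task is comparing two classifiers on the same cross-validation folds. The sketch below shows one option, an exact paired permutation (sign-flip) test in plain Python. Whether the lectures use this particular test is not stated here, and the function name is hypothetical. Paired t-tests or the Wilcoxon signed-rank test (`scipy.stats.ttest_rel`, `scipy.stats.wilcoxon`) are common alternatives.

```python
from itertools import product
from statistics import mean


def paired_permutation_test(scores_a, scores_b):
    """Exact two-sided paired permutation (sign-flip) test.

    Null hypothesis: the per-fold differences are symmetric around zero,
    i.e. systems A and B perform equally well. All 2**n sign patterns are
    enumerated, so n (number of paired observations) should stay small.
    Returns (observed mean difference, p-value).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = mean(diffs)
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        permuted = mean(s * d for s, d in zip(signs, diffs))
        # Small tolerance so floating-point ties with the observed value count.
        if abs(permuted) >= abs(observed) - 1e-12:
            extreme += 1
        total += 1
    return observed, extreme / total
```

Because the test enumerates all 2^n sign patterns, the smallest attainable two-sided p-value is 2/2^n. With 5 folds that is 0.0625, so no difference can reach significance at the 0.05 level, while 10 folds allow p-values down to about 0.002.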

Exercises

2 Assignments:

  • Forming hypotheses for (machine learning) experiments: given a dataset, explore it and describe some interesting details. Then, from the insights gained, form three different hypotheses that can be tested in a machine learning experiment. Describe the dependent and independent variables, and how you would conduct the respective experiments. No actual programming required. To be done individually (not group work).
  • From a list of 3 short papers (around 3-4 pages), select one and reproduce its experiments and results. Check how well this is possible, i.e. whether all data, method descriptions, etc., needed for reproduction are available. This is done in groups of 3 students. Groups give a short presentation on their progress during the semester, which is helpful for getting feedback. Deliverables are code and a report.
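When reproducing a paper's experiments, it helps to store every result together with the seed and software environment that produced it. The sketch below is one hypothetical way to do this with the Python standard library. `run_experiment` is a placeholder for the actual experiment, and the package list is only an example.

```python
import json
import platform
import random
from importlib import metadata


def package_versions(names):
    """Return installed versions of the given distributions (None if missing)."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def run_experiment(seed):
    """Hypothetical placeholder experiment: three pseudo-random 'scores'.

    A local random.Random(seed) is used, so the output depends only on the seed.
    """
    rng = random.Random(seed)
    return [round(rng.random(), 6) for _ in range(3)]


def make_run_record(seed, packages=("numpy", "pandas", "scikit-learn")):
    """Bundle results with the information needed to rerun the experiment."""
    return {
        "seed": seed,
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "packages": package_versions(packages),
        "results": run_experiment(seed),
    }


if __name__ == "__main__":
    record = make_run_record(seed=2020)
    print(json.dumps(record, indent=2))
```

Saving such a record next to the results makes it easier to tell later whether a deviation from the paper's numbers comes from the method itself or from a different library version.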

-> In 2020W, Professor Knees asked the students to reproduce the experiments of papers they picked themselves. He offered a list of links to look through, and each group decided which short paper to reproduce.

Exam, grading

The exam contains 4 relatively open-ended questions, each worth 25% of the points (see materials).

Time until the certificate is issued

not yet known

Time required

not yet known

Materials

2019WS: To simplify studying for the exam I created a minimal set of slides containing just the "relevant" parts: Minimal_2019_all_blocks.pdf

Tips

  • Assignment 2: Form a group of motivated people, start early, check for your favorite paper option (which one seems easy / aligned with your skills), read some extra literature around the chosen paper, if necessary.

I had no prior knowledge of statistical testing but still managed to get a 2. For assignment 1, it really helps if you know Python, Excel or any other program/programming language for analyzing datasets. For assignment 2, I completely agree with the person who wrote above me, but would like to add: before picking the paper, check the datasets, because sometimes your computer might not be powerful enough to validate the experiment due to the datasets' massive size.

Suggestions for improvement / criticism

not yet known