TU Wien:Datenorientierte Programmierparadigmen VU (Hanbury)

Data

Lecturers	Mohammad Mahdi Azarbeik • Allan Hanbury • Filip Kovacevic • Ilya Lasy • Florina Piroi • Gábor Recski • Moritz Staudinger
ECTS	3.0
Alias	Data-oriented Programming Paradigms (en)
Last held	2024W
Language	English
Mattermost	datenorientierte-programmierparadigmen • Register • Mattermost-Infos
Links	tiss:188995, eLearning

Curriculum assignments
Extension programme Digital Competences	Module Informatics (restricted elective)
Master's programme Data Science	Module FDS/FD - Fundamentals of Data Science - Foundations (compulsory)
Master's programme Business Informatics	Module DA/COR - Data Analytics Core (restricted elective)
Catalogue of free electives

Content

The first part of the course covers data-oriented programming paradigms in the strict sense. The second and third parts are data science exercises: one on a given topic, and one on a topic you select yourself, where all the steps of the data science process have to be carried out (ask a question, get the data...).

Course format

A few lectures and 3 assignments to be solved.

Required/recommended prior knowledge

Some programming knowledge, especially in Python, is definitely necessary. Also helpful: data visualization and some statistics (basic testing, correlation, regression models), plus a little about handling data (e.g. pandas) and the basics of pre-processing and cleaning data. (These things are also explained in the lectures, to some extent.)

Lectures

One lecture on each topic:

Data Science process (Hanbury)
Python intro (Böck)
Numpy intro (Böck)
Pandas intro (Piroi)
Machine Learning intro/scikit-learn (Hanbury)
Network Analysis (Hanbury)

Note that the lectures are only an introduction to the topics. The tasks in the exercises are more complicated than the presented examples.

Exercises

1st assignment (individual): compare different programming paradigms (OOP, data-oriented). Done in Jupyter notebooks on TU infrastructure (in the browser).
2nd assignment (individual): loading data, pre-processing (filling missing values, outlier detection and removal), aggregating and merging data, analyzing data (correlations, etc.), and building a model for prediction. Also in Jupyter, as above.
3rd assignment (groups of 4): a larger project. Select a topic from a list, then collect, organize, and process your own data to answer the given problem. Includes a written report and a final presentation.
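
To give a rough idea of the steps listed for the 2nd assignment, here is a minimal sketch in pandas. It is not the actual assignment: the data, the column names, the median fill, and the 1.5 × IQR outlier rule are all assumptions chosen for illustration, and np.polyfit stands in for whatever model the real task asks for (the lectures introduce scikit-learn).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: temperature and humidity readings per station.
readings = pd.DataFrame({
    "station": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "temp_c": [10.0, 11.0, np.nan, 12.0, 20.0, 21.0, 500.0, 22.0],
    "humidity": [80.0, 78.0, 77.0, 75.0, 60.0, 58.0, 57.0, 55.0],
})
stations = pd.DataFrame({"station": ["A", "B"], "altitude_m": [1500, 200]})

# 1. Fill missing values with the median of the same station.
readings["temp_c"] = readings.groupby("station")["temp_c"].transform(
    lambda s: s.fillna(s.median())
)

# 2. Detect and remove outliers with the 1.5 * IQR rule (an assumed choice).
q1, q3 = readings["temp_c"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = readings["temp_c"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = readings[in_range]

# 3. Aggregate per station and merge with the station metadata.
per_station = clean.groupby("station", as_index=False).agg(
    mean_temp=("temp_c", "mean"),
    mean_humidity=("humidity", "mean"),
)
merged = per_station.merge(stations, on="station", how="left")

# 4. Analyze: Pearson correlation between the two measurements.
corr = clean["temp_c"].corr(clean["humidity"])

# 5. Simple prediction model: least-squares line humidity ~ temp_c.
slope, intercept = np.polyfit(clean["temp_c"], clean["humidity"], deg=1)
predicted_humidity = slope * 15.0 + intercept

print(merged)
print(f"correlation: {corr:.3f}, predicted humidity at 15 °C: {predicted_humidity:.1f}")
```

The real exercises use larger and messier data, so expect each step to need more care than shown here.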

WS2020:

1st assignment (individual): read, manipulate and plot some data without using pandas (we were allowed to use xlrd). Not very difficult, but the point of the exercise was unclear to me: pandas is an absolute staple in many data science tasks, is heavily used in exercise 2, and nothing we had to do could not have been done with less effort in pandas. There were also some hidden tests that checked for things that were only vaguely mentioned in the task description.
2nd & 3rd assignment as above
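
For a feeling of what "without pandas" means in practice, here is a minimal sketch using only the Python standard library. The data, the column names, and the helper mean_by_key are made up for illustration. The real assignment read its input from a file (xlrd was allowed for that) and also involved plotting, neither of which is shown here.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical data; in the assignment the input came from a file.
RAW = """country,year,value
AT,2019,3.0
AT,2020,5.0
DE,2019,4.0
DE,2020,
"""


def mean_by_key(text, key, field):
    """Group rows by `key` and average `field`, skipping empty cells."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        if row[field]:  # an empty string means the value is missing
            groups[row[key]].append(float(row[field]))
    return {k: mean(v) for k, v in groups.items()}


means = mean_by_key(RAW, "country", "value")
print(means)
```

With pandas the same grouping would be a single groupby call, which is the point of the complaint above.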

WS2021:

1st assignment (individual): text processing on the command line. We had to choose a version of Alice in Wonderland in a language other than English and extract all of the characters' names from the text (best effort; it didn't have to be perfect) using Unix commands such as grep, sed, etc.
2nd & 3rd assignment as above
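
The assignment itself had to be solved with Unix commands. Purely to illustrate the kind of best-effort heuristic involved, here is a sketch in Python instead: it treats capitalized words that do not start a sentence as name candidates and keeps the frequent ones. The function candidate_names, its min_count threshold, and the sample sentences are assumptions (the sample is made up, not taken from a real translation), and the heuristic would work poorly for languages that capitalize all nouns.

```python
import re
from collections import Counter


def candidate_names(text, min_count=2):
    """Return capitalized mid-sentence words seen at least `min_count` times."""
    tokens = re.findall(r"\w+|[.!?]", text)
    counts = Counter()
    sentence_start = True
    for tok in tokens:
        if tok in {".", "!", "?"}:
            sentence_start = True
            continue
        # A capitalized word at the start of a sentence tells us nothing.
        if tok[0].isupper() and not sentence_start:
            counts[tok] += 1
        sentence_start = False
    return [name for name, n in counts.most_common() if n >= min_count]


# Made-up sample text for illustration only.
sample = (
    "Un giorno Alice vide il Coniglio. "
    "Poi Alice seguì il Coniglio nella tana. "
    "Il Cappellaio salutò Alice."
)
names = candidate_names(sample)
print(names)
```

Names that only ever appear at the start of a sentence, or only once, are missed, which is in line with the "best effort" nature of the task.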

Exam, grading

There is no exam, only the 3 assignments. Points earned for these determine your grade. Grading of the exercises takes a couple of weeks.

Grading in general is not very transparent. Points in JupyterHub change multiple times within a few days, so you don't know how many points you actually got. Also, the points on TUWEL don't match the points on JupyterHub.

Time until the certificate is issued

Semester	Last assessment	Certificate
2019W	27.01.2020	total points 24.02.2020, certificate a few days after that
2021W	25.01.2022 (last presentation day of exercise 3)	20.02.2022

Workload

If you have no prior experience with pandas or machine learning, your effort will be a lot higher than the 3 ECTS suggest.

Materials

Still open

Tips

Final presentation: there is a 10-minute time limit that is strictly enforced. Make sure your presentation fits into this slot; less is more here.
Start early with the final project (exercise 3); it can be quite a lot of work.
In WS19 there were a lot of organizational issues with the exercises: a non-working JupyterHub and multiple errors in the exercise data. Also, check the descriptions for exercise 1 carefully so that you don't lose points in the hidden tests (e.g. for a function that is never used in the visible tests; in WS19 it was __len__ of an object). So start early with the exercises, and if you notice an issue, report it on TUWEL immediately with sufficient proof of why it is an issue.
In WS21, there was an unfortunate incident where a couple of students lost their notebooks as well as all backups of their second assignment due to an error with JupyterHub. Always make regular backups of your work, especially if you work online! Note: It should be emphasized that none of the affected students had to redo the exercise. Their grade is determined using the points of exercise 1 and exercise 3.
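
Regarding the hidden-test tip above: one way to protect yourself is to write your own small checks for every method the task description mentions, even if the visible tests never call it. The sketch below assumes a hypothetical Dataset class; the actual class from WS19 is not documented here.

```python
class Dataset:
    """Hypothetical container; stands in for whatever class the task asks for."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __len__(self):
        # Called by len(obj); must return a non-negative int.
        return len(self._rows)

    def __getitem__(self, index):
        return self._rows[index]


ds = Dataset([{"x": 1}, {"x": 2}])

# Self-written checks for behaviour the visible tests may not cover.
assert len(Dataset([])) == 0
assert len(ds) == 2
assert ds[1] == {"x": 2}
```

Such checks take a few minutes to write and catch a forgotten or misspelled special method before the hidden tests do.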

Highlights / praise