TU Wien:Datenorientierte Programmierparadigmen VU (Hanbury)


Data

Lecturers: Allan Hanbury, Florina Piroi, Gábor Recski, Moritz Staudinger, Richard Vogl
ECTS: 3.0
Alias: Data-oriented Programming Paradigms (en)
Last held: 2023W
Language: English
Mattermost: datenorientierte-programmierparadigmen
Links: tiss:188995
Curricula:
  • Master's programme Data Science, module FDS/FD - Fundamentals of Data Science - Foundations (compulsory)
  • Master's programme Business Informatics, module DA/COR - Data Analytics Core (restricted elective)
  • Master's programme Computational Science and Engineering, module Subgroup Data Management and Analytics
  • Catalogue of free electives


Content

The first part of the course is DOPP in the pure sense. The second and third parts are data science exercises: one on a given topic, and one on a topic you select yourself, where all the steps of the data science process have to be carried out (ask a question, get the data...).

Course format

A few lectures and 3 assignments to be solved.

Required/Recommended Prior Knowledge

Some programming experience, especially in Python, is definitely necessary. It also helps to know data visualization and some statistics (basic testing, correlation, regression models), a little about handling data (e.g. pandas), and the basics of pre-processing and cleaning data. These things are also explained in the lectures, to some extent.

Lectures

One lecture on each topic:

  • Data Science process (Hanbury)
  • Python intro (Böck)
  • Numpy intro (Böck)
  • Pandas intro (Piroi)
  • Machine Learning intro/scikit-learn (Hanbury)
  • Network Analysis (Hanbury)

Note that the lectures are only an introduction to the topics. The tasks in the exercises are more complicated than the presented examples.

Exercises

  • 1st assignment (individual): compare different programming paradigms (OOP, data-oriented). Done in a Jupyter notebook on TU infrastructure (in the browser).
  • 2nd assignment (individual): loading data, pre-processing (filling missing values, outlier detection and removal), aggregating and merging data, analyzing data (correlations, etc.), building a model for prediction. Also in Jupyter, as above.
  • 3rd assignment (groups of 4): larger project; select a topic from a list, then collect, organize, and process your own data to answer the given problem. Written report and final presentation.
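
As a rough illustration of what the paradigm comparison in the first assignment is about, the sketch below stores the same records once as a list of objects (OOP) and once as parallel columns (data-oriented), then computes the same mean from both. The `Measurement` class, the field names, and the values are invented for this example and are not taken from the actual assignment.

```python
from dataclasses import dataclass
from statistics import mean


# OOP layout: one object per record (hypothetical class, for illustration only).
@dataclass
class Measurement:
    city: str
    temperature: float


records = [
    Measurement("Vienna", 21.5),
    Measurement("Graz", 19.0),
    Measurement("Linz", 20.5),
]

# Data-oriented layout: one list per field; index i describes record i.
columns = {
    "city": [r.city for r in records],
    "temperature": [r.temperature for r in records],
}

# OOP style: walk over the objects and pick the attribute from each one.
mean_oop = mean(r.temperature for r in records)

# Data-oriented style: operate directly on the whole column.
mean_columnar = mean(columns["temperature"])

print(mean_oop, mean_columnar)
```

Both layouts give the same result; the difference is in how the data is organized and accessed, which is the kind of trade-off the comparison asks you to look at.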

WS2020:

  • 1st assignment (individual): read, manipulate, and plot some data without using Pandas (xlrd was allowed). Not very difficult, but the point of the exercise was unclear: Pandas is an absolute staple in many data-science tasks and is heavily used in exercise 2, and nothing we had to do could not have been done with less effort in Pandas. There were also some hidden tests that checked for things only vaguely mentioned in the task description.
  • 2nd & 3rd assignment as above
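
To give an idea of what reading and manipulating data without Pandas can look like, here is a minimal sketch that uses only the standard library. It is an assumption-laden illustration: the sample data, the column names, and the helper functions are invented, and it reads CSV text rather than the spreadsheet files the actual exercise may have used.

```python
import csv
import io
from statistics import median

# Invented sample data; an empty field stands for a missing value.
RAW = """city,year,temperature
Vienna,2019,11.5
Vienna,2020,
Graz,2019,10.0
Graz,2020,10.5
"""


def load_rows(text):
    """Parse CSV text into dicts, turning empty temperatures into None."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        value = row["temperature"]
        rows.append({
            "city": row["city"],
            "year": int(row["year"]),
            "temperature": float(value) if value else None,
        })
    return rows


def fill_missing(rows):
    """Replace missing temperatures with the median of the known values."""
    known = [r["temperature"] for r in rows if r["temperature"] is not None]
    fallback = median(known)
    return [
        {**r, "temperature": fallback if r["temperature"] is None else r["temperature"]}
        for r in rows
    ]


def mean_by_city(rows):
    """Aggregate: average temperature per city."""
    groups = {}
    for r in rows:
        groups.setdefault(r["city"], []).append(r["temperature"])
    return {city: sum(vals) / len(vals) for city, vals in groups.items()}


rows = fill_missing(load_rows(RAW))
averages = mean_by_city(rows)
print(averages)
```

The same steps (filling missing values, grouping, aggregating) are much shorter in Pandas, which is what the second assignment relies on.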

WS2021:

  • 1st assignment (individual): text processing on the command line. We had to choose an edition of Alice in Wonderland in a language other than English and extract all of the characters' names from the text (best effort, it didn't have to be perfect) using Unix commands such as grep, sed, etc.
  • 2nd & 3rd assignment as above
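
The WS2021 assignment itself had to be solved with Unix commands; the sketch below is only a Python analogue of the same best-effort idea, not the expected solution. It assumes a simple heuristic (a capitalized word whose lowercase form never occurs in the text, and which appears more than once, is probably a name). The sample sentences and the `name_candidates` function are invented for illustration.

```python
import re
from collections import Counter

# Invented French sample sentences; the assignment used a full book.
TEXT = (
    "Alice regarda le Lapin. Le Lapin courut vite. "
    "Puis Alice suivit le Lapin."
)


def name_candidates(text, min_count=2):
    """Best-effort guess at character names in a text.

    Heuristic: a capitalized word is a candidate if its lowercase form
    never occurs in the text and it appears at least `min_count` times.
    """
    words = re.findall(r"[^\W\d_]+", text)  # runs of letters, Unicode-aware
    lowercase_seen = {w for w in words if w.islower()}
    capitalized = [w for w in words if w[0].isupper()]
    counts = Counter(w for w in capitalized if w.lower() not in lowercase_seen)
    return {w: n for w, n in counts.items() if n >= min_count}


names = name_candidates(TEXT)
print(names)
```

Sentence-initial words such as "Le" are filtered out because "le" also occurs in lowercase, while one-off words such as "Puis" are dropped by the count threshold. On a real book the heuristic will still produce false positives, which matches the best-effort nature of the task.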

Exam, Grading

There is no exam, only the 3 assignments. Points earned for these determine your grade. Grading of the exercises takes a couple of weeks.

Time until the certificate is issued

Semester | Last assessment | Certificate
2019W | 27.01.2020 | total points 24.02.2020, certificate a few days after that
2021W | 25.01.2022 (last presentation day of exercise 3) | 20.02.2022


Workload

If you have no prior experience with Pandas or machine learning, your effort will be a lot higher than 3 ECTS.

Course materials

still open

Tips

  • Final presentation: the 10-minute time limit is strictly enforced. Make sure your presentation fits into this slot; less is more here.
  • Start early with the final project (exercise 3); it can be quite a lot of work.
  • In WS19 there were many organizational issues with the exercises: a non-working JupyterHub and multiple errors in the exercise data. Also, check the descriptions for exercise 1 carefully so that you don't lose points in the hidden tests (e.g. for a function that is never used in the visible tests; in WS19 it was __len__ of an object). So start early with the exercises, and if you notice an issue, report it on TUWEL immediately with sufficient evidence of the problem.
  • In WS21, there was an unfortunate incident where a couple of students lost their notebooks as well as all backups of their second assignment due to an error with JupyterHub. Always make regular backups of your work, especially if you work online! Note: none of the affected students had to redo the exercise; their grade was determined from the points of exercise 1 and exercise 3.
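
The hidden-test tip in the list above mentions `__len__`. As a reminder of what implementing it means, here is a minimal sketch. The `Dataset` class is hypothetical and is not the class from the actual exercise.

```python
class Dataset:
    """Hypothetical container, not the class from the actual exercise."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __len__(self):
        # Called by the built-in len(); must return a non-negative int.
        return len(self._rows)


data = Dataset([{"id": 1}, {"id": 2}, {"id": 3}])
print(len(data))

# Without a __bool__ method, truthiness falls back to __len__,
# so an empty Dataset is falsy.
print(bool(Dataset([])))
```

A method like this is easy to overlook when no visible test calls `len()` on the object, so it is worth implementing everything the task description lists, even if nothing visible exercises it.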

Highlights / Praise

still open

Suggestions for improvement / Criticism

still open
