TU Wien:Data Stewardship VO (Rauber)


Data

Lecturers Andreas Rauber, Tomasz Miksa
ECTS 3
replaces Digital Preservation VO (Rauber)
Department Information Systems Engineering
When summer semester
Language English
Links tiss:194044 , Mattermost-Channel
Assignments
Master Data Science compulsory module FDS/CO - Fundamentals of Data Science - Core
Master Business Informatics elective module DA/EXT - Data Analytics Extension
Master Medizinische Informatik elective module Informationsverarbeitung
Master Software Engineering & Internet Computing elective module Advanced Security

Mattermost: Channel "data-stewardship"

Content

Still open. Please do not copy from TISS or the homepage; describe it from a student's perspective.

Procedure

still open

Required/Recommended Prior Knowledge

It helps to know a bit about ontologies and linked data, the semantic web, and metadata standards. A basic understanding of experiment design and related processes, particularly with respect to reproducibility, is also useful. Classes that may help (and possibly even cover some of the material presented here) are "Experiment Design for Data Science" and, to a lesser degree, "Introduction to Semantic Systems".

Lecture

still open

Exercises

still open

Exam, Grading

SS20 - Exam 30.09.2020:

  • Multiple-choice questions, only 1 open question.
  • The exam was apparently copied or randomly assembled from Moodle or the like: I was asked the same multiple-choice question 6-7 times. Sometimes the true/false answers were mixed, or true/true or false/false were offered as answer options. Overall, it was very unclear what to tick and what not. :/
  • Other student: SS20 - I agree, one of the most unfriendly exams I have taken in 9 semesters at TU Wien. Sometimes you could really argue that a question was true and false at the same time because not enough context was given.
  • Other student: I prepared really well for this exam, but I was lost. I confirm the above. It was really not clear whether some of the answers were true or false. Everything could be argued and interpreted in one direction or the other. And as you said, no context was given, just some random sentence. AND: if you did not tick the right combination of answers, you lost all the points on most questions.

SS20 - Exam 15.12.2020:

  • Oral exam online (Covid restrictions). 2-3 candidates at once, at least 2 rounds of questions. Questions or parts of questions were sometimes handed over to other candidates if enough or too little info was given. Grading seemed fair to me.

Time Until the Grade Is Issued

(OUTDATED (format of exam etc. has changed): I was the only one who took the exam on 16.12.19 and got the grade the same evening.)

Workload

There is a lot of material. Aiming for a top grade may require going through all of it twice, plus some extra material (see below). I took a month to work through it in smaller portions.

Materials

still open

Tips

SS20: Do NOT take this course!!! The new exam design is horrible and unfair; even if you prepare well for the exam, it is hard to pass the course. A good grade is impossible if you don't have a lot of prior knowledge in this field, and even then some of the single-choice questions are simply outrageous.


[Experience from oral exam Dec. 2020, no written exams due to Corona]:

For a grade in the top range, going through the slides 1-2 times completely was necessary for me, and additionally reading some of the existing summaries and the few documented exam questions, to get an idea what kind of questions are asked.

Attending the lecture helped insofar as it showed which topics and parts of the slides are more important and which are less so. The slides are quite verbose and there is a lot of material.

This means one should start preparing quite early. I spread the reading over about a month, which made it easier to absorb the material in smaller pieces.

For some concepts, I resorted to skimming some of the papers cited in the slides; this may not be strictly necessary, but it helped me understand some of the material better. Imho the following are worth a look (find the sections that seem important; you don't necessarily need to read the whole paper). This is certainly not an exhaustive list, so you should check on your own:

  • Miksa et al (2019) - Ten principles for machine-actionable DMP
  • Miksa et al (2018) - Defining requirements for maDMP

Interesting was also this one:

  • Bajpai et al (2019) - The Dagstuhl Beginners Guide to the Reproducibility for Experimental Networking Research

A common theme that runs through some of the material is "machine-actionability". My understanding is that this is mostly achieved by using controlled vocabularies and (possibly community-specific) metadata standards, as well as persistent identifiers and possibly standard communication protocols (like OAI-PMH). I may have missed the place where this is clearly defined, but it only dawned on me at some point that this is what the term means. [and, as always, correct me if I'm wrong]
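To make this a bit more tangible, here is a small Python sketch of how I picture the idea; it is not taken from the lecture. The controlled vocabulary, the field names, the example DOI and the function `machine_actionable_problems` are all made up for illustration, and the DOI check is a deliberately simplified pattern, not the official syntax definition.

```python
import re

# Hypothetical controlled vocabulary for a "license" field (illustration only).
ALLOWED_LICENSES = {"CC-BY-4.0", "CC0-1.0", "MIT"}

# Simplified DOI shape check: "10.", a registrant code, "/", a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def machine_actionable_problems(record):
    """Return the reasons why a program could not act on this record."""
    problems = []
    if not DOI_PATTERN.match(record.get("identifier", "")):
        problems.append("identifier is not a DOI-shaped persistent identifier")
    if record.get("license") not in ALLOWED_LICENSES:
        problems.append("license is not a term from the controlled vocabulary")
    return problems


# Free text that only a human can interpret.
free_text = {"identifier": "see our website", "license": "free to use, just ask"}
# The same kind of information in a form a program can check (made-up DOI).
structured = {"identifier": "10.1234/example.5678", "license": "CC-BY-4.0"}

print(machine_actionable_problems(free_text))
print(machine_actionable_problems(structured))
```

The first record yields two problems and the second yields none. That is roughly the difference as I understand it: the content is similar, but only the second one can be processed without a human reading it.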

The slides are full of acronyms. A collection was started here https://vowi.fsinf.at/wiki/TU_Wien:Data_Stewardship_VO_(Rauber)/Questions_from_Slides (far from complete), but note: most of them don't seem to be so important. The top contenders of acronyms and the things behind them that one should know (I think):

  • FAIR (!)
  • DMP
  • maDMP
  • DC (Dublin Core)
  • DCAT
  • PID
  • DOI
  • ARK
  • ORCID
  • OAIS
  • OAI-PMH
  • SIP
  • AIP
  • DIP
  • PDI
  • RDF

This is all my (subjective) opinion, your mileage may vary.


Suggestions for Improvement / Criticism

The slides are very verbose in some places and not detailed enough in others. Slides alone are not enough for some topics. That is why I don't understand why the lectures were not recorded. I think recordings would be a tremendous help (yes, yes, there are some concerns about creating videos; imho they are all outweighed by the benefits).

A more concrete official collection of possible exam questions would probably help a lot with exam preparation, to get a better sense of where the focus lies among the huge amount of topics and material. There are some questions in the slide sets, but not in all of them.

This is clearly a quite dynamic field, with new stuff being added to the lecture often, it seems, based on practical experience of the lecturers. I guess this makes it not so easy to settle on more concrete forms of material; again, showing focus by providing some questions for each topic could be helpful.