TU Wien:Distributed Systems Technologies VU (Truong)/Exam 2021-06-30

From VoWi

Part 1

  1. Suppose you need to store a large data set durably and across several cluster nodes. Furthermore, most queries are aggregations (e.g., calculating the sum or average of a specific data attribute). Among the following database technologies, which is the most appropriate for this scenario?
    1. InfluxDB
    2. Apache Cassandra
    3. Redis
    4. Neo4J
  2. Which data source architectural pattern is reflected in the code example below?
  User user = new User();
  user.setName("myUser");
  user.save(); // writes to a database
    1. Active Record
    2. Data Mapper
    3. Row Data Gateway
    4. Table Data Gateway
  • Explain the term leaky abstraction. Give an example of a leaky abstraction and explain what makes it leaky.
  • Map-Reduce is a useful programming model for implementing queries over shared or partitioned data. Explain why and give an illustrative example.
  • Given the following Map-Reduce functions and input data, write down the individual steps and final results when applying the Map-Reduce query. Describe the role and the output of every function, from the intermediate states to the end. INPUT: A collection of structured textual files containing the hourly values of humidity and temperature from a set of sensors located in different areas.
  • Explain the two dimensions of horizontal data scaling, and give for each a concrete example.
  • Discuss the properties of layering for applications, and describe three use-cases with different layer distribution.
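
The Map-Reduce steps asked about in this part can be illustrated with a small in-memory sketch. The exam's own map and reduce functions are not reproduced on this page, so everything below is an assumption made for illustration: the `Reading` layout, the class name `SensorMapReduce`, and the chosen query (maximum temperature per area). It is not a distributed implementation; it only shows the roles of map, shuffle, and reduce.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical record layout: one hourly reading of one sensor.
final class Reading {
    final String area;
    final int hour;
    final double humidity;
    final double temperature;

    Reading(String area, int hour, double humidity, double temperature) {
        this.area = area;
        this.hour = hour;
        this.humidity = humidity;
        this.temperature = temperature;
    }
}

final class SensorMapReduce {
    // Map: emit one (area, temperature) pair per input reading.
    static Map.Entry<String, Double> map(Reading r) {
        return new AbstractMap.SimpleEntry<>(r.area, r.temperature);
    }

    // Shuffle: group all emitted values by their key (the area).
    static Map<String, List<Double>> shuffle(List<Map.Entry<String, Double>> pairs) {
        Map<String, List<Double>> groups = new TreeMap<>();
        for (Map.Entry<String, Double> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce: collapse all values of one key into a single result (here: the maximum).
    static double reduce(String area, List<Double> temperatures) {
        double max = Double.NEGATIVE_INFINITY;
        for (double t : temperatures) {
            max = Math.max(max, t);
        }
        return max;
    }

    // Runs map, shuffle, and reduce one after the other on a single machine.
    static Map<String, Double> run(List<Reading> input) {
        List<Map.Entry<String, Double>> pairs = new ArrayList<>();
        for (Reading r : input) {
            pairs.add(map(r));
        }
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, List<Double>> g : shuffle(pairs).entrySet()) {
            result.put(g.getKey(), reduce(g.getKey(), g.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Reading> input = Arrays.asList(
                new Reading("north", 0, 40.0, 18.5),
                new Reading("south", 0, 55.0, 22.0),
                new Reading("north", 1, 42.0, 19.5));
        System.out.println(run(input));
    }
}
```

Because map works on one record at a time and reduce on one key at a time, both phases could run in parallel on separate partitions, which is the property the question about partitioned data is after.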

Part 2

  1. Which statement(s) about the marshaller in remoting architecture are correct?
    1. The marshaller takes care of transporting messages between client and server
    2. The marshalling data format depends on the client's platform type
    3. The marshaller converts remote invocations into messages for the skeleton
    4. On the client side, the skeleton takes the role of the marshaller
  2. Which statement is FALSE about REST services?
    1. It is a protocol that guarantees a uniform way of interacting with a given server
    2. It allows you to use a layered system architecture
    3. In REST the resources are accessible via Unique Resource Identifiers
    4. It is an architecture that leverages the HTTP Protocol as transport layer
  • Describe the two main approaches available to a message consumer, and state the advantages and disadvantages of each.
  • In the context of Microservices, explain the term "Chatty Interface" and what problems such an interface causes.
  • Explain the relationship between Polyglot Persistence and Microservice architecture.
  • Describe the key characteristics of the HTTP Interface for RESTful services, and list the five (sic!) main verbs.
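
The "Chatty Interface" question above can be made concrete by counting round trips. The following is a minimal sketch, not taken from the exam: `CustomerService` is a hypothetical in-process stand-in for a remote service, and each method call is assumed to model one network round trip.

```java
// Hypothetical stand-in for a remote service; every call counts as one round trip.
final class CustomerService {
    int roundTrips = 0;

    // Fine-grained ("chatty") operations: one round trip per attribute.
    String getName(int id) { roundTrips++; return "Alice"; }
    String getStreet(int id) { roundTrips++; return "Main St 1"; }
    String getCity(int id) { roundTrips++; return "Vienna"; }

    // Coarse-grained operation: one round trip returns all attributes together.
    String[] getCustomerDetails(int id) {
        roundTrips++;
        return new String[] {"Alice", "Main St 1", "Vienna"};
    }
}

final class ChattyInterfaceDemo {
    // Builds an address label through the fine-grained interface.
    static int roundTripsChatty() {
        CustomerService service = new CustomerService();
        String label = service.getName(1) + ", " + service.getStreet(1) + ", " + service.getCity(1);
        return service.roundTrips;
    }

    // Builds the same label through the coarse-grained interface.
    static int roundTripsCoarse() {
        CustomerService service = new CustomerService();
        String label = String.join(", ", service.getCustomerDetails(1));
        return service.roundTrips;
    }

    public static void main(String[] args) {
        System.out.println("chatty: " + roundTripsChatty() + " round trips");
        System.out.println("coarse: " + roundTripsCoarse() + " round trip");
    }
}
```

In a real deployment each round trip adds network latency and a further point of failure, so the difference grows with the number of attributes and callers.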

Part 3

  1. In the OAuth Authorization Code Grant Flow, an Access Token is
    1. used by the Authorization Server to authorize a User
    2. used by the Resource Server and Authorization Server to verify a Client
    3. used by the Client to receive an Access Token from the Authorization Server
    4. used by the Client to access a restricted resource on a User's behalf
  2. In Aspect-Oriented Programming, an Advice is
    1. an identifiable point in the program execution
    2. a piece of code to be executed at a specific point in the program execution
    3. a modularized implementation of a crosscutting concern
    4. an expression that matches specific points in the program execution
  • Briefly explain the term "distributed transaction". Give an example of when it is necessary, and how it can be implemented.
  • Briefly explain the two-phase commit (2PC) protocol and what it is used for.
  • Briefly explain how you could use Annotations, Dynamic Proxies, and Dependency Injection to implement a container-managed database transaction mechanism.
  • Explain the principle of delegated authorization in distributed systems and why it is a fundamental part of modern web applications.
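
For the question on Annotations, Dynamic Proxies, and Dependency Injection, one possible shape of such a mechanism is sketched below using only `java.lang.reflect.Proxy` from the JDK. All application names are assumptions made for illustration: the `@Transactional` annotation here is a hypothetical marker (not the Jakarta or Spring one), `RecordingTransactionManager` only logs calls instead of talking to a database, and `TransactionalContainer.wrap` stands in for the injection step of a real container.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker annotation, kept at runtime so the proxy can see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Transactional {}

// Hypothetical transaction manager that only records what it was asked to do.
final class RecordingTransactionManager {
    final List<String> log = new ArrayList<>();
    void begin() { log.add("begin"); }
    void commit() { log.add("commit"); }
    void rollback() { log.add("rollback"); }
}

interface AccountService {
    @Transactional
    void withdraw(int amount);

    int balance(); // not annotated, so it runs without a transaction
}

final class SimpleAccountService implements AccountService {
    private int balance = 100;

    public void withdraw(int amount) {
        if (amount > balance) {
            throw new IllegalArgumentException("insufficient funds");
        }
        balance -= amount;
    }

    public int balance() { return balance; }
}

final class TransactionalContainer {
    // The container hands out a proxy instead of the raw target object.
    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> iface, T target, RecordingTransactionManager tx) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (!method.isAnnotationPresent(Transactional.class)) {
                return call(method, target, args);
            }
            tx.begin();
            try {
                Object result = call(method, target, args);
                tx.commit();
                return result;
            } catch (Throwable t) {
                tx.rollback(); // any failure undoes the transaction
                throw t;
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    // Unwraps the reflection wrapper so callers see the original exception.
    private static Object call(Method method, Object target, Object[] args) throws Throwable {
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause();
        }
    }

    public static void main(String[] args) {
        RecordingTransactionManager tx = new RecordingTransactionManager();
        AccountService service = wrap(AccountService.class, new SimpleAccountService(), tx);
        service.withdraw(30);
        System.out.println(service.balance() + " " + tx.log);
    }
}
```

The business class contains no transaction code; the annotation declares the intent, the proxy intercepts the call, and the container decides which object the client receives.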

Part 4

  1. Which statements about different scheduler architectures are correct?
    1. Two-level schedulers allow application-specific scheduling strategies
    2. Shared-state schedulers use pessimistic concurrency control for cluster state information
    3. Monolithic schedulers do not require concurrency control for cluster state information
    4. The Kubernetes scheduler is an example of a two-level scheduler
  2. Which statements regarding Kubernetes are true?
    1. The Kubernetes master can be seen as a Virtualized Infrastructure Manager
    2. The Kubernetes master can be seen as a hypervisor
    3. Etcd supports service discovery
    4. The ingress controller is internally defined by Kubernetes, with a well-defined set of functionality
  • Describe hardware-based virtualization. What are the possible configurations, and how do they differ? Give an example for each configuration.
  • Explain the role of virtualization in elasticity of distributed systems.
  • Describe the three types of autoscaling, and provide an example for every one of them.
  • Explain the role of the virtual infrastructure manager in cloud computing platforms.
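
As an illustration for the autoscaling question, the sketch below shows one reactive, horizontal scaling rule. It is only one of several possible autoscaling approaches, and the page does not say which three types the exam expects. The rule follows the ratio formula documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current replicas × current metric / target metric)); the class name `HorizontalAutoscaler` and the min/max clamping parameters are assumptions made for this example.

```java
final class HorizontalAutoscaler {
    // Computes how many replicas are needed so that the average metric
    // (for example CPU utilization in percent) returns to the target value.
    static int desiredReplicas(int currentReplicas, double currentMetric,
                               double targetMetric, int minReplicas, int maxReplicas) {
        if (targetMetric <= 0) {
            throw new IllegalArgumentException("targetMetric must be positive");
        }
        int desired = (int) Math.ceil(currentReplicas * currentMetric / targetMetric);
        // Clamp to the configured bounds so the system never scales to zero
        // or beyond the capacity it is allowed to use.
        return Math.max(minReplicas, Math.min(maxReplicas, desired));
    }

    public static void main(String[] args) {
        // 4 replicas at 90% average CPU with a 60% target.
        System.out.println(desiredReplicas(4, 90.0, 60.0, 1, 10));
    }
}
```

Virtualization is what makes such a decision cheap to execute: adding a replica means starting another virtual machine or container rather than provisioning hardware.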

Part 5

  1. Which statement(s) about watermarking are true?
    1. It is a method for marking partitions in window aggregation
    2. It can be emitted in a punctuated or periodic fashion
    3. It guarantees the correct ordering of events
    4. It is a methodology for dealing with the unordered arrival of events
    5. Watermarks can be emitted only in conjunction with a particular event
  2. Which one(s) of the following are black box metrics?
    1. CPU utilization
    2. Network traffic
    3. Number of logged in users
    4. Number of queries
  • In the context of event-based architecture, what is "event-carried state transfer"? Name two consequences (e.g., benefits or drawbacks) of using the pattern.
  • How does Apache Kafka enable reliability in distributed stream processing systems? (Hint: think about a) operator parallelization/distribution and what happens when operator nodes fail, and b) operators that struggle with load peaks.)
  • What is the idea behind complex event processing? Give an example with Flink.
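
The watermarking question in this part can be illustrated with a plain-Java sketch of a bounded-out-of-orderness watermark. It is loosely modeled on the idea behind Flink's strategy of the same name but does not use the Flink API; the class `BoundedOutOfOrdernessWatermark` and its methods are hypothetical. The watermark trails the highest timestamp seen so far by a fixed bound, and an event whose timestamp is at or below the current watermark is treated as late.

```java
final class BoundedOutOfOrdernessWatermark {
    private final long maxOutOfOrderness;
    private long maxTimestampSeen;

    BoundedOutOfOrdernessWatermark(long maxOutOfOrderness) {
        this.maxOutOfOrderness = maxOutOfOrderness;
        // Start value chosen so the first watermark is Long.MIN_VALUE without overflow.
        this.maxTimestampSeen = Long.MIN_VALUE + maxOutOfOrderness + 1;
    }

    // A watermark W claims that no event with timestamp <= W is still expected.
    long currentWatermark() {
        return maxTimestampSeen - maxOutOfOrderness - 1;
    }

    // Returns true if the event is on time, false if it arrived after the watermark passed it.
    boolean onEvent(long timestamp) {
        boolean late = timestamp <= currentWatermark();
        maxTimestampSeen = Math.max(maxTimestampSeen, timestamp);
        return !late;
    }

    public static void main(String[] args) {
        BoundedOutOfOrdernessWatermark wm = new BoundedOutOfOrdernessWatermark(5);
        long[] timestamps = {10, 12, 7, 6};
        for (long ts : timestamps) {
            boolean onTime = wm.onEvent(ts);
            System.out.println("event " + ts + " onTime=" + onTime
                    + " watermark=" + wm.currentWatermark());
        }
    }
}
```

The sketch tolerates out-of-order arrival within the bound but does not reorder events; anything later than the bound has to be dropped or handled separately.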