Assignment 1 ############ .. rubric:: Application Overview Throughout this lab, you will learn many different types of technologies and how to use them to solve problems in a complex distributed enterprise application. The tasks are (loosely) tied together under a common scenario: a ridesharing or `Mobility Service Provider (MSP)`_ platform (like Uber or Lyft). Our system manages organizations, drivers, vehicles, riders, and trips; provides a matchmaking system for drivers and riders; allows riders to create multi-stop trips; manages the transactional billing process for trips; and provides a rating system for drivers and riders. Note that we will not implement a fully functional end-to-end application, but rather cherry-pick specific modules of such an application. In this first assignment, we will focus on the domain model and persistence layer of the system. In later stages of the lab, you will solve business problems, implement framework features, and tackle real-time analytics and infrastructure scalability issues. .. _Mobility Service Provider (MSP): https://en.wikipedia.org/wiki/Transportation_network_company .. rubric:: Assignment 1 In this assignment you will learn how to work with an Object-Relational Mapping (ORM) framework. You will map a given object model to a relational data model using the JavaEE `Java Persistence API (JPA)`_, and use JPA to write queries and transactions. You will use `jOOQ`_, to implement additional entities and queries. You will also learn about non-traditional data models such as the document-oriented model using `MongoDB`_, and key-value stores using `Redis`_. .. _Java Persistence API (JPA): https://javaee.github.io/tutorial/persistence-intro.html .. _MongoDB: https://www.mongodb.com .. _Redis: https://Redis.io .. _jOOQ: https://www.jooq.org/ .. rubric:: Points ====================================== ======= Task Points ====================================== ======= Object-Relational Mapping 15 Querying Entities 4 jOOQ 8 NoSQL: Document Databases 8 NoSQL: NoSQL: Key-Value Stores 7 Theory Questions 8 ====================================== ======= .. _Object-Relational Mapping: Object-Relational Mapping ============================== In this task, you will create the persistence layer of our ridesharing platform. In particular, you will use JPA to define the object-relational mapping and write queries on the data model. Because JPA is only a service provider interface (SPI), we need a service provider which is what we use `Hibernate `_ for. :numref:`fig-ass1-model` shows the domain model of the system as a UML class diagram. .. _fig-ass1-model: .. figure:: img/ass1-model.svg :align: center The domain model of our ridesharing platform In the template, interfaces (we use the Hungarian notation where interfaces are prefixed with ``I``) are provided for all model classes and `data access objects (DAO) `_. Create an implementation class for each of the interfaces and make sure to correctly instantiate objects using the ``ModelFactory`` and the ``DAOFactory``. Put all the implementation classes you add into separate impl sub-packages, i.e., ``dst.ass1.jpa.model.impl`` and ``dst.ass1.jpa.dao.impl``. To test/demonstrate your mapping and implementations the template provides unit tests that reads the Hibernate configuration, creates a persistence context, and inserts a test fixture into the database. You may extend the pre-defined tests and also include additional test classes to cover any special situations and corner cases, if necessary. Make sure that your solution works in its entirety, that is, no unnecessary tables are created, no unintended information is lost when deleting entities (e.g., because of unexpected cascades), and that there are no exceptions thrown when performing any of the tasks. .. hint:: This part of the solution is required for other tasks in later assignments, so we strongly recommend that you solve *at least* this part of the assignment. Additionally, have a look at the provided ``dst.ass1.jpa.util.Constants`` class, which defines various constants that should be considered for implementing the described domain model. However, note that in *most* cases the constants simply explicate the default values generated by JPA. To get started, we suggest you read the `Oracle JPA tutorial `_. You don't have to worry about the configuration and bootstrapping of Hibernate, as this is all pre-configured in the test template, so you can focus on the core functionalities of JPA. However, if you are using Windows, you may need to change the target path of the temporary H2 database instance, which is configured in ``persistence.xml`` in the ``connection.url`` field. Changing it to a relative path ``./tmp/database/dst``, for example, will work. Basic Mapping -------------------- JPA provides two ways of declaring object-relational mappings: Java annotations in the code, and XML-based configuration files. You should map the model classes and all associations using the ``javax.persistence.*`` annotations in your respective model implementations. For the ``Trip`` entity, you should define the mapping using the XML-based configuration (this has no particular practical purpose, but it allows you to play with both methods. Many frameworks provide annotation and XML based configurations, both of which have advantages and disadvantages). Define the XML mapping in the ``Trip.xml`` file in the resources directory of the ``ass1-jpa`` module. .. rubric:: Mapping details * Make sure that you implement the navigation in the data model as specified in the diagram (for instance, the ``Driver``--``Organization`` association is bidirectional, so the entities should reflect this). You should be able to explain how navigation affects the underlying relational model (e.g., foreign keys and relation tables). * The relationship between a Driver and an Organization is unique. Employments are identified via the drivers's and the organization's IDs, and should make up a composite key (``IEmploymentKey``). The employment association object also stores whether the driver is currently active in this organization. * Geospatial data and details of Locations are stored in the Document DB (see :numref:`document_db`). In the relational model we only store a reference to the actual location ID, and a name (e.g., the address string or a restaurant name). The relationships between ``Trip`` and ``Location`` are abbreviated in the diagram, but each trip has a mandatory pickup and destination location, and a (possibly empty) list of additional stops. * To represent monetary values, we use a `Money value type `_. ``Money`` objects do not live as entities on their own, but should be embedded into the entities that use them (``TripReceipt`` and ``Match``). * Make sure to use the concrete type of the ``IMoney`` entity when using as property. * Entities that contain multiple embedded properties of the same type require column renaming to avoid column duplication. Look into ``AttributeOverrides`` to rename them. We have defined constant values to use the ``AttributeOverrides`` annotation in ``Constants.java``. * The ``partOf`` association in the Organization entity defines organizational hierarchies. For flexible modeling, we allow an Organization to have both multiple parents and children. For this relation you need to make sure that the created join table is named as defined in the constant ``J_ORGANIZATION_PARTS`` and uses the join columns as defined in ``I_ORGANIZATION_PART_OF`` and ``I_ORGANIZATION_PARTS``. * Passwords of Riders are stored as a cryptographic hash value of the password string (never store plain text passwords to the database!). `SHA-1 `_ is used to generate a digest and store the result as binary data. SHA-1 hashes are fixed in size, so you should configure the password column to use the optimal data type and length. You can assume that the byte array passed is already the digest. Please note that SHA-1 is broken and not recommended to be used in production systems. Instead use, for example, the `SHA-2 standard `_. * Make sure that the Rider's email is unique. .. note:: The XML mapping of ``Trip.xml`` requires the elements inside the ``attributes`` tag to be defined in a specific order. Checkout the `XML Schema `_ for information about the ordering. Inheritance Mapping -------------------------- Inheritance is an essential mechanism in object-oriented modeling. However, the relational model does not support the concept of inheritance, making inheritance one of the well-known `object-relational impedence mismatches `_. There are, however, several ways an inheritance hierarchy of classes can be represented in relational tables. You may have noticed that the ``Rider`` and ``Driver`` entities have many commonalities. For this task, you should abstract these commonalities into the abstract entity ``PlatformUser``. Choose one of the three well-known inheritance patterns to map the class hierarchy via JPA to the database. Make sure that a Rider's email and name make up a composite unique constraint. The inheritance hierarchy are shown in :numref:`fig-ass1-inheritance`. .. _fig-ass1-inheritance: .. figure:: img/ass1-inheritance.svg :width: 80 % :align: center Abstracting common properties via inheritance .. note:: During the discussion sessions you should be able to explain the different inheritance patterns and discuss your specific choice. Querying Entities ============================== In this task, you will learn how to query JPA entities using JPQL and the JPA Criteria API. If you haven't already, read the respective chapters on `querying entities in the Java EE Tutorial `_. Implement the DAO interfaces provided in the template, and return your implementations in the ``DAOFactory``. Execute the queries in the respective DAO methods. You can find the correct methods via their names (they should be descriptive) and signatures. Simple Named Queries --------------------------- Implement the following queries as JPQL *named queries* (no plain SQL, no inline queries, and no programmatic filtering with Java code!). Declare them in the appropriate entities and execute them in the corresponding DAO methods. a. Find a rider by their unique email address. b. Find trips by status. c. Calculate across all trip infos the average rating per rider and sort it by the rating descending. You can use Java code to map the query result to a ``TupleResult``. .. _n1select: Complex Named Queries ---------------------------- Implement the following query as a combination of one or more named queries and Java code. a. Find all riders that have not completed a Trip within a given time range. .. note:: You do not need to find a single query to solve this task (you can use a combination of Java code and named queries), but you have to keep ORM-performance in mind, i.e., make sure that your solution is also reasonably fast if you have many entities. During the discussion session you should be able to explain what types of problems can arise with badly written queries. JPA Criteria API Queries ------------------------------- Implement the following query using the JPA Criteria API (i.e., you should not use JPQL, but write the queries using the ``CriteriaBuilder`` `fluent interface `_). Again, put the code into your implementation of the respective DAO methods. You should implement the method as a single query. Note that all parameters should be optional (i.e., if the parameter is null, don't add the criteria when building the query). a. Write a query that calculates the average tip in percentage with regards to the total fare grouped by the PaymentMethod. Only consider completed trips within an optional time range. The method should return a descending sorted list by the calculated average tip percentage. jOOQ: Object Oriented Querying ============================== Until now we have used JPA as ORM to access our data. We have seen benefits and disadvantages using such technology and the complexity it can add to an application. In contrast to that, SQL mapper frameworks offer an API that most often resembles SQL as much as possible and does not contain as much magic as ORMs. The trend of thriving for a simple API can also be seen in the microservice application paradigm. The paradigm splits monolithic application into smaller units (= microservices) that allow flexible and fast development with minimal dependencies between services. This entails a clear seperation of database access and each microservice has to implement its own data access layer and use its own database. In this part of the assignment we take on the spirit of microservices and extend the existing entity schema by including tables to store arbitrary rider preferences. To this end, we use the library `jOOQ `_ that offers a SQL-centric abstraction for Java. In comparison to JPA, you will have to write your own SQL schema and jOOQ will generate classes accordingly. We highly recommend to go over the following tutorial from jOOQ to familiarize yourself with the API: `Tutorial `_. .. rubric:: Implementation This part is split into three parts and similar to what you have done already in JPA: 1. Defining the SQL schema 2. Implementing the models 3. Implementing the DAOs In the following we reference to packages that are located in the `ass1-jooq` module and omit the most of the package path. For example, if we write about the `model` package, we specifically talk about the `dst.ass1.jooq.model` package in the `ass1-jooq` module. SQL Schema ---------- .. _fig1-ass1-jooq-schema: .. figure:: img/ass1-model-extension.svg :width: 80 % :align: center Fill out the SQL schema located in ``resources/schema.sql`` based on the shown entity model. Afterwards, in the ``generate-sources`` phase jOOQ will scan your existing tables in your database and generate sources from it. You can find the generated files under ``target/generated-sources/joq/dst.ass1.jooq.model/public_/tables``, make sure to have a look and verify your solution. The ``pom.xml`` of the `ass1-jooq` module is already configured to generate these classes as an implementation of the interaces located in the ``entity`` package. Models ------ Afterwards, you can implement the ``IRiderPreference`` interface located in the ``model`` package. jOOQ should generate the implementations for the entities located in the ``entity`` package! You also must implement the `IModelFactory` to create an instance of the ``IRiderPreference`` interface. Datasource ---------- If you have familiarized yourself already with jOOQ (e.g., by going through the tutorial we have linked above), you know that jOOQ also needs a ``DSLContext`` to execute database queries and other SQL statements. We provide this implementation for you and you can create connections using the ``getConnection`` method in the ``DataSource`` class, located in the ``connection`` package. .. note:: Please familiarize yourself with the concepts of connection creation in jOOQ. You should know which features/options our implementation activates and what that means for the execution. Specifically, look at the module's ``pom.xml`` (e.g., which jOOQ arguments are present) and how the DSLContext from ``getConnection`` is created. jOOQ DAOs --------- In this part of the assignment you will implement the DAO to retrieve, create, query and update methods for the entities. Implement the interfaces `IRiderPreferenceDAO` and ``IDAOFactory`` using jOOQ. You have to implement the following methods in the ``IRiderPreferenceDAO``: * ``insert``: creates a new Rider Preference and saves all preferences that are included. * ``updatePreferences``: this method receives a new Rider Preference instance with which we updates the old one. That means, you have to update all fields and overwrite the values of existing keys present in the passed object. You can view it as a partial update, which does not delete keys and only adds new keys and updates existing ones. * ``findAll``: Retrieves all Rider Preferences from that database. Make sure that you solution is resource-efficient (e.g., avoid the N+1 problem). * ``delete``: deletes all data for a specified id. * ``findById``: Retrieves the Rider Preferences for a specific rider. .. note:: Make sure that all methods that all write operations are done in a transaction and therefore executed at once. You should be able to answer during the group interview how you achieved this and also what other guarantees you can make about the execution (e.g., think about the data source). .. _document_db: NoSQL: Document Databases ============================== Not all data handled by enterprise applications is well suited to be stored in traditional tabular relations. For example, consider a database of different geospatial locations (e.g., pickup and drop-off locations, popular places, etc.). First, creating relational schemata for all the different types of data we want to store with a location (e.g., whether it is a restaurant, and all the information associated with it) may not be possible or make their structure very rigid, which may be undesirable for different reasons. Also, answering questions such as "which other users have requested a pickup within a 5 kilometer radius of the given coordinates", is not possible in traditional SQL. For this task, you will use MongoDB to store and query geospatial locations. MongoDB is a document-oriented NoSQL (non/not-only SQL) database, which can store schemaless semi-structured `JSON `_ data called *documents*. Similar to the previous tasks, the unit tests for this task spawn an in-memory MongoDB instance so you don't have to worry about running a server or any type of configuration. Note that, the embedded MongoDB downloads a library the first time you run a test, which can take several minutes. To access the MongoDB instance, you will use the `MongoDB Java Driver `_. .. rubric:: Implementation The package ``dst.ass1.doc`` in the project template contains an interface for inserting data into MongoDB (``IDocumentRepository``) and one for running queries against the MongoDB instance (``IDocumentQuery``). Use these interfaces for the remaining parts of this section and put your implementation classes into a sub-package ``dst.ass1.doc.impl``. Also make sure to properly instantiate and return your implementation classes in ``DocumentServiceFactory``. .. rubric:: Document structure Locations stored as documents have four predefined properties and zero or more additional properties. The ``location_id`` refers back to the ``locationId`` of the Location entity stored in the relational database. The ``type`` describes whether the Location describes, e.g., a place or a generic address. The ``name`` refers back to the name property of the Location entity. The ``geo`` property holds a `GeoJSON Object `_ that specifies the geospatial data. Additional properties are dependent on the type. For example, a place may fall into a specific category. An example is given below. :: [{ "location_id": 9241753, "type": "place", "name": "TU Wien", "openHour": 10, "closingHour": 18, "geo": { "type": "Point", "coordinates": [48.199, 16.3699] }, "category": "University" }, { "location_id": 9241754, "type": "address", "name": "...", "geo": { /* ... */ } }] Documents and Collections ------------------------------- Data in MongoDB is organized into collections that belong to a database. All location data will be stored in one collection in our database instance. The name of the database and collection can be found in the ``Constants`` class. The unit tests insert test data loaded from ``DocumentTestData``. For this task, implement the ``IDocumentRepository#insert(ILocation,Map)`` method to insert a new document into the collection. The document's ``location_id`` and ``name`` properties should be read from the given entity instance. All additional properties are passed via a ``Map`` (maps naturally mix well with BSON Documents). Furthermore, as we will often retrieve documents by ``location_id`` and ``geo``, you should add appropriate indices to the collection to speed up these queries. Query by Example ---------------------- The lack of a structured schema makes it difficult to provide a well structured query language such as SQL. Instead, many document databases use the query-by-example pattern to query data, i.e., using documents to express queries. Also, MongoDB natively supports `geospatial queries `_, which you will use. In this task, you should use MongoDB's `Query Documents `_ to implement two queries in ``IDocumentQuery``: a. Given a ``type`` find all documents that have this specific type. b. Fetch all entries where the ``name`` field completely or partially matches the given input string, and are within a given Polygon (i.e., as a list of coordinates). Internally, this query should use a projection to only return the ``location_id`` property, no others. Provide reasonable logging output with information about the results of your queries. Aggregation Pipelines --------------------- For this task, to gain insights about the stored documents, you will implement a more complex query using the powerful `Aggregation Pipeline `_ . An aggregation pipeline consists of one or more stages that process documents: * Each stage performs an operation on the input documents. For example, a stage can filter documents, group documents, and calculate values. * The documents that are output from a stage are passed to the next stage. * An aggregation pipeline can return results for groups of documents. For example, return the total, average, maximum, and minimum values. Familiarize yourself with aggregation concepts and refer to the `mongodb-java-driver aggregation docs `_ on how to execute an aggregation pipeline. Implement the query in the method ``IDocumentQuery#getAverageOpeningHoursPerCategory()``. The output of your aggregation query should be a list of all existing Place Categories (encoded in the ``category`` property of documents where the value of ``type`` equals "place") and their average opening hours for each Category. Example output: Example output:: { "_id" : "University":, "value" : 12.2} { "_id" : "Restaurant": "value" : 8.5} As in the previous task, provide logging output with information about the results of your queries. NoSQL: Key-Value Stores ============================ Sessions are a fundamental concept of web applications and typically handled by the web server. However, in large-scale scenarios, it is beneficial to implement a dedicated session management component that can be scaled on demand. In this task, you will use the `Redis`_ key--value store to create a scalable session management service. Sessions belong to a user ID and are identified by unique session tokens (e.g., UUIDs). Sessions can hold arbitrary data as key--value pairs. When a session's timeout has elapsed, the session expires and all variables associated with the session should be removed. Implement the ``ISessionManager`` interface and instantiate your implementation in the ``SessionManagerFactory``. To communicate with Redis, use the `Jedis `_ client library which is already provided in the template. All data must be distributed through Redis, meaning your implementation cannot rely on Java in-memory data structures. Also, make sure that all Redis operations that manipulate multiple keys (for example when creating a session) are atomic. To that end, use Redis' ``MULTI``, and/or ``WATCH``--``EXEC`` transaction mechanisms. Furthermore, your implementation should not rely on searching the Redis keyspace with the ``KEYS`` command, or on executing Lua scripts. Unlike in the previous tasks, Redis does not come with Java in-memory facilities, so we need a running Redis server instance to execute the JUnit tests. The Docker Compose file starts a Redis instance, so to execute the tests you simply need to start the ``docker-compose.yml`` file. The ``redis.properties`` file contains the host and port configuration, which default to the local Redis instance started via ``docker-compose.yml``. .. hint:: To interact with the Redis instance, ou can either locally install the `redis-cli `_ or connect with the Redis container (``docker exec -it redis-cli``). Data Structures ------------------ First, implement the ``createSession`` method to create a new session for the given user. To create a random session token you can, for example, use Java's ``java.util.UUID`` utility. Use appropriate Redis data structures to store session variables. If you want you can hold the user ID of a session in the session variable ``userId``. Don't forget that all session related keys should be removed once the session expires (hint: use the ``EXPIRE`` command). .. note:: In the discussion sessions, you should explain the data structures you used, and why you chose them over others that Redis provides. Check-and-Set Operations ------------------------- The ``requireSession`` method is a typical check-and-set operation. A new session should only be created if the given user does not already have an active session (e.g., through previous calls of ``requireSession``). Atomicity is particularly important for this method. Consider that the method may be invoked concurrently by multiple distributed clients (i.e., you cannot rely on Java locking mechanisms), so make sure your implementation guarantees that a concurrently created session isn't overwritten (note: you can ignore the case in which a key may expire during a transaction). For ``requireSession`` you might also need auxiliary keys that store the session ID of the user relation (may be overwritten by ``createSession``, but not by ``requireSession``). Theory Questions =================== The following questions will be discussed during the discussion session. Each question is worth 2 points. Before the actual lesson, you can specify the questions you are prepared to present in the respective ticking activity (Assignment 1 theory questions) in TUWEL. We will then ask individual students who ticked a question to present their results. .. note:: Only tick questions you are also prepared to present. If you ticked a question but fail to provide a sound answer, you may lose all points for the theory part of this assignment. Also, it is not possible to tick or untick questions at the session! Annotation vs. XML Declarations ------------------------------- In the previous tasks you already gained some experiences using annotations and XML. What are the benefits and drawbacks of each approach? In what situations would you use which one? *Hint: Think about maintainability and the different roles usually involved in software projects.* Entity Manager and Entity Lifecycle --------------------------------------------- What is the *lifecycle* of a JPA entity, i.e., what are the different states an entity can be in? What EntityManager operations change the state of an entity? How and when are changes to entities propagated to the database? Optimistic vs. Pessimistic Locking ----------------------------------- The database systems you have used in this assignment provide different types of concurrency control mechanisms. Redis, for example, provides the concept of *optimistic locks*. The JPA EntityManager allows one to set a *pessimistic read/write lock* on individual objects. What are the main differences between these locking mechanisms? In what situations or use cases would you employ them? Think of problems that can arise when using the wrong locking mechanism for these use cases. Database Scalability ------------------------------ How can we address system growth, i.e., increased data volume and query operations, in databases? *Hint: vertical vs. horizontal scaling*. What methods in particular do MongoDB and Redis provide to support scalability?