TU Wien:Advanced Internet Computing VU (Dustdar)/AIC WS2024/25 Summary


AIC — Advanced Internet Computing WS 2024/25

Slide Set 1 — Week 1

Software Evolution

  • Requirements cannot be fully gathered upfront or frozen
  • Too many stakeholders

Open World Assumption

  • Ambient intelligence
  • Loosely coupled
  • Accessed on demand

Ecosystems

Complex system with networked dependencies and intrinsic adaptive behavior

  1. Robustness & Resilience mechanisms
  2. Measures of health
  3. Built-in coherence
  4. Entropy resistance

Layers of Paradigms

Paradigm 1 Elasticity (Resilience)

Elasticity > Scalability

Paradigm 2 Osmotic Computing

Dynamic management of microservices across cloud and edge datacenters

Paradigm 3 Social Compute Units (SCUs)

Service-oriented Computing (SoC)

What is a service?

  • standardized interface
  • self-contained, with no dependencies on other services
  • available
  • context independent

Service Properties & State

  • Functional: operational characteristics, behaviour
  • Non-functional: description targets service quality attributes, metering and cost, performance metrics
  • Stateless: Services can be invoked repeatedly without having to maintain context
  • Stateful: context preserved from one invocation to the next
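The stateless/stateful distinction can be illustrated with a minimal Python sketch (the services and names are invented for the example, not from the slides):

```python
# Stateless service: every invocation carries its full context,
# so repeated calls are independent of each other.
def convert(amount, rate):
    return amount * rate

# Stateful service: context is preserved from one invocation to the next,
# so later calls depend on earlier ones.
class CartService:
    def __init__(self):
        self.items = []  # conversation state kept on the service side

    def add_item(self, price):
        self.items.append(price)

    def total(self):
        return sum(self.items)
```

A practical consequence: any replica can serve a stateless call, while stateful calls must be routed to the replica holding the context (or the state must be externalized).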

Loose Coupling & Granularity

  • Loose Coupling
  • Service granularity
    • multiple services involved for a single process
    • Coarse-grained complex services imply larger and richer data structures

Synchronicity & Well-definedness

  • Synchronicity: synchronous (RPC with arguments) vs. asynchronous (entire documents)
  • Well-definedness: service interactions must be well-defined. The Web Services Description Language (WSDL) allows applications to describe to other applications the rules for interfacing and interacting.

Slide Set 2 — Week 2 — Cloud Computing

Motivation

  • Pay-per-use, lower maintenance cost, scaling, fault tolerant
  • Use cases
    • Demand varies with peak loads
    • Demand is unknown in advance
    • Batch workloads

Cloud Computing Basics

  • NIST Definition
    • On demand self service
    • broad network access
    • resource pooling, virtualization
    • rapid elasticity, virtually unlimited capacity
    • measured service

Three Cloud Service Models

  • IaaS Infrastructure as a Service
    • Virtual Machines
    • Amazon EC2, Amazon EBS
  • PaaS Platform as a Service
    • Computing Platform and solution stack / framework
    • Google App Engine, Heroku
  • SaaS Software as a Service
    • CRM software
    • Google Docs

IaaS PaaS SaaS

  • Speed vs. Customization (SaaS is not as flexible)
  • Cost (PaaS can be cheaper)
  • Vendor lock-in (SaaS and PaaS worse than IaaS)

Cloud Deployment Models

  • Public Cloud (AWS, Azure, low cost, no upfront cost, no maintenance)
  • Private Cloud (operated solely for a single organization; self-reliance, flexibility, security, compliance)
  • Community Cloud
  • Hybrid Cloud (control of private infrastructure, flexibility take advantage of additional resources)

Virtualization

  • Abstract view on resources
    • Platform (complete machine)
    • Memory
    • Storage
    • Network
  • Resource Pooling
    • resources are shared between users (multitenancy)
    • backend parallelization
  • Consolidation
    • Put many different classes of applications onto different VMs in the same data center
  • Fault Tolerance
    • Save VM state

Types of Virtualization

  • Hardware-level Virtualization
    • Emulating full virtual computer hardware platforms (VMs)
    • Hypervisors (Virtual Machine Monitors)
    • Bare Metal (type 1)
      • Lightweight virtualization layer directly on host hardware
      • good performance and stability
      • Xen, VMware ESXi
    • Hosted (type 2)
      • Runs on a host OS
      • Forwards calls to the host OS (overhead)
      • QEMU, VirtualBox
    • Distinction not always clear, e.g. KVM
    • Full virtualization (slow)
    • Hardware-assisted virtualization (host knows virt is taking place, CPU requires virtualization extensions)
    • Paravirtualization (software-assisted)
    • Lightweight technique, near-native performance
  • Operating System (OS)-level Virtualization (Containerization)
    • OS kernel manages coexistence of multiple isolated user spaces
    • Containers
      • Share host OS and drivers
      • near native performance
      • not as secure as VMs
      • more elastic than hypervisors
    • Linux Containers LXC, FreeBSD jails, OpenVZ
    • Docker is the leading containerization technology
      • initially implemented on top of LXC
    • Cross-platform portability
      • bundles FS, runtime, sys libraries

Existing commercial cloud offerings

AWS

  • IaaS, PaaS, SaaS
  • Elastic Compute Cloud EC2
  • Simple Storage Service S3
  • Simple Queue Service SQS

EC2

  • virtual machines with different capabilities (general purpose m4)
  • 4 types of billing
    • On-Demand (pay-per-use, flexible, no long term commitments)
    • Reserved Instances (cheaper in exchange for long-term commitment)
    • Spot Instances (Bid for spare EC2 capacity for big discounts)
    • Dedicated Hosts (Dedicated physical server, useful for compliance targets)

Storage in AWS

  • Simple Storage Service S3 (object-based, persistent)
  • Elastic Block Store EBS (like raw unformatted harddrive)
  • Elastic File System

Simple Queue Service SQS

  • scaling, decouples application components so that
    • you can scale transparently
    • components can fail safely
  • managed by Amazon (reliable, redundant)
  • Two types
    • standard queue: high throughput, at-least-once delivery, best-effort ordering
    • FIFO queue: exactly-once delivery, limited throughput

Heroku Platform

  • PaaS
  • (I miss their free tier)

Cloud QoS

  • Measure of technical quality of a web or cloud service
    • Performance
    • Availability
    • Failure rate
    • Security
    • Trust
    • Compliance

Instance-Level Performance QoS Metrics

  • Round-Trip Time and Response Time
  • Network Latency
  • Processing Time
  • Wrapping Time
  • Execution time

Aggregated QoS Metrics

  • Throughput (maximum processing rate)
  • Availability A = uptime / (uptime+downtime)
  • Combined (serial) availability: multiply the component availabilities, e.g. 0.9 * 0.8 * 0.95 = 0.684
    • Replicated availability: 1-(1-a)^n, e.g. a server with availability 0.8 and 3 replicas gives 1-(1-0.8)^3 = 0.992 (the probability that all 3 replicas fail is 0.2^3)
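The two availability formulas above can be checked with a few lines of Python (the function names are mine; the numbers are the examples from the notes):

```python
def serial_availability(availabilities):
    """Availability of a chain where every component must be up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result

def replicated_availability(a, n):
    """Availability of n replicas, where a single surviving replica suffices."""
    return 1 - (1 - a) ** n

print(serial_availability([0.9, 0.8, 0.95]))  # ≈ 0.684
print(replicated_availability(0.8, 3))        # ≈ 0.992 (all 3 fail with prob. 0.2^3)
```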

Service Level Agreements (SLAs)

  • For B2B interactions, normal users mostly get best effort delivery
  • concrete Service Level Objectives (SLOs)
  • metrics, concrete target values
  • penalties for non-achievement, validity period
  • responsible monitoring entity

Slide Set 3 — Week 3 — Edge Computing and Intelligence at the Edge (1)

  • Compute as physically close to the source as possible.
  • AI on the Edge, Federated Learning

AI Accelerators

  • DNN processors, sometimes called TPUs
  • Optimized for AI workload, Matrix Multiplications, Multiply and accumulate

Graph Compiler Basics

  • Map high-level computational abstractions of DL Frameworks, i.e. layers to operations executable on an accelerator.
  • Parallelize forward pass where possible

Service Level Objectives (SLOs)

  • SLAs comprised of one or more SLOs
  • Quantifiable measures that allow platform providers to ensure QoS
  • When SLOs are violated, horizontal or vertical scaling is needed ⇒ leverage elasticity in the edge-cloud continuum.
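A minimal sketch of such an elasticity policy, assuming a latency SLO and purely illustrative thresholds (real autoscalers smooth metrics over time instead of reacting to single samples):

```python
def scaling_decision(latency_ms, slo_ms, replicas, max_replicas):
    """Naive horizontal-scaling policy driven by a single latency SLO.

    Thresholds are illustrative, not from the lecture.
    """
    if latency_ms > slo_ms and replicas < max_replicas:
        return replicas + 1  # SLO violated: scale out
    if latency_ms < 0.5 * slo_ms and replicas > 1:
        return replicas - 1  # comfortably under target: scale in
    return replicas          # within band: keep current size

print(scaling_decision(latency_ms=250, slo_ms=200, replicas=2, max_replicas=5))  # 3
```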

SLOs at the edge

  • same as cloud SLOs but with additional challenges
  • Additional Constraints and Considerations
    • Computational Power
    • AI Accelerator (yes / no / maybe)
    • Battery Level
    • Network Quality
    • ⇒ “Cloud but with more pain”

Polaris Project High Level SLOs

  • composed metrics, aggregated of multiple lower-level metrics

Inference - Cloud Offloading

  • Cost efficient
  • “Infinite Resources”
  • Server-grade Hardware
  • But Privacy Concerns and Latency issues

Inference - Edge Offloading

  • Privacy
  • Proximity
  • Server-grade Hardware
  • But Horizontal Scaling, Limited Resources and Cost factor

Serverless Computing

  • Serverless ⇒ Function as a Service + Backend as a Service
    • Implement applications exclusively with managed services
  • Cloud-Native
  • Pay-per-request
  • A completely different paradigm from microservices (!)
  • Stateless functions; the provider auto-scales replicas and routes requests

Backend as a Service

  • Managed Service, hosted and scaled by third party provider
  • client programmers communicate through an API
  • typically no knowledge on host hardware
    • Instead clients are offered SLAs
  • e.g. message brokers, databases, user management

Why serverless (edge)?

  • In theory permits a fully automated system for provisioning, orchestration, and deployment
  • Provide same convenience of cloud-native development

Key Challenges

  • Volatility, Unstable Network, even less reliable hardware
  • Prior Knowledge (beyond dark ages)
  • Discovery, Hardware and Services
  • Location

⇒ Scheduling is one of the primary concerns for Edge Computing

Slide Set 4 — Week 4 — Intelligence at the Edge (2)

Deep Learning Quick Primer

  • Convolutional, composed of filters or kernels, extracts local features from spatial or temporal data
  • Nonlinearity and Activation Functions
    • add non-linearity through activation functions like ReLU
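As a toy illustration of both bullet points, a 1-D convolution followed by ReLU in NumPy (the signal and kernel values are made up for the example):

```python
import numpy as np

def relu(x):
    """Nonlinearity: pass positives through, clamp negatives to zero."""
    return np.maximum(0, x)

def conv1d(signal, kernel):
    """'Valid' 1-D convolution as used in DL frameworks (cross-correlation,
    no kernel flip): slide the kernel over the signal and take dot products."""
    n = len(signal) - len(kernel) + 1
    return np.array([np.dot(signal[i:i + len(kernel)], kernel) for i in range(n)])

signal = np.array([1.0, 2.0, -1.0, 3.0, 0.0])
kernel = np.array([1.0, -1.0])  # a difference filter: responds to local changes
features = relu(conv1d(signal, kernel))  # -> [0., 3., 0., 3.]
```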

Model Compression

  • Network Quantization
    • Reduce precision from 32 bit floats, for faster inference and lower memory footprint
  • Network Pruning
    • Remove components of a NN, e.g. channels in a convolutional layer or neurons in a fully connected layer
    • Unstructured Pruning: zero out weights; the matrix keeps its size
    • Structured Pruning: “physically” remove units from the network, which changes the architecture and needs no special sparse-matrix hardware support
  • Knowledge Distillation
    • Train smaller student network under supervision of a larger teacher network
    • Deep Neural Networks are typically over-parameterized
    • Pruning is simple but cruder
    • Hard Labels (one-hot) or use output of teacher
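The first two compression techniques can be sketched in NumPy (the weight values and function names are invented for the example):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric linear quantization of float32 weights to int8 (4x smaller)."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate float32 weights for computation."""
    return q.astype(np.float32) * scale

def prune_unstructured(w, fraction):
    """Zero out the smallest-magnitude weights; the matrix keeps its shape."""
    k = int(w.size * fraction)
    if k == 0:
        return w.copy()
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    out = w.copy()
    out[np.abs(out) <= threshold] = 0.0
    return out

w = np.array([[0.5, -1.27, 0.01], [0.02, 0.9, -0.3]], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)               # close to w, at a quarter of the storage
w_sparse = prune_unstructured(w, 0.5)  # half the weights zeroed, shape unchanged
```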

Split Inference

  • AI accelerators are increasingly powerful, but cannot match performance of contemporary server-grade hardware
  • Currently a task is either completely onloaded or offloaded
    • Offload: when performance is critical, but leaves valuable client-side resources idle
    • Onload: when latency-sensitive; hope the compressed models meet your performance demands
  • Split Inference
    • Head and Tail Partitioning: Model is split
    • Split Runtime: Distribute load between client and server
    • Artificial Bottleneck Injection: Inject Autoencoder
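Head/tail partitioning in a nutshell: a tiny two-layer network is split so that only the (typically smaller) intermediate feature tensor crosses the network. The weights below are random placeholders, not a real model:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4))  # head layer (runs on the client/edge device)
W2 = rng.normal(size=(4, 2))  # tail layer (runs on the server)

def head(x):
    """Client-side part: compute the intermediate features to transmit."""
    return np.maximum(0, x @ W1)  # only this 4-dim tensor crosses the network

def tail(features):
    """Server-side part: finish the forward pass on received features."""
    return features @ W2

x = rng.normal(size=(1, 8))
split_out = tail(head(x))                # split execution
full_out = np.maximum(0, x @ W1) @ W2    # monolithic forward pass
# split and monolithic execution produce identical results
```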

Propaganda

  • Fallacies of Distributed (offloading) Systems
    • The network is reliable
    • latency is zero
    • bandwidth is infinite
    • network is secure
    • transport cost is zero
    • topology doesn't change
    • there is one administrator

Neural Feature Compression

  • Lots of graphs and numbers in the slides…
  • Focuses on Transfer Cost Reduced per Second (TCR/s)

Slide Set 5 — Week 5 — IoT Cloud Continuum

Internet of Things

  • Sensing
  • Communication
  • Processing
  • Behavior
  • Actuation

The traditional way

  • cloud-centric, because “infinite” compute
  • however, limits are being reached; data transport strains network infrastructure
  • the cloud is too far away for latency-sensitive IoT, and privacy is threatened

New developments

  • device-to-cloud compute continuum emerges
  • IoT devices are more powerful and better connected
  • Edge Computing, moving computation near data source for enhanced privacy and reduced latency

The Computing Continuum

Edge & Fog Computing

  • Computation includes data processing, compression, decision making, etc.
  • Emerging applications range from autonomous vehicles and augmented reality to smart systems
  • Low latency, decentralization, less signalling and comms overhead

Where is the edge?

  • Telcos: Edge of operator-controlled network (4G/5G base stations)
  • Others: First hop of IoT device, or end-device itself

Fog vs. Edge

  • Fog computing has a wider scope
  • Deeply hierarchical multi layer architecture
  • fog computation anywhere among collaborating entities
  • Edge computing on the other hand typically spans mostly up to the edge of the operator’s network

Infrastructure technologies

Connectivity

  • Wireless Local/Personal Area networks
    • WiFi Bluetooth, ZigBee
    • High throughput
  • Wide Area Networks
    • 4G LTE, 5G
    • Low Power Wide Area Networks: LoRa, SigFox, LTE-M, NB-IoT

Low Power Wide Area Networking

  • Event-driven or periodic data transmission
  • Very large number of devices → networking hardware should be cheap
  • Bluetooth Wi-Fi ZigBee not enough
  • Very low power operation
  • 3 main candidates
    • LoRa (zero fee possible, unlicensed spectrum)
    • SigFox
    • LTE-M, NB-IoT

Multi-access Edge Computing

  • 5G Ultra Reliable and Low Latency Communication URLLC

IoT Software Stacks

  • Different Stacks for Device, Gateway and Cloud IoT service stacks
  • Key Design Principles
    • Loose Coupling
    • Modularity
    • Platform Independence
    • Open Standards
    • Well-defined APIs

Microservices-based design

  • Had that in previous sections already

Federated Learning

  • Training on data directly on remote devices, without revealing the data themselves
  • Collect the outcomes, server aggregates these updates into a global model
  • Challenges
    • Volatility
    • Asynchronicity
    • non-IID (not independent and identically distributed) data
    • preventing privacy leaks
    • Incentives to misbehave
  • Integer Linear Programming (ILP) to decide which devices to select
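The aggregation step is commonly FedAvg: the server averages client updates weighted by local dataset size. A sketch with made-up weight vectors:

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Federated averaging: each client's model weighted by its dataset size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Models trained locally on private data; only the weights leave each device.
clients = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
sizes = [10, 30]  # the second client holds three times as much data
global_model = fed_avg(clients, sizes)  # -> [2.5, 3.5]
```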