TU Wien:Advanced Internet Computing VU (Dustdar)/AIC WS2024/25 Summary
AIC — Advanced Internet Computing WS 2024/25
Slide Set 1 — Week 1
Software Evolution
- Requirements cannot be fully gathered upfront or frozen
- Too many stakeholders
Open World Assumption
- Ambient intelligence
- Loosely coupled
- Accessed on demand
Ecosystems
Complex system with networked dependencies and intrinsic adaptive behavior
- Robustness & Resilience mechanisms
- Measures of health
- Built-in coherence
- Entropy resistance
Layers of Paradigms
Paradigm 1 Elasticity (Resilience)
Elasticity > Scalability
Paradigm 2 Osmotic Computing
Dynamic management of microservices across cloud and edge datacenters
Paradigm 3 Social Compute Units (SCUs)
Service-oriented Computing (SoC)
What is a service?
- standardized interface
- self-contained, with no dependencies on other services
- available
- context independent
Service Properties & State
- Functional: operational characteristics, behaviour
- Non-functional: description targets service quality attributes, metering and cost, performance metrics
- Stateless: Services can be invoked repeatedly without having to maintain context
- Stateful: context preserved from one invocation to the next
Loose Coupling & Granularity
- Loose Coupling
- Service granularity
- multiple services involved for a single process
- Coarse-grained complex services imply larger and richer data structures
Synchronicity & Well-definedness
- Synchronicity: synchronous (RPC with arguments) vs. asynchronous (entire document)
- Well-definedness: service interaction must be well-defined. The Web Services Description Language (WSDL) allows applications to describe to other applications the rules for interfacing and interacting
Slide Set 2 — Week 2 — Cloud Computing
Motivation
- Pay-per-use, lower maintenance cost, scaling, fault tolerant
- Use cases
- Demand varies with peak loads
- Demand is unknown in advance
- Batch workloads
Cloud Computing Basics
- NIST Definition
- On demand self service
- broad network access
- resource pooling, virtualization
- rapid elasticity, virtually unlimited capacity
- measured service
Three Cloud Service Models
- IaaS Infrastructure as a Service
- Virtual Machines
- Amazon EC2, Amazon EBS
- PaaS Platform as a Service
- Computing Platform and solution stack / framework
- Google App Engine, Heroku
- SaaS Software as a Service
- CRM software
- Google Docs
IaaS PaaS SaaS
- Speed vs. Customization (SaaS is not as flexible)
- Cost (PaaS can be cheaper)
- Vendor lock-in (SaaS and PaaS worse than IaaS)
Cloud Deployment Models
- Public Cloud (AWS, Azure, low cost, no upfront cost, no maintenance)
- Private Cloud (operated solely for a single organization; self-reliance, flexibility, security, compliance)
- Community Cloud
- Hybrid Cloud (control of private infrastructure, flexibility take advantage of additional resources)
Virtualization
- Abstract view on resources
- Platform (complete machine)
- Memory
- Storage
- Network
- Resource Pooling
- resources are shared between users (multitenancy)
- backend parallelization
- Consolidation
- Put many different classes of applications onto different VMs in the same data center
- Fault Tolerance
- Save VM state
Types of Virtualization
- Hardware-level Virtualization
- Emulating full virtual computer hardware platforms (VMs)
- Hypervisors (Virtual Machine Monitors)
- Bare Metal (type 1)
- Lightweight virtualization layer directly on host hardware
- good performance and stability
- Xen, VMware ESXi
- Hosted (type 2)
- Runs on a host OS
- Forwards calls to the host OS (overhead)
- QEMU, VirtualBox
- Distinction not always clear, e.g. KVM
- Full virtualization (slow)
- Hardware-assisted virtualization (host knows virt is taking place, CPU requires virtualization extensions)
- Paravirtualization (software-assisted)
- Lightweight technique, near-native performance
- Operating System (OS)-level Virtualization (Containerization)
- OS kernel manages coexistence of multiple isolated user spaces
- Containers
- Share host OS and drivers
- near native performance
- not as secure as VMs
- more elastic than hypervisors
- Linux Containers LXC, FreeBSD jails, OpenVZ
- Docker is the leading containerization technology
- initially implemented on top of LXC
- Cross-platform portability
- bundles FS, runtime, sys libraries
Existing commercial cloud offerings
AWS
- IaaS, PaaS, SaaS
- Elastic Compute Cloud EC2
- Simple Storage Service S3
- Simple Queue Service SQS
EC2
- virtual machines with different capabilities (general purpose m4)
- 4 types of billing
- On-Demand (pay-per-use, flexible, no long term commitments)
- Reserved Instances (cheaper in exchange for long-term commitment)
- Spot Instances (Bid for spare EC2 capacity for big discounts)
- Dedicated Hosts (Dedicated physical server, useful for compliance targets)
Storage in AWS
- Simple Storage Service S3 (object-based, persistent)
- Elastic Block Store EBS (like raw unformatted harddrive)
- Elastic File System
Simple Queue Service SQS
- scaling, decouples application components so that
- you can scale transparently
- components can fail safely
- managed by Amazon (reliable, redundant)
- Two types
- standard queue: high throughput, at-least-once delivery, best-effort ordering
- FIFO queue: exactly-once processing, limited throughput
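The decoupling idea above can be sketched with Python's standard-library queue standing in for a managed queue service (this illustrates the pattern only; it is not the actual SQS API):

```python
import queue
import threading

# Illustration: a stdlib FIFO queue stands in for a managed queue
# service like SQS; task names and counts are made up.
task_queue = queue.Queue()
results = []

def producer():
    # The producer only talks to the queue, never to the consumer,
    # so either side can scale or fail independently.
    for i in range(5):
        task_queue.put(f"task-{i}")

def consumer():
    while True:
        task = task_queue.get()
        if task is None:  # sentinel to shut down
            break
        results.append(task.upper())  # "process" the task
        task_queue.task_done()

t = threading.Thread(target=consumer)
t.start()
producer()
task_queue.put(None)
t.join()
print(results)  # ['TASK-0', 'TASK-1', 'TASK-2', 'TASK-3', 'TASK-4']
```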
Heroku Platform
- PaaS
- (I miss their free tier)
Cloud QoS
- Measure of technical quality of a web or cloud service
- Performance
- Availability
- Failure rate
- Security
- Trust
- Compliance
Instance-Level Performance QoS Metrics
- Round-Trip Time and Response Time
- Network Latency
- Processing Time
- Wrapping Time
- Execution time
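One plausible way the metrics above fit together (the exact decomposition is course-specific; all numbers here are invented):

```python
# Hedged sketch: one plausible decomposition of the instance-level
# QoS metrics listed above. All values are assumed, in seconds.

network_latency = 0.020   # one-way network delay
wrapping_time   = 0.002   # (de)serialization of request/response
execution_time  = 0.100   # time in the service's business logic

# Processing time on the provider side: wrapping plus execution.
processing_time = wrapping_time + execution_time

# Response time observed by the client: the request and the response
# each cross the network once, plus server-side processing.
response_time = 2 * network_latency + processing_time
print(round(response_time, 3))  # 0.142
```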
Aggregated QoS Metrics
- Throughput (maximum processing rate)
- Availability A = uptime / (uptime+downtime)
- Combined (serial) availability is the product of the individual availabilities, e.g. 0.9 · 0.8 · 0.95 ≈ 0.684
- Replicated availability is 1 − (1 − a)^n, e.g. a server with availability 0.8 and 3 replicas gives 1 − (1 − 0.8)^3 = 0.992 (the probability that all 3 replicas fail at once is 0.2^3 = 0.008)
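Both availability formulas can be checked with a few lines of Python:

```python
# Availability arithmetic for the two cases above.

def serial_availability(*availabilities):
    """Chain of dependent services: multiply the availabilities."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result

def replicated_availability(a, n):
    """n independent replicas: the system fails only if ALL replicas fail."""
    return 1 - (1 - a) ** n

print(serial_availability(0.9, 0.8, 0.95))   # ≈ 0.684
print(replicated_availability(0.8, 3))       # ≈ 0.992
```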
Service Level Agreements (SLAs)
- For B2B interactions, normal users mostly get best effort delivery
- concrete Service Level Objectives (SLOs)
- metrics, concrete target values
- penalties for non-achievement, validity period
- responsible monitoring entity
Slide Set 3 — Week 3 — Edge Computing and Intelligence at the Edge (1)
- Compute as physically close to the source as possible.
- AI on the Edge, Federated Learning
AI Accelerators
- DNN processors, sometimes called TPUs
- Optimized for AI workloads: matrix multiplications, multiply-and-accumulate (MAC) operations
Graph Compiler Basics
- Map high-level computational abstractions of DL Frameworks, i.e. layers to operations executable on an accelerator.
- Parallelize forward pass where possible
Service Level Objectives (SLOs)
- SLAs are composed of one or more SLOs
- Quantifiable measures for platform providers that ensure QoS
- When SLOs are violated, horizontal or vertical scaling is needed ⇒ leverage elasticity in the edge-cloud continuum
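A minimal sketch of the SLO-violation-triggers-scaling idea, with invented metric names and thresholds:

```python
# Hedged sketch (all names and values invented): compare measured
# metrics against SLO targets and derive a scaling decision.

slos = {
    "avg_response_time_ms": 200,   # target upper bound
    "error_rate": 0.01,
}

measured = {
    "avg_response_time_ms": 350,   # currently violating the SLO
    "error_rate": 0.004,
}

# Collect every SLO whose measured value exceeds its target.
violations = [name for name, target in slos.items() if measured[name] > target]

if violations:
    # Horizontal scaling adds replicas; vertical scaling adds
    # CPU/RAM to an existing instance.
    action = "scale_out"
else:
    action = "no_op"

print(violations, action)  # ['avg_response_time_ms'] scale_out
```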
SLOs at the edge
- same as cloud SLOs but with additional challenges
- Additional Constraints and Considerations
- Computational Power
- AI Accelerator (yes / no / maybe)
- Battery Level
- Network Quality
- ⇒ “Cloud but with more pain”
Polaris Project High Level SLOs
- composed metrics, aggregated from multiple lower-level metrics
Inference - Cloud Offloading
- Cost efficient
- “Infinite Resources”
- Server-grade Hardware
- But Privacy Concerns and Latency issues
Inference - Edge Offloading
- Privacy
- Proximity
- Server-grade Hardware
- But Horizontal Scaling, Limited Resources and Cost factor
Serverless Computing
- Serverless ⇒ Function as a Service + Backend as a Service
- Implement applications exclusively with managed services
- Cloud-Native
- Pay-per-request
- Completely different paradigm to microservices (!)
- Stateless functions, provider auto scales replicas and routing requests
Backend as a Service
- Managed Service, hosted and scaled by third party provider
- client programmers communicate through an API
- typically no knowledge on host hardware
- Instead clients are offered SLAs
- e.g. message brokers, databases, user management
Why serverless (edge)?
- In theory permits a fully automated system for provisioning, orchestration, and deployment
- Provide same convenience of cloud-native development
Key Challenges
- Volatility, Unstable Network, even less reliable hardware
- Prior Knowledge (beyond dark ages)
- Discovery, Hardware and Services
- Location
⇒ Scheduling is one of the primary concerns for Edge Computing
Slide Set 4 — Week 4 — Intelligence at the Edge (2)
Deep Learning Quick Primer
- Convolutional, composed of filters or kernels, extracts local features from spatial or temporal data
- Nonlinearity and Activation Functions
- add non-linearity through activation functions like ReLU
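The two building blocks above can be illustrated in plain Python: a 1-D "convolution" (as implemented in DL frameworks, really a cross-correlation) extracting local features, followed by a ReLU activation:

```python
# Toy illustration of a convolutional filter plus ReLU, plain Python.

def conv1d_valid(signal, kernel):
    """Slide the kernel over the signal; each output is a local
    weighted sum (cross-correlation, as in DL 'conv' layers)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(xs):
    """Non-linearity: clip negative activations to zero."""
    return [max(0.0, x) for x in xs]

features = conv1d_valid([1.0, -2.0, 3.0, -1.0, 2.0], [1.0, -1.0])
print(features)        # [3.0, -5.0, 4.0, -3.0]
print(relu(features))  # [3.0, 0.0, 4.0, 0.0]
```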
Model Compression
- Network Quantization
- Reduce precision from 32 bit floats, for faster inference and lower memory footprint
- Network Pruning
- Remove components of a NN, e.g. channels in a convolutional layer or neurons in a fully connected layer
- Unstructured pruning: zero out weights; the matrix keeps its size
- Structured pruning: “physically” remove units from the network; changes the architecture, but is less dependent on special hardware
- Knowledge Distillation
- Train smaller student network under supervision of a larger teacher network
- Deep Neural Networks are typically over-parameterized
- Pruning is simple but cruder
- Hard Labels (one-hot) or use output of teacher
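A toy sketch of quantization and unstructured pruning on a flat weight list (real frameworks operate on tensors; all values here are invented):

```python
# Hedged sketch of two compression techniques on toy weights.

weights = [0.82, -0.03, 0.45, 0.001, -0.96, 0.07]

# --- Network quantization: float -> int8-style values ------------
# Symmetric linear quantization: map the largest magnitude to 127.
scale = max(abs(w) for w in weights) / 127
quantized = [round(w / scale) for w in weights]   # small integers
dequantized = [q * scale for q in quantized]      # lossy reconstruction

# --- Unstructured pruning: zero out small-magnitude weights ------
# The matrix keeps its shape; small weights just become zero.
threshold = 0.05
pruned = [w if abs(w) >= threshold else 0.0 for w in weights]

print(quantized)  # [108, -4, 60, 0, -127, 9]
print(pruned)     # [0.82, 0.0, 0.45, 0.0, -0.96, 0.07]
```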
Split Inference
- AI accelerators are increasingly powerful, but cannot match performance of contemporary server-grade hardware
- Currently a task is either completely onloaded or offloaded
- Offload: when performance is critical, but leaves valuable client-side resources idle
- Onload: when latency-sensitive; hope compressed models meet your performance demands
- Split Inference
- Head and Tail Partitioning: Model is split
- Split Runtime: Distribute load between client and server
- Artificial Bottleneck Injection: Inject Autoencoder
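Head/tail partitioning can be sketched as splitting a list of layer functions: the head runs on the client, only the intermediate activation crosses the network, and the tail runs on the server (layers and split point are invented for illustration):

```python
# Hedged sketch of head/tail model partitioning with toy "layers".

layers = [
    lambda x: [v * 2 for v in x],          # layer 1: scale
    lambda x: [max(0.0, v) for v in x],    # layer 2: ReLU
    lambda x: [v + 1 for v in x],          # layer 3: bias
    lambda x: [sum(x)],                    # layer 4: reduce
]

def run(layer_list, x):
    for layer in layer_list:
        x = layer(x)
    return x

split = 2                                  # invented split point
head, tail = layers[:split], layers[split:]

activation = run(head, [1.0, -3.0, 2.0])   # computed on the client
# ... only this intermediate activation crosses the network ...
output = run(tail, activation)             # computed on the server

# Splitting must not change the result of the full model.
assert output == run(layers, [1.0, -3.0, 2.0])
print(activation, output)  # [2.0, 0.0, 4.0] [9.0]
```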
Propaganda
- Fallacies of Distributed (offloading) Systems
- The network is reliable
- latency is zero
- bandwidth is infinite
- network is secure
- transport cost is zero
- topology doesn't change
- there is one administrator
Neural Feature Compression
- Lots of graphs and numbers in the slides…
- Focuses on Transfer Cost Reduced per Second TCR/s
Slide Set 5 — Week 5 — IoT Cloud Continuum
Internet of Things
- Sensing
- Communication
- Processing
- Behavior
- Actuation
The traditional way
- cloud-centric, because “infinite” compute
- however, limits are being reached: data transport strains the network infrastructure
- the cloud is too far away for latency-sensitive IoT; privacy is threatened
New developments
- device-to-cloud compute continuum emerges
- IoT devices are more powerful and better connected
- Edge Computing, moving computation near data source for enhanced privacy and reduced latency
The Computing Continuum
Edge & Fog Computing
- Computation includes data processing, compression, decision making, etc.
- Emerging applications range from autonomous vehicles and augmented reality to smart systems
- Low latency, decentralization, less signalling and comms overhead
Where is the edge?
- Telcos: Edge of operator-controlled network (4G/5G base stations)
- Others: First hop of IoT device, or end-device itself
Fog vs. Edge
- Fog computing has a wider scope
- Deeply hierarchical multi-layer architecture
- fog computation anywhere among collaborating entities
- Edge computing on the other hand typically spans mostly up to the edge of the operator’s network
Infrastructure technologies
Connectivity
- Wireless Local/Personal Area networks
- Wi-Fi, Bluetooth, ZigBee
- High throughput
- Wide Area Networks
- 4G LTE, 5G
- Low Power Wide Area Networks: LoRa, SigFox, LTE-M, NB-IoT
Low Power Wide Area Networking
- Event-driven or periodic data transmission
- Very large number of devices → networking hardware should be cheap
- Bluetooth, Wi-Fi, ZigBee are not sufficient
- Very low power operation
- 3 main candidates
- LoRa (zero fee possible, unlicensed spectrum)
- SigFox
- LTE-M, NB-IoT
Multi-access Edge Computing
- 5G Ultra Reliable and Low Latency Communication URLLC
IoT Software Stacks
- Different Stacks for Device, Gateway and Cloud IoT service stacks
- Key Design Principles
- Loose Coupling
- Modularity
- Platform Independence
- Open Standards
- Well-defined APIs
Microservices-based design
- Had that in previous sections already
Federated Learning
- Training happens directly on data on remote devices, without revealing the data itself
- The server collects the outcomes and aggregates these updates into a global model
- Challenges
- Volatility
- Asynchrony
- non-independent and identically distributed (non-IID) data
- preventing privacy leaks
- Incentives to misbehave
- Integer Linear Programming (ILP) to select which devices participate
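The aggregation step can be sketched as a FedAvg-style weighted average; only updates, never raw data, reach the server (device names, updates, and sample counts are invented toy values):

```python
# Hedged sketch of federated averaging: each device trains locally
# and ships only a model update; the server averages the updates,
# weighted by local dataset size.

global_model = [0.0, 0.0]  # two toy "weights"

# Local data never leaves the devices; each reports (update, samples).
device_updates = {
    "phone-a": ([0.4, -0.2], 100),
    "phone-b": ([0.1,  0.3], 300),
}

# Server-side aggregation: weighted average of the updates.
total = sum(n for _, n in device_updates.values())
aggregated = [0.0, 0.0]
for update, n in device_updates.values():
    for i, u in enumerate(update):
        aggregated[i] += u * n / total

# Apply the aggregated update to the global model.
global_model = [w + a for w, a in zip(global_model, aggregated)]
print([round(w, 3) for w in global_model])  # [0.175, 0.175]
```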