TU Wien:Advanced Internet Computing VU (Dustdar)/AIC WS2024/25 Summary
AIC — Advanced Internet Computing WS 2024/25
Slide Set 1 — Week 1
Software Evolution
- Requirements cannot be fully gathered upfront or frozen
- Too many stakeholders
Open World Assumption
- Ambient intelligence
- Loosely coupled
- Accessed on demand
Ecosystems
Complex system with networked dependencies and intrinsic adaptive behavior
- Robustness & Resilience mechanisms
- Measures of health
- Built-in coherence
- Entropy resistance
Layers of Paradigms
Paradigm 1 Elasticity (Resilience)
Elasticity > Scalability
Paradigm 2 Osmotic Computing
Dynamic management of microservices across cloud and edge datacenters
Paradigm 3 Social Compute Units (SCUs)
Service-oriented Computing (SoC)
What is a service?
- standardized interface
- self-contained, with no dependencies on other services
- available
- context independent
Service Properties & State
- Functional: operational characteristics, behaviour
- Non-functional: description targets service quality attributes, metering and cost, performance metrics
- Stateless: Services can be invoked repeatedly without having to maintain context
- Stateful: context preserved from one invocation to the next
Loose Coupling & Granularity
- Loose Coupling
- Service granularity
- multiple services involved for a single process
- Coarse-grained complex services imply larger and richer data structures
Synchronicity & Well-definedness
- Synchronicity: synchronous (RPC with arguments) vs. asynchronous (entire document)
- Well-definedness: service interaction must be well-defined. The Web Services Description Language (WSDL) allows applications to describe to other applications the rules for interfacing and interacting
Slide Set 2 — Week 2 — Cloud Computing
Motivation
- Pay-per-use, lower maintenance cost, scaling, fault tolerant
- Use cases
- Demand varies with peak loads
- Demand is unknown in advance
- Batch workloads
Cloud Computing Basics
- NIST Definition
- On demand self service
- broad network access
- resource pooling, virtualization
- rapid elasticity, virtually unlimited capacity
- measured service
Three Cloud Service Models
- IaaS Infrastructure as a Service
- Virtual Machines
- Amazon EC2, Amazon EBS
- PaaS Platform as a Service
- Computing Platform and solution stack / framework
- Google App Engine, Heroku
- SaaS Software as a Service
- CRM software
- Google Docs
IaaS PaaS SaaS
- Speed vs. Customization (SaaS is not as flexible)
- Cost (PaaS can be cheaper)
- Vendor lock-in (SaaS and PaaS worse than IaaS)
Cloud Deployment Models
- Public Cloud (AWS, Azure, low cost, no upfront cost, no maintenance)
- Private Cloud (operated solely for a single organization; self-reliance, flexibility, security, compliance)
- Community Cloud
- Hybrid Cloud (control of private infrastructure, flexibility take advantage of additional resources)
Virtualization
- Abstract view on resources
- Platform (complete machine)
- Memory
- Storage
- Network
- Resource Pooling
- resources are shared between users (multitenancy)
- backend parallelization
- Consolidation
- Put many different classes of applications onto different VMs in the same data center
- Fault Tolerance
- Save VM state
Types of Virtualization
- Hardware-level Virtualization
- Emulating full virtual computer hardware platforms (VMs)
- Hypervisors (Virtual Machine Monitors)
- Bare Metal (type 1)
- Lightweight virtualization layer directly on host hardware
- good performance and stability
- Xen, VMware ESXi
- Hosted (type 2)
- Runs on a host OS
- Forwards calls to the host OS (overhead)
- QEMU, VirtualBox
- Distinction not always clear, e.g. KVM
- Full virtualization (slow)
- Hardware-assisted virtualization (host knows virt is taking place, CPU requires virtualization extensions)
- Paravirtualization (software-assisted)
- Lightweight technique, near-native performance
- Operating System (OS)-level Virtualization (Containerization)
- OS kernel manages coexistence of multiple isolated user spaces
- Containers
- Share host OS and drivers
- near native performance
- not as secure as VMs
- more elastic than hypervisors
- Linux Containers LXC, FreeBSD jails, OpenVZ
- Docker is the leading containerization technology
- initially implemented on top of LXC
- Cross-platform portability
- bundles FS, runtime, sys libraries
Existing commercial cloud offerings
AWS
- IaaS, PaaS, SaaS
- Elastic Compute Cloud EC2
- Simple Storage Service S3
- Simple Queue Service SQS
EC2
- virtual machines with different capabilities (general purpose m4)
- 4 types of billing
- On-Demand (pay-per-use, flexible, no long term commitments)
- Reserved Instances (cheaper in exchange for long-term commitment)
- Spot Instances (Bid for spare EC2 capacity for big discounts)
- Dedicated Hosts (Dedicated physical server, useful for compliance targets)
Storage in AWS
- Simple Storage Service S3 (object-based, persistent)
- Elastic Block Store EBS (like raw unformatted harddrive)
- Elastic File System
Simple Queue Service SQS
- scaling, decouples application components so that
- you can scale transparently
- components can fail safely
- managed by Amazon (reliable, redundant)
- Two types
- standard queue: high throughput, at-least-once delivery, best-effort ordering
- FIFO queue: exactly-once processing, limited throughput
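The decoupling idea above can be sketched with Python's standard-library queue standing in for a managed queue service (this illustrates the pattern only; it is not the actual SQS API):

```python
import queue
import threading

# Illustration: a stdlib FIFO queue stands in for a managed queue
# service like SQS; task names and counts are made up.
task_queue = queue.Queue()
results = []

def producer():
    # The producer only talks to the queue, never to the consumer,
    # so either side can scale or fail independently.
    for i in range(5):
        task_queue.put(f"task-{i}")

def consumer():
    while True:
        task = task_queue.get()
        if task is None:  # sentinel to shut down
            break
        results.append(task.upper())  # "process" the task
        task_queue.task_done()

t = threading.Thread(target=consumer)
t.start()
producer()
task_queue.put(None)
t.join()
print(results)  # ['TASK-0', 'TASK-1', 'TASK-2', 'TASK-3', 'TASK-4']
```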
Heroku Platform
- PaaS
- (I miss their free tier)
Cloud QoS
- Measure of technical quality of a web or cloud service
- Performance
- Availability
- Failure rate
- Security
- Trust
- Compliance
Instance-Level Performance QoS Metrics
- Round-Trip Time and Response Time
- Network Latency
- Processing Time
- Wrapping Time
- Execution time
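One plausible way the metrics above fit together (the exact decomposition is course-specific; all numbers here are invented):

```python
# Hedged sketch: one plausible decomposition of the instance-level
# QoS metrics listed above. All values are assumed, in seconds.

network_latency = 0.020   # one-way network delay
wrapping_time   = 0.002   # (de)serialization of request/response
execution_time  = 0.100   # time in the service's business logic

# Processing time on the provider side: wrapping plus execution.
processing_time = wrapping_time + execution_time

# Response time observed by the client: the request and the response
# each cross the network once, plus server-side processing.
response_time = 2 * network_latency + processing_time
print(round(response_time, 3))  # 0.142
```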
Aggregated QoS Metrics
- Throughput (maximum processing rate)
- Availability A = uptime / (uptime+downtime)
- Combined (serial) availability is the product of the individual availabilities, e.g. 0.9 · 0.8 · 0.95 ≈ 0.684
- Replicated availability is 1 − (1 − a)^n, e.g. a server with availability 0.8 and 3 replicas gives 1 − (1 − 0.8)^3 = 0.992 (the probability that all 3 replicas fail at once is 0.2^3 = 0.008)
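Both availability formulas can be checked with a few lines of Python:

```python
# Availability arithmetic for the two cases above.

def serial_availability(*availabilities):
    """Chain of dependent services: multiply the availabilities."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result

def replicated_availability(a, n):
    """n independent replicas: the system fails only if ALL replicas fail."""
    return 1 - (1 - a) ** n

print(serial_availability(0.9, 0.8, 0.95))   # ≈ 0.684
print(replicated_availability(0.8, 3))       # ≈ 0.992
```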
Service Level Agreements (SLAs)
- For B2B interactions, normal users mostly get best effort delivery
- concrete Service Level Objectives (SLOs)
- metrics, concrete target values
- penalties for non-achievement, validity period
- responsible monitoring entity
Slide Set 3 — Week 3 — Edge Computing and Intelligence at the Edge (1)
- Compute as physically close to the source as possible.
- AI on the Edge, Federated Learning
AI Accelerators
- DNN processors, sometimes called TPUs
- Optimized for AI workloads: matrix multiplications, multiply-and-accumulate (MAC) operations
Graph Compiler Basics
- Map high-level computational abstractions of DL Frameworks, i.e. layers to operations executable on an accelerator.
- Parallelize forward pass where possible
Service Level Objectives (SLOs)
- SLAs are composed of one or more SLOs
- Quantifiable measures for platform providers that ensure QoS
- When SLOs are violated, horizontal or vertical scaling is needed ⇒ leverage elasticity in the edge-cloud continuum
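A minimal sketch of the SLO-violation-triggers-scaling idea, with invented metric names and thresholds:

```python
# Hedged sketch (all names and values invented): compare measured
# metrics against SLO targets and derive a scaling decision.

slos = {
    "avg_response_time_ms": 200,   # target upper bound
    "error_rate": 0.01,
}

measured = {
    "avg_response_time_ms": 350,   # currently violating the SLO
    "error_rate": 0.004,
}

# Collect every SLO whose measured value exceeds its target.
violations = [name for name, target in slos.items() if measured[name] > target]

if violations:
    # Horizontal scaling adds replicas; vertical scaling adds
    # CPU/RAM to an existing instance.
    action = "scale_out"
else:
    action = "no_op"

print(violations, action)  # ['avg_response_time_ms'] scale_out
```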
SLOs at the edge
- same as cloud SLOs but with additional challenges
- Additional Constraints and Considerations
- Computational Power
- AI Accelerator (yes / no / maybe)
- Battery Level
- Network Quality
- ⇒ “Cloud but with more pain”
Polaris Project High Level SLOs
- composed metrics, aggregated from multiple lower-level metrics
Inference - Cloud Offloading
- Cost efficient
- “Infinite Resources”
- Server-grade Hardware
- But Privacy Concerns and Latency issues
Inference - Edge Offloading
- Privacy
- Proximity
- Server-grade Hardware
- But Horizontal Scaling, Limited Resources and Cost factor
Serverless Computing
- Serverless ⇒ Function as a Service + Backend as a Service
- Implement applications exclusively with managed services
- Cloud-Native
- Pay-per-request
- Completely different paradigm to microservices (!)
- Stateless functions, provider auto scales replicas and routing requests
Backend as a Service
- Managed Service, hosted and scaled by third party provider
- client programmers communicate through an API
- typically no knowledge on host hardware
- Instead clients are offered SLAs
- e.g. message brokers, databases, user management
Why serverless (edge)?
- In theory permits a fully automated system for provisioning, orchestration, and deployment
- Provide same convenience of cloud-native development
Key Challenges
- Volatility, Unstable Network, even less reliable hardware
- Prior Knowledge (beyond dark ages)
- Discovery, Hardware and Services
- Location
⇒ Scheduling is one of the primary concerns for Edge Computing
Slide Set 4 — Week 4 — Intelligence at the Edge (2)
Deep Learning Quick Primer
- Convolutional, composed of filters or kernels, extracts local features from spatial or temporal data
- Nonlinearity and Activation Functions
- add non-linearity through activation functions like ReLU
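The two building blocks above can be illustrated in plain Python: a 1-D "convolution" (as implemented in DL frameworks, really a cross-correlation) extracting local features, followed by a ReLU activation:

```python
# Toy illustration of a convolutional filter plus ReLU, plain Python.

def conv1d_valid(signal, kernel):
    """Slide the kernel over the signal; each output is a local
    weighted sum (cross-correlation, as in DL 'conv' layers)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(xs):
    """Non-linearity: clip negative activations to zero."""
    return [max(0.0, x) for x in xs]

features = conv1d_valid([1.0, -2.0, 3.0, -1.0, 2.0], [1.0, -1.0])
print(features)        # [3.0, -5.0, 4.0, -3.0]
print(relu(features))  # [3.0, 0.0, 4.0, 0.0]
```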
Model Compression
- Network Quantization
- Reduce precision from 32 bit floats, for faster inference and lower memory footprint
- Network Pruning
- Remove components of a NN, e.g. channels in a convolutional layer or neurons in a fully connected layer
- Unstructured pruning: zero out weights; the matrix keeps its size
- Structured pruning: “physically” remove units from the network; changes the architecture, but is less dependent on special hardware
- Knowledge Distillation
- Train smaller student network under supervision of a larger teacher network
- Deep Neural Networks are typically over-parameterized
- Pruning is simple but cruder
- Hard Labels (one-hot) or use output of teacher
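A toy sketch of quantization and unstructured pruning on a flat weight list (real frameworks operate on tensors; all values here are invented):

```python
# Hedged sketch of two compression techniques on toy weights.

weights = [0.82, -0.03, 0.45, 0.001, -0.96, 0.07]

# --- Network quantization: float -> int8-style values ------------
# Symmetric linear quantization: map the largest magnitude to 127.
scale = max(abs(w) for w in weights) / 127
quantized = [round(w / scale) for w in weights]   # small integers
dequantized = [q * scale for q in quantized]      # lossy reconstruction

# --- Unstructured pruning: zero out small-magnitude weights ------
# The matrix keeps its shape; small weights just become zero.
threshold = 0.05
pruned = [w if abs(w) >= threshold else 0.0 for w in weights]

print(quantized)  # [108, -4, 60, 0, -127, 9]
print(pruned)     # [0.82, 0.0, 0.45, 0.0, -0.96, 0.07]
```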
Split Inference
- AI accelerators are increasingly powerful, but cannot match performance of contemporary server-grade hardware
- Currently a task is either completely onloaded or offloaded
- Offload: when performance is critical, but leaves valuable client-side resources idle
- Onload: when latency-sensitive; hope compressed models meet your performance demands
- Split Inference
- Head and Tail Partitioning: Model is split
- Split Runtime: Distribute load between client and server
- Artificial Bottleneck Injection: Inject Autoencoder
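Head/tail partitioning can be sketched as splitting a list of layer functions: the head runs on the client, only the intermediate activation crosses the network, and the tail runs on the server (layers and split point are invented for illustration):

```python
# Hedged sketch of head/tail model partitioning with toy "layers".

layers = [
    lambda x: [v * 2 for v in x],          # layer 1: scale
    lambda x: [max(0.0, v) for v in x],    # layer 2: ReLU
    lambda x: [v + 1 for v in x],          # layer 3: bias
    lambda x: [sum(x)],                    # layer 4: reduce
]

def run(layer_list, x):
    for layer in layer_list:
        x = layer(x)
    return x

split = 2                                  # invented split point
head, tail = layers[:split], layers[split:]

activation = run(head, [1.0, -3.0, 2.0])   # computed on the client
# ... only this intermediate activation crosses the network ...
output = run(tail, activation)             # computed on the server

# Splitting must not change the result of the full model.
assert output == run(layers, [1.0, -3.0, 2.0])
print(activation, output)  # [2.0, 0.0, 4.0] [9.0]
```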
Propaganda
- Fallacies of Distributed (offloading) Systems
- The network is reliable
- latency is zero
- bandwidth is infinite
- network is secure
- transport cost is zero
- topology doesn't change
- there is one administrator
Neural Feature Compression
- Lots of graphs and numbers in the slides…
- Focuses on Transfer Cost Reduced per Second TCR/s
Slide Set 5 — Week 5 — IoT Cloud Continuum
Internet of Things
- Sensing
- Communication
- Processing
- Behavior
- Actuation
The traditional way
- cloud-centric, because “infinite” compute
- however, limits are being reached: data transport strains the network infrastructure
- the cloud is too far away for latency-sensitive IoT; privacy is threatened
New developments
- device-to-cloud compute continuum emerges
- IoT devices are more powerful and better connected
- Edge Computing, moving computation near data source for enhanced privacy and reduced latency
The Computing Continuum
Edge & Fog Computing
- Computation includes data processing, compression, decision making, etc.
- Emerging applications range from autonomous vehicles and augmented reality to smart systems
- Low latency, decentralization, less signalling and comms overhead
Where is the edge?
- Telcos: Edge of operator-controlled network (4G/5G base stations)
- Others: First hop of IoT device, or end-device itself
Fog vs. Edge
- Fog computing has a wider scope
- Deeply hierarchical multi-layer architecture
- fog computation anywhere among collaborating entities
- Edge computing on the other hand typically spans mostly up to the edge of the operator’s network
Infrastructure technologies
Connectivity
- Wireless Local/Personal Area networks
- Wi-Fi, Bluetooth, ZigBee
- High throughput
- Wide Area Networks
- 4G LTE, 5G
- Low Power Wide Area Networks: LoRa, SigFox, LTE-M, NB-IoT
Low Power Wide Area Networking
- Event-driven or periodic data transmission
- Very large number of devices → networking hardware should be cheap
- Bluetooth, Wi-Fi, ZigBee are not sufficient
- Very low power operation
- 3 main candidates
- LoRa (zero fee possible, unlicensed spectrum)
- SigFox
- LTE-M, NB-IoT
Multi-access Edge Computing
- 5G Ultra Reliable and Low Latency Communication URLLC
IoT Software Stacks
- Different Stacks for Device, Gateway and Cloud IoT service stacks
- Key Design Principles
- Loose Coupling
- Modularity
- Platform Independence
- Open Standards
- Well-defined APIs
Microservices-based design
- Had that in previous sections already
Federated Learning
- Training happens directly on data on remote devices, without revealing the data itself
- The server collects the outcomes and aggregates these updates into a global model
- Challenges
- Volatility
- Asynchrony
- non-independent and identically distributed (non-IID) data
- preventing privacy leaks
- Incentives to misbehave
- Integer Linear Programming (ILP) to select which devices participate
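The aggregation step can be sketched as a FedAvg-style weighted average; only updates, never raw data, reach the server (device names, updates, and sample counts are invented toy values):

```python
# Hedged sketch of federated averaging: each device trains locally
# and ships only a model update; the server averages the updates,
# weighted by local dataset size.

global_model = [0.0, 0.0]  # two toy "weights"

# Local data never leaves the devices; each reports (update, samples).
device_updates = {
    "phone-a": ([0.4, -0.2], 100),
    "phone-b": ([0.1,  0.3], 300),
}

# Server-side aggregation: weighted average of the updates.
total = sum(n for _, n in device_updates.values())
aggregated = [0.0, 0.0]
for update, n in device_updates.values():
    for i, u in enumerate(update):
        aggregated[i] += u * n / total

# Apply the aggregated update to the global model.
global_model = [w + a for w, a in zip(global_model, aggregated)]
print([round(w, 3) for w in global_model])  # [0.175, 0.175]
```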