infrastructure
3 cardsAWS Regions
AWS Region
A physical location in the world where AWS clusters its data centres.
- Each Region has ≥ 3 Availability Zones (the magic number — not 2)
- AZs are isolated, physically separate, geographically grouped
- Connected by redundant, ultra-low-latency networks
Pick a Region for: compliance (data residency), latency (close to users), service availability, price.
Availability Zones
Availability Zone (AZ)
One or more discrete data centres with redundant power, networking and connectivity, inside a Region.
- Each AZ has independent power, cooling and physical security
- AZs in a Region are interconnected via high-bandwidth, low-latency, redundant metro fibre
- Naming:
eu-west-2a,eu-west-2b, etc.
Spread workloads across ≥ 2 AZs in the same Region for high availability — single AZ = single point of failure.
Edge Locations
Edge Locations
Separate from Regions and AZs — these are content delivery endpoints close to end users.
- Used by services like CloudFront (CDN) and Route 53
- Cache content near users for faster delivery
- Far more numerous than Regions (400+ globally)
Hierarchy: Region contains AZs which contain data centres. Edge Locations sit outside this stack.
sagemaker family
6 cardsSageMaker Canvas
SageMaker Canvas
Generate ML predictions without writing any code. Visual, drag-and-drop interface for business analysts.
- Chat with popular LLMs
- Use Ready-to-use models
- Build custom models on your data — automatically
Think of it as the friendly front-end to ML for people who don't live in PyCharm.
SageMaker JumpStart
SageMaker JumpStart
An ML hub of foundation models, built-in algorithms and prebuilt solutions you can deploy in a few clicks.
- Access to popular FMs (Llama, Stable Diffusion, etc.)
- Pre-built end-to-end solutions for common use cases
- Fine-tune models on your own data
Canvas vs JumpStart: Canvas = no-code UI for predictions. JumpStart = model + solution hub for builders.
SageMaker Clarify
SageMaker Clarify
Two jobs, both critical for Responsible AI:
- Detects bias in your data and in model predictions across groups
- Explains predictions — shows how each input feature contributed (SHAP values)
| when | what it checks |
|---|---|
| pre-training | bias in your dataset |
| post-training | bias in model predictions |
| inference | explainability per prediction |
Exam keyword: "bias", "fairness", "explain predictions", "transparency" → Clarify.
Model Monitor
SageMaker Model Monitor
Continuously watches deployed models in production and alerts when something goes wrong.
- Detects data drift — when incoming data shifts from training data
- Detects concept drift — when the underlying relationships change
- Flags data quality issues and anomalies
- Alerts you to inaccurate predictions
Clarify catches bias upfront and explains predictions. Model Monitor catches problems after deployment. Both = Responsible AI.
Data Wrangler
SageMaker Data Wrangler
The fastest, easiest way to prep tabular and image data for ML — with little to no code.
- Import data from many sources (S3, Athena, Redshift, Snowflake)
- 300+ built-in transforms (impute, encode, normalise)
- Visualise distributions and outliers
- Export as a pipeline or feature set
Different from Ground Truth, which is for labeling data (annotating it), not transforming it.
Ground Truth
SageMaker Ground Truth
For labeling and annotating training data — turns raw data into labeled data for supervised learning.
- Use Mechanical Turk workers, vendors, or your own team
- Active learning: ML auto-labels easy items, humans handle hard ones
- Reduces labeling cost vs. doing everything by hand
- Also: Ground Truth Plus = fully managed labeling service
Exam keyword: "data labeling", "annotate data", "human reviewers add labels" → Ground Truth.
responsible AI
2 cardsResponsible AI pillars
The 6 pillars (AWS Responsible AI)
| pillar | what it means |
|---|---|
| Fairness | equitable outcomes across groups |
| Explainability | understand how predictions are made |
| Privacy & security | protect data & respect rights |
| Safety | algorithms work as intended, no harm |
| Controllability | humans can oversee & correct AI |
| Veracity & robustness | accurate & reliable outputs |
| Governance | compliance, accountability |
| Transparency | openness about capabilities & limits |
AWS tools that map here: Clarify (fairness + explainability), Model Monitor (safety + robustness), A2I (controllability).
Amazon A2I
Amazon Augmented AI (A2I)
Implements human review for ML predictions — useful when accuracy matters and the model is uncertain.
- Built-in workflows for Rekognition, Textract, custom models
- Trigger human review when prediction confidence is low
- Or sample a % of predictions for QA review
- Reviewers can be your team, vendors, or Mechanical Turk
This is the AWS service for the "human-in-the-loop" pillar of Responsible AI. Critical for high-stakes use cases like medical, legal, content moderation.
lifecycle terms
4 cardsThe ML process
The ML process sequence
The fundamental 4-step sequence the exam will test:
- Data collection — gather from all sources
- Data preprocessing — clean, normalise, split train/val/test
- Model training — feed prepared data to the algorithm
- Model evaluation — assess on the held-out test set
In the wider ML lifecycle (MLOps), this extends to: framing → data → train → deploy → monitor → iterate.
Training
Training
The phase where the model learns — its internal parameters (weights) are being updated.
- Uses the training set (~60–80% of data)
- Algorithm minimises an error function
- Model "sees" examples and adjusts
- The output is a trained model
Validation vs Testing
Validation vs Testing
| validation | testing | |
|---|---|---|
| when | during training | after training |
| data | val set (10–20%) | test set (10–20%) |
| purpose | tune hyperparameters, prevent overfit | unbiased final estimate of performance |
| repeat? | many times | ideally once |
Mnemonic: validation = the dress rehearsal (you adjust). Testing = the actual show (you don't tweak after).
Neither involves serving predictions to real users — that's inference.
Inference
Inference
The deployed model uses its frozen learned parameters to make a prediction on new, real-world input.
- This is what users actually experience
- No learning happening — parameters don't change
- Two modes: real-time (live API, low latency) and batch (process many records at once)
In SageMaker, you deploy a model to an endpoint for real-time inference, or run a batch transform job for bulk.
foundation models
3 cardsFMs & self-supervised
Foundation Models use self-supervised learning
The defining technique for FMs. Model gets vast amounts of unlabeled data, then generates labels from the data itself.
- No humans labeling anything upfront
- Example: LLM predicts the next word — the "label" is just the next word that already exists in the text
- Used to pre-train, then fine-tune for downstream tasks
⨯ Unsupervised = unlabeled data, no labels at all, find patterns
✓ Self-supervised = unlabeled data, model CREATES labels from it
Question keywords: "foundation model" + "generates labels" + "from raw input" → always self-supervised.
Learning paradigms
The 4 learning paradigms
| type | data | used for |
|---|---|---|
| Supervised | labeled in/out pairs | classification, regression |
| Unsupervised | unlabeled, no labels created | clustering, anomaly |
| Self-supervised | unlabeled, model creates labels | foundation models, LLMs |
| Reinforcement | reward signal | games, robotics, RLHF |
Plus semi-supervised = small labeled + large unlabeled, uses pseudo-labeling.
Bedrock
Amazon Bedrock
The fully managed service for accessing foundation models via API from multiple providers.
- Access to Anthropic Claude, Meta Llama, Mistral, Cohere, Amazon Titan, etc.
- Serverless — no infrastructure to manage
- Fine-tune FMs on your own data privately
- Knowledge Bases for RAG, Agents for tool-use
- Guardrails for safety filtering
Bedrock = serverless API for FMs, fully managed
JumpStart = deploy FMs to YOUR SageMaker infrastructure, more control
fit & challenges
2 cardsBias × Variance
Bias & Variance, locked in
| underfit | overfit | |
|---|---|---|
| bias | HIGH | low |
| variance | low | HIGH |
| train perf | poor | great |
| test perf | poor | poor |
Memorable framing:
- Underfit = high bias, model is too simple, biased toward being basic
- Overfit = high variance, model memorised noise, output swings wildly with new data
Goldilocks zone in the middle. Reduce overfit with: more data, regularisation, early stopping, dropout, ensembling.
ML's #1 challenge
The biggest ML challenge: data
Collecting and preparing high-quality data is THE primary challenge in real-world ML implementation.
- Algorithms are commodity (open-source, well-documented)
- Compute is rentable (AWS, GPUs on demand)
- Data quality, quantity, labeling, bias — all bottlenecks
- "Garbage in, garbage out" — your model is only as good as its data
Adjacent challenges: model interpretability, deployment costs, monitoring drift, ethical concerns, regulatory compliance.