ML & AI System Design for Staff Engineers
14 episodes covering ML infrastructure, AI-era systems, and Staff-level architectural thinking — feature stores, model serving, training platforms, vector search, LLM serving, RAG, and AI gateway design.
About This Course
This is Part 2 of our System Design series, focused on ML and AI infrastructure.
Staff-level interviews increasingly test ML infrastructure knowledge — not ML algorithms, but the systems that train, serve, and experiment with models at scale. This series teaches you how to design those systems.
We start with how ML system design interviews differ from traditional system design. Then we build up the core ML infrastructure stack: feature stores for training-serving consistency, stream processing for feature freshness, recommendation systems (two-tower retrieval, ANN search), vector search, and model serving platforms.
The AI/LLM section covers three cutting-edge topics: LLM serving (KV cache, batching, GPU scheduling), RAG at scale, and AI gateway / multi-model routing. We close with organizational scalability and a capstone end-to-end ML platform design exercise.
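As a small taste of the material, here is a minimal sketch of the idea behind two-tower retrieval from the recommendations module: score candidate items against a user embedding by dot product and keep the top k. All names below are illustrative, and a production system would use trained towers plus an ANN index (FAISS, HNSW) rather than brute force.

```python
# Illustrative sketch only: brute-force two-tower scoring.
# Real systems replace the sorted() scan with an ANN index.

def dot(u, v):
    # Dot product of two equal-length embedding vectors.
    return sum(a * b for a, b in zip(u, v))

def top_k(user_embedding, item_embeddings, k=2):
    """Score every item embedding against the user embedding
    and return the k highest-scoring item ids."""
    scored = sorted(item_embeddings.items(),
                    key=lambda kv: dot(user_embedding, kv[1]),
                    reverse=True)
    return [item_id for item_id, _ in scored[:k]]

user = [0.9, 0.1, 0.0]
items = {
    "item_a": [1.0, 0.0, 0.0],
    "item_b": [0.0, 1.0, 0.0],
    "item_c": [0.5, 0.5, 0.0],
}
print(top_k(user, items))  # -> ['item_a', 'item_c']
```

The course builds on this shape of problem: how to serve such lookups at millions of queries per second with fresh features and bounded latency.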
Original curriculum inspired by publicly available engineering blog posts, industry papers on ML infrastructure, and Staff+ engineering experience.
Prerequisites
- Completion of Part 1 (System Design Interview) or equivalent knowledge
- Working knowledge of distributed systems (caching, sharding, message queues)
- Basic understanding of ML concepts (training, inference, features, models)
- Familiarity with at least one backend language (Python, Java, Go, C++)
- No ML research experience required — this is about infrastructure, not algorithms
What You Will Learn
- Understand how ML system design interviews differ from traditional system design
- Design feature stores with training-serving consistency guarantees
- Design recommendation infrastructure: candidate generation, ranking, real-time personalization
- Design model serving platforms with canary rollout, A/B traffic splitting, and latency SLAs
- Design distributed training infrastructure: parameter servers, AllReduce, GPU scheduling
- Design vector search systems using ANN algorithms (FAISS, HNSW) at scale
- Design LLM serving platforms: KV cache management, request batching, speculative decoding
- Design RAG systems: document ingestion, chunking, embedding, retrieval optimization
- Design AI gateways: multi-provider routing, cost-aware scheduling, semantic caching
- Carry out an end-to-end ML platform design exercise, from data ingestion to serving
Terminology Mapping
How classic industry concepts map to the Meta-internal terminology used in this course.
| Classic | This Course (Meta) |
|---|---|
| Feature Store | Feature Platform / Featurizer |
| Model Registry | Model Store |
| Model Serving | Predictor / Navi |
| Vector Search | FAISS |
| Stream Processing | Flink-based pipelines |
| Experiment Platform | A/B Testing System |
| AI Gateway | Multi-model router |
Your Learning Path
Each module builds on the last. Take your time; the AI tutor is with you at every step.