Linear Algebra for AI

Master the language of AI: vectors, matrices, transformations, and eigenvalues -- with Python code to ground every concept. From NumPy basics to LoRA fine-tuning and mechanistic interpretability.

16 modules · 16 available · ~16 hours total

About This Course

Linear algebra is the mathematical backbone of modern AI and machine learning. This course teaches you to think in vectors and matrices, with every concept grounded in executable Python code.

You won't just memorize formulas -- you'll build intuition for what linear transformations actually do, visualize high-dimensional spaces, and understand why eigenvalues matter for everything from Google's PageRank to neural networks.

The course bridges theory and production: you'll see how SVD powers LoRA fine-tuning, how embeddings live in vector spaces, how attention is pure linear algebra, and how quantization trades precision for speed. Every module includes a Python mini-lab so you can manipulate the math and see the result change.

The AI tutor can verify its own math through code execution, making this subject especially well-suited for reliable, interactive learning.

Inspired by Gilbert Strang's MIT 18.06, 3Blue1Brown's Essence of Linear Algebra, and modern AI research (LoRA, mechanistic interpretability, quantization).

Prerequisites

  • Basic algebra (solving equations, working with variables)
  • Familiarity with Python basics helpful but not required
  • No prior linear algebra experience needed

What You Will Learn

  • Understand vectors, matrices, and their geometric interpretations
  • Visualize transformations with matplotlib and build geometric intuition
  • Solve systems of linear equations using Gaussian elimination
  • Grasp linear independence, span, and basis -- the core of vector spaces
  • Compute and interpret determinants, inverses, eigenvalues, and eigenvectors
  • Apply PCA for dimensionality reduction on real datasets
  • Use SVD for image compression and low-rank approximation
  • Understand how word embeddings and LLM embeddings work geometrically
  • Know how ANN algorithms (HNSW, IVF) power vector search at scale
  • Read neural network architectures as chains of matrix operations
  • Understand the attention mechanism (QKV) as pure linear algebra
  • Explain how LoRA compresses fine-tuning via low-rank factorization
  • Grasp how mechanistic interpretability uses linear directions to decode model behavior
  • Understand quantization as an affine transformation trading precision for speed

Your Learning Path

Each module builds on the last. Take your time -- the AI tutor is with you at every step.

1. Vectors: The Language of Data
From arrows to arrays -- how AI represents everything as vectors

Vectors are the atoms of linear algebra and the native data format of AI. This module builds your intuition from geometric arrows to NumPy arrays. You'll learn vector addition, scalar multiplication, and the dot product -- then discover that dot products measure similarity, the idea behind cosine similarity in recommendation systems and LLM embeddings. Visualize everything with matplotlib so the geometry clicks before the formulas. Mini-lab: Compute cosine similarity between sentence embeddings to find the most similar document in a small collection.
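Here's a minimal NumPy sketch of the cosine-similarity idea -- the three-dimensional "embeddings" below are invented toy values, not real model output:

```python
import numpy as np

def cosine_similarity(a, b):
    # dot product divided by the two lengths: 1 = same direction, 0 = orthogonal
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# toy document "embeddings" (invented for illustration)
docs = {
    "cat":    np.array([1.0, 0.9, 0.1]),
    "kitten": np.array([0.9, 1.0, 0.2]),
    "car":    np.array([0.1, 0.2, 1.0]),
}
query = np.array([1.0, 1.0, 0.0])
best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
```

Real embeddings have hundreds or thousands of dimensions, but the arithmetic is identical.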

60 min · Video lecture

2. Matrices as Transformations
Every matrix is a function that reshapes space

Matrices aren't just grids of numbers -- they are linear transformations. This module teaches matrix addition, multiplication, and transposition, but always through the lens of geometry: rotation, scaling, shearing, and projection. You'll visualize each operation with matplotlib, watching the unit square deform in real time. This geometric framing is essential because every layer of a neural network is a matrix transformation. Mini-lab: Build an interactive 2D transformation visualizer -- pick a matrix, watch the grid warp.
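To make the geometry concrete, here's a tiny NumPy sketch of two of the transformations named above -- no plotting, just watching where the coordinates land:

```python
import numpy as np

theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # 90-degree rotation
S = np.array([[1.0, 1.0],
              [0.0, 1.0]])                        # horizontal shear

e1 = np.array([1.0, 0.0])
rotated = R @ e1                     # the x-axis lands on the y-axis
sheared = S @ np.array([0.0, 1.0])   # the top of the unit square slides right
```

Note that the shear has determinant 1: it distorts the square but preserves its area, a fact the determinant module makes precise.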

60 min · Video lecture

3. Systems of Linear Equations
Gaussian elimination and the art of solving Ax = b

Most of applied math reduces to solving Ax = b. This module teaches you Gaussian elimination and row reduction -- the algorithmic backbone of linear algebra. You'll see how systems of equations correspond to intersecting hyperplanes, understand when solutions exist (and when they don't), and implement row reduction in Python. Mini-lab: Solve a system of 5 equations by hand via row reduction, then verify with np.linalg.solve.
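A compact sketch of row reduction in NumPy (a 3x3 system keeps the listing short; partial pivoting is included for numerical stability):

```python
import numpy as np

def gaussian_solve(A, b):
    """Solve Ax = b: forward elimination with partial pivoting, then back-substitution."""
    A, b = A.astype(float).copy(), b.astype(float).copy()
    n = len(b)
    for col in range(n):
        pivot = np.argmax(np.abs(A[col:, col])) + col      # pick the largest pivot
        A[[col, pivot]], b[[col, pivot]] = A[[pivot, col]], b[[pivot, col]]
        for row in range(col + 1, n):                      # zero out below the pivot
            factor = A[row, col] / A[col, col]
            A[row] -= factor * A[col]
            b[row] -= factor * b[col]
    x = np.zeros(n)
    for row in range(n - 1, -1, -1):                       # back-substitution
        x[row] = (b[row] - A[row, row + 1:] @ x[row + 1:]) / A[row, row]
    return x

A = np.array([[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]])
b = np.array([8.0, -11.0, -3.0])
x = gaussian_solve(A, b)   # matches np.linalg.solve(A, b)
```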

60 min · Video lecture

4. Vector Spaces and Subspaces
Linear independence, span, basis, and dimension

This module introduces the abstract structure that unifies all of linear algebra: vector spaces. You'll learn what it means for vectors to be linearly independent, what 'span' really means, and why choosing the right basis simplifies everything. These concepts are the vocabulary you need to understand PCA, SVD, and embedding spaces. Mini-lab: Find a basis for the column space of a matrix and verify that non-basis vectors can be written as linear combinations.
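One way to probe these ideas numerically: the matrix rank reveals linear dependence, and least squares recovers the combination that produces the dependent column.

```python
import numpy as np

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 3.0],
              [0.0, 0.0, 0.0]])   # column 3 = 2*(column 1) + 3*(column 2)

rank = np.linalg.matrix_rank(A)  # 2: the three columns are linearly dependent
# recover the coefficients that express the dependent column
coeffs, *_ = np.linalg.lstsq(A[:, :2], A[:, 2], rcond=None)
```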

60 min · Video lecture

5. Determinants and Inverses
The scaling factor of transformations and when you can undo them

The determinant tells you how a matrix transformation scales area (or volume). If it's zero, the transformation crushes space into a lower dimension -- and the matrix has no inverse. This module builds geometric intuition for determinants, then connects to matrix invertibility. You'll compute determinants and inverses by hand and with NumPy. Mini-lab: Visualize how the determinant changes as you continuously deform a 2x2 matrix from invertible to singular.
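A quick NumPy check of both claims -- an invertible matrix with a nonzero determinant, and a singular one whose rows are parallel:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
detA = np.linalg.det(A)      # how much A scales area: 3*2 - 1*1 = 5
A_inv = np.linalg.inv(A)     # exists because detA != 0

B = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # rows are parallel: B crushes the plane to a line
detB = np.linalg.det(B)      # 0 -- B has no inverse
```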

45 min · Video lecture

6. Linear Transformations
Kernel, image, rank-nullity, and change of basis

Now that you know matrices are transformations, this module digs deeper: what gets sent to zero (the kernel), what the transformation can produce (the image), and the fundamental rank-nullity theorem that connects them. You'll also learn change of basis -- the technique behind diagonalization, PCA, and every 'feature extraction' pipeline in ML. Mini-lab: Compute the kernel and image of a transformation, verify the rank-nullity theorem, and perform a change of basis.
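The mini-lab computation can be sketched in a few lines -- SVD hands you a kernel basis, and rank-nullity is a one-line check:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])   # maps R^3 -> R^2; second row = 2 * first

rank = np.linalg.matrix_rank(A)  # dimension of the image
nullity = A.shape[1] - rank      # rank-nullity: rank + nullity = 3

# a basis for the kernel: right-singular vectors beyond the rank
_, s, Vt = np.linalg.svd(A)
kernel = Vt[rank:]               # each row is sent (numerically) to zero
```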

60 min · Video lecture

7. Eigenvalues and Eigenvectors
The directions that survive a transformation unchanged

Eigenvectors are the directions a matrix only stretches (never rotates). Eigenvalues tell you how much. This module builds geometric intuition first -- watching vectors get transformed and identifying the special ones that stay on their line -- then covers the characteristic polynomial and diagonalization. The payoff: eigenvalues are the key to PCA, Google's PageRank, and stability analysis. Mini-lab: Animate a 2x2 transformation showing all vectors being rotated except the eigenvectors, which only scale.
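The defining equation Av = lambda*v is easy to verify numerically -- here for a symmetric 2x2 matrix whose eigenvalues are 1 and 3:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eig(A)

# each eigenvector is only scaled, never rotated: A v = lambda v
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ v, lam * v)
```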

60 min · Video lecture

8. PCA: Dimensionality Reduction
Finding the directions of maximum variance in your data

Principal Component Analysis is eigenvalues applied to data. You compute the covariance matrix of your dataset, find its eigenvectors (the principal components), and project onto the top-k directions of maximum variance. This module is the bridge from abstract eigenvalue theory to practical ML: you'll reduce a real dataset from high dimensions to 2D and visualize the clusters that emerge. Mini-lab: Run PCA on the Iris dataset -- reduce 4 features to 2, plot the result, and see the species separate.
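The full pipeline fits in a few NumPy lines. This sketch uses synthetic 2-D data (rather than Iris, which needs scikit-learn) with a known principal direction of (1, 1)/sqrt(2), so you can check that PCA recovers it:

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic data: stretched along x, then rotated 45 degrees
X = rng.normal(size=(200, 2)) * np.array([3.0, 0.5])
c, s = np.cos(np.pi / 4), np.sin(np.pi / 4)
X = X @ np.array([[c, -s], [s, c]]).T

Xc = X - X.mean(axis=0)                 # 1. center the data
cov = Xc.T @ Xc / (len(Xc) - 1)         # 2. covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # 3. eigendecomposition (ascending order)
pc1 = eigvecs[:, -1]                    # top principal component
projected = Xc @ pc1                    # 4. project onto it
```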

60 min · Video lecture

9. Singular Value Decomposition
The Swiss army knife of matrix decompositions

SVD decomposes any matrix (not just square ones) into three factors: U * Sigma * V^T. The singular values in Sigma tell you how much 'information' each component carries. By keeping only the top-k singular values, you get the best rank-k approximation -- this is the mathematical foundation of image compression, latent semantic analysis, and (crucially) LoRA fine-tuning. This module builds from the geometry of SVD to hands-on applications. Mini-lab: Compress a photo by keeping only the top-k singular values. Watch the image degrade as you reduce k from 100 to 10 to 1.
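The truncation idea in miniature -- a random matrix stands in for the photo, and the reconstruction error shrinks as you keep more singular values:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 40))            # stand-in for an image's pixel grid
U, s, Vt = np.linalg.svd(A, full_matrices=False)

def rank_k(k):
    # best rank-k approximation: keep only the top-k singular values
    return (U[:, :k] * s[:k]) @ Vt[:k]

def err(k):
    return np.linalg.norm(A - rank_k(k))  # Frobenius reconstruction error
```

With all 40 singular values kept, the reconstruction is exact (up to floating point); truncating to rank 10 or rank 1 gives progressively worse approximations.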

60 min · Video lecture

10. Embeddings: From Words to Vectors
How AI maps discrete objects into continuous vector spaces

An embedding is a learned mapping from a discrete set (words, users, products) into a continuous vector space. This module covers the geometry of embeddings: why king - man + woman lands near queen, how sentence embeddings capture semantic meaning, and what it means for LLM token embeddings to live in a 12,288-dimensional space. You'll also confront the curse of dimensionality -- why intuition breaks in high dimensions. Mini-lab: Build a semantic search engine. Embed a collection of sentences, compute cosine similarities, and retrieve the most relevant document for a query.
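The retrieval half of the mini-lab, sketched with hand-made toy vectors (real embeddings would come from a model such as sentence-transformers):

```python
import numpy as np

corpus = ["the cat sat", "a kitten played", "stock prices fell"]
E = np.array([[0.9, 0.8, 0.1],     # toy "sentence embeddings", one row each
              [0.8, 0.9, 0.2],
              [0.1, 0.1, 0.9]])
E = E / np.linalg.norm(E, axis=1, keepdims=True)   # unit-normalize the rows

def search(q_vec, k=2):
    q = q_vec / np.linalg.norm(q_vec)
    sims = E @ q                     # dot of unit vectors = cosine similarity
    top = np.argsort(-sims)[:k]      # indices of the k highest scores
    return [corpus[i] for i in top]

results = search(np.array([1.0, 1.0, 0.0]))   # a "query about cats"
```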

60 min · Video lecture

11. Vector Search at Scale
ANN algorithms, HNSW, and the rise of vector databases

Once you have millions of embeddings, brute-force cosine similarity is too slow. This module covers approximate nearest neighbor (ANN) algorithms that trade a tiny bit of accuracy for massive speedups. You'll learn locality-sensitive hashing, IVF (inverted file index), HNSW (hierarchical navigable small world graphs), and product quantization. Then see how vector databases (pgvector, Qdrant, Pinecone) wrap these algorithms into production infrastructure for RAG and recommendation systems. Mini-lab: Index 10,000 embeddings with FAISS, compare brute-force vs. IVF vs. HNSW on speed and recall.
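To show why IVF helps, here's a radically simplified version in plain NumPy -- random vectors stand in for real embeddings, and random database rows stand in for the k-means centroids a real IVF index (e.g. in FAISS) would learn:

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.normal(size=(10_000, 32))
db /= np.linalg.norm(db, axis=1, keepdims=True)   # unit vectors: dot = cosine

# bucket every vector under its nearest "centroid"
n_lists = 50
centroids = db[rng.choice(len(db), n_lists, replace=False)]
assignments = np.argmax(db @ centroids.T, axis=1)

def ivf_search(q):
    bucket = np.argmax(centroids @ q)             # probe only one list
    idx = np.flatnonzero(assignments == bucket)   # candidates in that bucket
    return idx[np.argmax(db[idx] @ q)], len(idx)

query = db[123]                  # its exact nearest neighbor is itself
hit, scanned = ivf_search(query)
```

Instead of scanning all 10,000 vectors, the search touches only one bucket -- the speed/recall tradeoff comes from how many lists you probe.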

60 min · Video lecture

12. Neural Networks as Linear Algebra
Dense layers, forward passes, and tensors beyond 2D

A dense neural network layer is just y = Wx + b -- a matrix multiplication plus a bias vector, followed by a nonlinear activation. This module strips away the deep learning mystique and shows you the linear algebra at the core. You'll manually implement a forward pass using only NumPy, then extend to tensors (3D+ arrays) that represent batches of images and sequences. Finally, you'll see that PyTorch's torch.Tensor is essentially a NumPy-style array with autograd and GPU support. Mini-lab: Manually code a 2-layer neural network forward pass using only NumPy matrix operations (no frameworks), then verify against PyTorch.
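The framework-free forward pass really is this small (random weights here; the dimensions 4 -> 8 -> 3 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
# shapes: 4 inputs -> 8 hidden units -> 3 outputs
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)

def relu(z):
    return np.maximum(z, 0.0)

def forward(x):
    h = relu(W1 @ x + b1)   # layer 1: matrix multiply, add bias, nonlinearity
    return W2 @ h + b2      # layer 2: another matrix multiply (the logits)

y = forward(rng.normal(size=4))
```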

60 min · Video lecture

13. The Attention Mechanism
How transformers use matrix operations to focus on what matters

Attention is the innovation behind transformers, and it's pure linear algebra. This module derives scaled dot-product attention step by step: project inputs into Query, Key, and Value matrices, compute attention scores via QK^T/sqrt(d), apply softmax, then multiply by V. You'll see that multi-head attention is just running several smaller attention operations in parallel -- a block-diagonal matrix structure. No black boxes. Mini-lab: Implement single-head and multi-head attention from scratch in NumPy. Feed in a sentence and visualize the attention weight matrix as a heatmap.
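The four steps above translate almost line for line into NumPy (random projection matrices here; real ones are learned):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))   # subtract max for stability
    return e / e.sum(axis=-1, keepdims=True)

def attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # project into Query, Key, Value
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # QK^T / sqrt(d_k)
    weights = softmax(scores)                  # each row: a distribution over tokens
    return weights @ V, weights                # weighted mix of the values

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                   # 5 tokens, model dim 16
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
out, weights = attention(X, Wq, Wk, Wv)
```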

60 min · Video lecture

14. LoRA: Low-Rank Adaptation
Fine-tuning billion-parameter models with tiny matrices

LoRA is the most practical application of matrix decomposition in modern AI. Instead of updating a giant weight matrix W (d x k parameters), you freeze W and train two small matrices B (d x r) and A (r x k) where r << d, so the update is Delta-W = B*A. This module connects SVD theory to practice: you'll understand why weight updates tend to be low-rank, implement a LoRA layer from scratch, and see how this enables fine-tuning LLMs on consumer hardware. Mini-lab: Implement a LoRA adapter layer. Compare parameter counts: full fine-tuning vs. LoRA with rank 4, 8, 16. Show that rank-8 captures 95%+ of the update.
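The parameter-count arithmetic and the adapter's forward pass, sketched for a single 512 x 512 weight (B starts at zero, the standard LoRA initialization, so the adapted model initially behaves exactly like the frozen one):

```python
import numpy as np

d, k, r = 512, 512, 8
rng = np.random.default_rng(0)
W = rng.normal(size=(d, k))       # frozen pretrained weight

B = np.zeros((d, r))              # LoRA init: B = 0, so Delta-W = BA = 0
A = rng.normal(size=(r, k)) * 0.01

def lora_forward(x):
    # W x + B (A x): never materialize the d x k update matrix
    return W @ x + B @ (A @ x)

full_params = d * k               # 262,144 trainable if we tuned W directly
lora_params = d * r + r * k       # 8,192 trainable with rank-8 LoRA
x = rng.normal(size=k)
y = lora_forward(x)
```

Rank 8 here trains about 3% of the parameters that full fine-tuning would touch.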

60 min · Video lecture

15. Mechanistic Interpretability
Using linear algebra to reverse-engineer what neural networks learn

Mechanistic interpretability is the 'forensics' of AI: researchers use linear algebra to decode what individual neurons and layers actually represent. The key insight is the Linear Representation Hypothesis -- high-level concepts like 'truthfulness,' 'sentiment,' or 'programming language' are encoded as linear directions (vectors) in activation space. You can find these directions, measure them, and even add or subtract them to steer model behavior. This module covers probing classifiers, activation steering, and concept vectors. Mini-lab: Extract activations from a small language model, find the 'sentiment direction' using PCA on positive vs. negative examples, and show that adding this direction flips the model's output sentiment.
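A self-contained toy version of the direction-finding step -- synthetic vectors stand in for model activations, with a known "sentiment direction" planted so you can verify that a difference-of-means probe recovers it:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64
concept = rng.normal(size=d)
concept /= np.linalg.norm(concept)   # planted "sentiment direction"

# synthetic stand-ins for activations: positive examples shifted along
# the concept direction, negative examples shifted against it
pos = rng.normal(size=(100, d)) + 3.0 * concept
neg = rng.normal(size=(100, d)) - 3.0 * concept

# difference-of-means probe recovers the direction
found = pos.mean(axis=0) - neg.mean(axis=0)
found /= np.linalg.norm(found)
alignment = abs(found @ concept)     # close to 1 when recovery succeeds
```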

60 min · Video lecture

16. Quantization: Precision vs. Speed
How shrinking numbers from 32 bits to 4 bits keeps AI fast and cheap

As models grow to hundreds of billions of parameters, storing every weight in 32-bit float becomes impractical. Quantization maps continuous values to a smaller discrete set using affine transformations: Q = round(W/S + Z). This module covers the linear algebra of quantization: why it's an affine transformation, how it distorts the vector space, why 'outlier features' break naive approaches, and how techniques like GPTQ, AWQ, and SmoothQuant handle this. You'll implement basic quantization and measure the accuracy/speed tradeoff. Mini-lab: Quantize a weight matrix to INT8 and INT4. Measure the reconstruction error (Frobenius norm of W - W_quantized) and see where outlier features cause problems.
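The mini-lab's quantize/dequantize round trip in miniature, with a random matrix standing in for a real weight:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64)).astype(np.float32)

def quantize_dequantize(W, bits=8):
    # asymmetric affine quantization: q = round(w / scale) + zero_point
    qmin, qmax = 0, 2**bits - 1
    scale = (W.max() - W.min()) / (qmax - qmin)
    zero_point = round(-W.min() / scale)
    q = np.clip(np.round(W / scale) + zero_point, qmin, qmax)
    return scale * (q - zero_point)            # map the integers back to floats

err8 = np.linalg.norm(W - quantize_dequantize(W, 8))  # Frobenius error, INT8
err4 = np.linalg.norm(W - quantize_dequantize(W, 4))  # INT4 is much coarser
```

Halving the bit width multiplies the quantization step (and thus the error) by roughly 16, which is exactly the precision-for-speed trade the module analyzes.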

60 min · Video lecture