Matrices and Matrix Operations
Understand matrices as the core data structure of machine learning — from storing datasets to representing neural network layers.
Why This Matters for AI
When you feed an image into a neural network, the very first thing that happens is a matrix multiplication. When GPT generates the next word, it performs thousands of matrix multiplications. When you train any machine learning model, the parameters are stored in matrices and updated through matrix operations. Here's the thing: a neural network IS matrix math. A single layer of a neural network is literally just "multiply by a matrix, add a vector, apply a function." That's it. If you understand matrices, you understand the skeleton of every AI model ever built. Let's learn how they work.
The Intuition (No Math Yet)
A matrix is a grid of numbers arranged in rows and columns. Think of it as a spreadsheet. If you have a dataset of 1000 houses, each described by 5 features (size, bedrooms, bathrooms, age, price), you'd store it as a 1000×5 matrix — 1000 rows (one per house) and 5 columns (one per feature). Every dataset in machine learning is a matrix.

But matrices are more than just data storage. A matrix can also represent a transformation — a function that takes a vector and turns it into a different vector. This is the key insight: a neural network layer is a matrix that transforms your data. When you multiply a matrix by a vector, you're transforming that vector. The matrix decides what the transformation does: rotate, stretch, project, or combine the input in new ways. Each row of the matrix defines one output value by creating a weighted combination of all the input values.

This is literally what happens in a neural network: each layer has a weight matrix. Your input vector gets multiplied by this matrix to produce a new vector. Then the next layer multiplies that vector by another matrix. This chain of matrix multiplications is what gives neural networks their power.
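The "dataset as a matrix" idea is easy to see in code. Here is a minimal sketch of the houses example above, shrunk to 3 houses so it fits on screen (the numbers are made up for illustration):

```python
import numpy as np

# A tiny version of the houses dataset: rows = samples, columns = features
# Columns (hypothetical values): size (sqft), bedrooms, bathrooms, age, price ($1000s)
houses = np.array([
    [1400, 3, 2, 20, 250],
    [2100, 4, 3,  5, 410],
    [ 900, 2, 1, 35, 160],
])

print(houses.shape)   # (3, 5): 3 houses, 5 features each
print(houses[0])      # first row = all features of one house
print(houses[:, 4])   # fifth column = the price of every house
```

Slicing a row gives you one sample; slicing a column gives you one feature across all samples. That rows-as-samples, columns-as-features convention is used throughout machine learning.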
The Formal Math
What is a matrix?
An m×n matrix A is a rectangular grid of numbers with m rows and n columns. The entry in row i, column j is written Aᵢⱼ.
Matrix Addition
Two matrices of the same shape are added element-wise: (A + B)ᵢⱼ = Aᵢⱼ + Bᵢⱼ.
Matrix-Vector Multiplication
For an m×n matrix A and an n-dimensional vector x, the product y = Ax is an m-dimensional vector whose i-th entry is the dot product of row i of A with x: yᵢ = Σⱼ Aᵢⱼxⱼ.
Matrix Multiplication
For an m×n matrix A and an n×p matrix B, the product C = AB is an m×p matrix with Cᵢⱼ = Σₖ AᵢₖBₖⱼ. The inner dimensions must match, and in general AB ≠ BA.
Transpose
The transpose Aᵀ swaps rows and columns: (Aᵀ)ᵢⱼ = Aⱼᵢ. An m×n matrix becomes n×m.
Identity Matrix
The identity matrix I is a square matrix with 1s on the diagonal and 0s everywhere else. It is the "do nothing" transformation: Ix = x and AI = IA = A.
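Two of the definitions above, addition and the identity matrix, don't appear in the code bridge below, so here is a quick sketch of both in NumPy:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[10, 20],
              [30, 40]])

# Matrix addition is element-wise — the shapes must match
print(A + B)
# [[11 22]
#  [33 44]]

# np.eye(n) builds the n×n identity matrix
I = np.eye(2)
print(A @ I)   # multiplying by the identity leaves A unchanged
```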
Interactive Visualization
Interactive: Matrix Transformations
See how a 2×2 matrix transforms space. The red and blue arrows are the basis vectors. The purple square shows how the unit square transforms.
Math → Code Bridge
See the math and its Python equivalent side by side. Same concept, two languages.
Creating Matrices
In NumPy, matrices are 2D arrays. The shape tells you (rows, columns).
import numpy as np
# A matrix is a 2D NumPy array
A = np.array([[1, 2, 3],
              [4, 5, 6]])
print(A.shape)  # (2, 3) — 2 rows, 3 columns

Matrix-Vector Multiplication (Neural Network Layer)
The @ operator does matrix multiplication. W@x + b is the fundamental equation of a neural network layer.
# This is literally one layer of a neural network!
W = np.array([[0.2, 0.8],  # weight matrix (2x2)
              [0.6, 0.4]])
x = np.array([1.0, 0.5]) # input vector
b = np.array([0.1, 0.1]) # bias vector
# Forward pass of one neural network layer
y = W @ x + b
print(y)  # [0.7, 0.9]

Matrix Multiplication
The @ operator is matrix multiplication. The * operator is element-wise multiplication. These are completely different operations!
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])
# Matrix multiplication (not element-wise!)
C = A @ B # or np.matmul(A, B)
print(C)
# [[19 22]
# [43 50]]
# Element-wise multiplication is different:
D = A * B
print(D)
# [[ 5 12]
#  [21 32]]

Transpose
.T is the transpose. Used constantly in ML for things like computing gradients and in the attention mechanism formula Q·Kᵀ.
A = np.array([[1, 2, 3],
              [4, 5, 6]])
print(A.shape) # (2, 3)
A_T = A.T # Transpose
print(A_T.shape) # (3, 2)
print(A_T)
# [[1 4]
# [2 5]
#  [3 6]]

Practice
Practice Problems
Apply what you learned to real AI/ML scenarios.
A simple neural network layer has weights W = [[0.5, -0.3], [0.8, 0.2]] and bias b = [0.1, -0.1]. The input is x = [1.0, 2.0]. Compute y = Wx + b.
Compute the output of this neural network layer. What is the resulting vector?
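Work this one out by hand first: each output entry is the dot product of one row of W with x, plus the matching bias entry. Afterwards you can verify your result with NumPy; this sketch just restates the problem's numbers:

```python
import numpy as np

# The problem's weights, bias, and input
W = np.array([[0.5, -0.3],
              [0.8,  0.2]])
b = np.array([0.1, -0.1])
x = np.array([1.0, 2.0])

# One layer: matrix-vector product plus bias
y = W @ x + b
print(np.round(y, 2))  # compare against your hand computation
```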
In a two-layer neural network, the first layer has a 3×4 weight matrix A and the second layer has a 2×3 weight matrix B. When we multiply B×A, we get the effective transformation of both layers combined.
Compute the matrix product BA. What are the dimensions of the result and what does it represent?
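You can confirm the dimension reasoning in NumPy with random matrices of the stated shapes (the values don't matter, only the shapes):

```python
import numpy as np

A = np.random.randn(3, 4)  # first layer: maps 4 inputs to 3 outputs
B = np.random.randn(2, 3)  # second layer: maps 3 inputs to 2 outputs

# The combined transformation of both layers
C = B @ A
print(C.shape)  # (2, 4): one matrix that maps 4 inputs straight to 2 outputs

# Applying the layers one at a time gives the same result as applying C
x = np.random.randn(4)
print(np.allclose(B @ (A @ x), C @ x))  # True
```

Note the order: the layer applied first sits on the right, because B(Ax) = (BA)x. Also note that A @ B would fail here, since the inner dimensions (4 and 2) don't match.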
Summary
Summary Card
Key Formulas
Key Intuitions
- A matrix is a grid of numbers — it stores datasets (rows = samples, columns = features).
- A matrix is also a transformation — it turns input vectors into output vectors.
- Matrix multiplication = composing transformations = chaining neural network layers.
- The weight matrix W in a neural network layer decides what the layer learns.
AI/ML Connections
- Every neural network layer is y = Wx + b — a matrix multiplication plus a bias.
- Attention in Transformers: softmax(QKᵀ/√d)V — three matrix multiplications.
- Training adjusts the weight matrices to minimize the loss function.
- Matrix dimensions must match: this is why layer sizes must be compatible in neural networks.
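To make the attention connection concrete, here is a minimal sketch of softmax(QKᵀ/√d)V using only the operations from this lesson. The matrices Q, K, V are filled with random numbers purely for illustration; in a real Transformer they come from multiplying the input by learned weight matrices:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q Kᵀ / √d) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                # (seq, seq) similarity matrix
    # Row-wise softmax (subtracting the max for numerical stability)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V                           # weighted mix of the value vectors

Q = np.random.randn(3, 4)  # 3 tokens, each a 4-dimensional vector
K = np.random.randn(3, 4)
V = np.random.randn(3, 4)

out = attention(Q, K, V)
print(out.shape)  # (3, 4): one output vector per token
```

Every step is a matrix operation you have now seen: a transpose, two matrix multiplications, and an element-wise function in between.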