
Matrices and Matrix Operations

Understand matrices as the core data structure of machine learning — from storing datasets to representing neural network layers.

60 XP ~22 min Lesson 2 / 10

Why This Matters for AI

When you feed an image into a neural network, the very first thing that happens is a matrix multiplication. When GPT generates the next word, it performs thousands of matrix multiplications. When you train any machine learning model, the parameters are stored in matrices and updated through matrix operations. Here's the thing: a neural network IS matrix math. A single layer of a neural network is literally just "multiply by a matrix, add a vector, apply a function." That's it. If you understand matrices, you understand the skeleton of every AI model ever built. Let's learn how they work.

The Intuition (No Math Yet)

A matrix is a grid of numbers arranged in rows and columns. Think of it as a spreadsheet. If you have a dataset of 1000 houses, each described by 5 features (size, bedrooms, bathrooms, age, price), you'd store it as a 1000×5 matrix — 1000 rows (one per house) and 5 columns (one per feature). Every dataset in machine learning is a matrix.

But matrices are more than just data storage. A matrix can also represent a transformation — a function that takes a vector and turns it into a different vector. This is the key insight: a neural network layer is a matrix that transforms your data. When you multiply a matrix by a vector, you're transforming that vector. The matrix decides what the transformation does: rotate, stretch, project, or combine the input in new ways. Each row of the matrix defines one output value by taking a weighted combination of all the input values.

This is literally what happens in a neural network: each layer has a weight matrix. Your input vector gets multiplied by this matrix to produce a new vector, then the next layer multiplies that vector by another matrix. This chain of matrix multiplications is what gives neural networks their power.

The Formal Math

What is a matrix?

A matrix is a rectangular array of numbers arranged in rows and columns. An m×n matrix has m rows and n columns. We use capital letters for matrices.
A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \in \mathbb{R}^{m \times n}

Matrix Addition

Add matrices of the same size by adding corresponding elements. This is used when combining outputs from different layers or adding bias terms in neural networks.
(A + B)_{ij} = a_{ij} + b_{ij}
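As a quick sketch (example values are my own, not from the lesson), element-wise addition in NumPy looks like this:

```python
import numpy as np

# Two matrices of the same shape (2x2)
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[10, 20],
              [30, 40]])

# Addition is element-wise: (A + B)[i, j] = A[i, j] + B[i, j]
S = A + B
print(S)
# [[11 22]
#  [33 44]]
```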

Matrix-Vector Multiplication

This is the most important operation in ML. Multiplying an m×n matrix by an n-dimensional vector gives an m-dimensional vector. Each output element is the dot product of a row of the matrix with the input vector. This is exactly what a neural network layer does.
A\vec{x} = \begin{bmatrix} \vec{r}_1 \cdot \vec{x} \\ \vec{r}_2 \cdot \vec{x} \\ \vdots \\ \vec{r}_m \cdot \vec{x} \end{bmatrix} \quad \text{where } \vec{r}_i \text{ is row } i \text{ of } A
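To make the row-by-row picture concrete, here is a small sketch (with example values of my own choosing) verifying that each output element of `A @ x` is the dot product of the corresponding row of `A` with `x`:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # 2x3 matrix
x = np.array([1, 0, 2])     # 3-dimensional vector

y = A @ x                   # 2-dimensional output
print(y)                    # [ 7 16]

# Each output element is one row of A dotted with x
for i in range(A.shape[0]):
    assert y[i] == A[i] @ x
```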

Matrix Multiplication

To multiply two matrices A (m×n) and B (n×p), the inner dimensions must match. The result is an m×p matrix where each element is a dot product of a row from A and a column from B. This is how we chain neural network layers.
(AB)_{ij} = \sum_{k=1}^{n} a_{ik} \, b_{kj}
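A short sketch (shapes chosen purely for illustration) of the inner-dimension rule and of how chaining two layers composes into a single matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))   # first layer: maps R^3 -> R^4
B = rng.standard_normal((2, 4))   # second layer: maps R^4 -> R^2
x = rng.standard_normal(3)

# Applying the layers one after another...
y_chained = B @ (A @ x)

# ...equals multiplying by the single composed matrix BA.
# Inner dimensions must match: (2x4) @ (4x3) -> (2x3)
BA = B @ A
y_composed = BA @ x

print(BA.shape)                            # (2, 3)
print(np.allclose(y_chained, y_composed))  # True
```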

Transpose

The transpose of a matrix flips it over its diagonal — rows become columns and vice versa. The transpose appears everywhere in ML: in gradient calculations, in the normal equation for linear regression, and in attention mechanisms.
(A^T)_{ij} = A_{ji} \quad \text{i.e., } \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}^T = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}

Identity Matrix

The identity matrix I is the matrix equivalent of the number 1. Multiplying any matrix by I gives back the same matrix. It has 1s on the diagonal and 0s everywhere else.
I_n = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}, \quad AI = IA = A
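A minimal sketch of the identity property in NumPy (the matrix `A` here is an arbitrary example):

```python
import numpy as np

I = np.eye(3)               # 3x3 identity matrix
A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])

# Multiplying by I leaves A unchanged, like multiplying a number by 1
print(np.allclose(A @ I, A))  # True
print(np.allclose(I @ A, A))  # True
```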

Interactive Visualization

Interactive: Matrix Transformations

See how a 2×2 matrix transforms space. The red and blue arrows are the basis vectors. The purple square shows how the unit square transforms.


Math → Code Bridge


See the math and its Python equivalent side by side. Same concept, two languages.

Creating Matrices

In NumPy, matrices are 2D arrays. The shape tells you (rows, columns).

Math
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \in \mathbb{R}^{2 \times 3}
Python / NumPy
import numpy as np

# A matrix is a 2D NumPy array
A = np.array([[1, 2, 3],
              [4, 5, 6]])

print(A.shape)  # (2, 3) — 2 rows, 3 columns

Matrix-Vector Multiplication (Neural Network Layer)

The @ operator does matrix multiplication. W@x + b is the fundamental equation of a neural network layer.

Math
\vec{y} = W\vec{x} + \vec{b}
Python / NumPy
# This is literally one layer of a neural network!
W = np.array([[0.2, 0.8],     # weight matrix (2x2)
              [0.6, 0.4]])
x = np.array([1.0, 0.5])      # input vector
b = np.array([0.1, 0.1])      # bias vector

# Forward pass of one neural network layer
y = W @ x + b
print(y)  # [0.7, 0.9]

Matrix Multiplication

The @ operator is matrix multiplication. The * operator is element-wise multiplication. These are completely different operations!

Math
C = AB \text{ where } C_{ij} = \sum_k A_{ik}B_{kj}
Python / NumPy
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# Matrix multiplication (not element-wise!)
C = A @ B           # or np.matmul(A, B)
print(C)
# [[19 22]
#  [43 50]]

# Element-wise multiplication is different:
D = A * B
print(D)
# [[ 5 12]
#  [21 32]]

Transpose

.T is the transpose. Used constantly in ML for things like computing gradients and in the attention mechanism formula Q·Kᵀ.

Math
A^T \text{ swaps rows and columns}
Python / NumPy
A = np.array([[1, 2, 3],
              [4, 5, 6]])
print(A.shape)  # (2, 3)

A_T = A.T       # Transpose
print(A_T.shape)  # (3, 2)
print(A_T)
# [[1 4]
#  [2 5]
#  [3 6]]

Practice


Apply what you learned to real AI/ML scenarios.

1. A simple neural network layer has weights W = [[0.5, -0.3], [0.8, 0.2]] and bias b = [0.1, -0.1]. The input is x = [1.0, 2.0]. Compute the output of this layer, y = Wx + b. What is the resulting vector?

2. In a two-layer neural network, the first layer has a 3×4 weight matrix A and the second layer has a 2×3 weight matrix B. Multiplying B×A gives the effective transformation of both layers combined. Compute the matrix product BA. What are the dimensions of the result, and what does it represent?

Summary


Key Formulas

  • Matrix-vector multiplication: y_i = \sum_j W_{ij} x_j
  • Neural network layer: \vec{y} = W\vec{x} + \vec{b}
  • Matrix multiplication: (AB)_{ij} = \sum_k A_{ik}B_{kj}
  • Transpose: (A^T)_{ij} = A_{ji}

Key Intuitions

  • A matrix is a grid of numbers — it stores datasets (rows = samples, columns = features).
  • A matrix is also a transformation — it turns input vectors into output vectors.
  • Matrix multiplication = composing transformations = chaining neural network layers.
  • The weight matrix W in a neural network layer decides what the layer learns.

AI/ML Connections

  • Every neural network layer is y = Wx + b — a matrix multiplication plus a bias.
  • Attention in Transformers: softmax(QKᵀ/√d)V — three matrix multiplications.
  • Training adjusts the weight matrices to minimize the loss function.
  • Matrix dimensions must match: this is why layer sizes must be compatible in neural networks.
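The attention formula mentioned above can be sketched in a few lines of NumPy. This is a toy illustration with made-up Q, K, V values (real Transformers learn these via weight matrices), showing that it really is just three matrix multiplications plus a softmax:

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, numerically stabilized
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d = 4                              # key/query dimension
Q = rng.standard_normal((3, d))    # 3 query vectors
K = rng.standard_normal((3, d))    # 3 key vectors
V = rng.standard_normal((3, d))    # 3 value vectors

# softmax(Q K^T / sqrt(d)) V — three matrix multiplications
scores = Q @ K.T / np.sqrt(d)      # similarity of each query with each key
weights = softmax(scores)          # each row sums to 1
out = weights @ V                  # weighted combination of the value vectors
print(out.shape)                   # (3, 4)
```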
