1. Vectors
A vector is an ordered list of numbers representing a point in space or a direction.
Types of Vectors
| Type | Notation | Shape | Example |
|---|---|---|---|
| Column Vector | v | (n, 1) | [[1], [2], [3]] |
| Row Vector | vᵀ | (1, n) | [[1, 2, 3]] |
| Zero Vector | 0 | (n, 1) | [[0], [0], [0]] |
| Unit Vector | e | (n, 1) | [[1], [0], [0]] |
Vector Operations
u + v = [u₁+v₁, u₂+v₂, ..., uₙ+vₙ]
c·v = [c·v₁, c·v₂, ..., c·vₙ]
u·v = u₁v₁ + u₂v₂ + ... + uₙvₙ = Σ uᵢvᵢ
Example: Dot Product
Vector Magnitude (Length)
‖v‖ = √(v·v) = √(Σ vᵢ²)
Unit Vector (Normalization)
v̂ = v / ‖v‖, which has length 1 and points in the same direction as v
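The operations above can be sketched in NumPy (the vectors here are my own illustrative values):

```python
import numpy as np

# Illustrative vectors, not from the text
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

s = u + v                    # element-wise sum: [5, 7, 9]
c = 2 * v                    # scalar multiple: [8, 10, 12]

dot = u @ v                  # 1*4 + 2*5 + 3*6 = 32

mag = np.linalg.norm(v)      # L2 magnitude: sqrt(16 + 25 + 36)
v_hat = v / mag              # unit vector: same direction, length 1
```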
Key Formulas
2. Matrices
A matrix is a 2D array of numbers arranged in rows and columns; an m×n matrix has m rows and n columns.
Special Matrices
| Matrix Type | Definition | Example |
|---|---|---|
| Square Matrix | m = n | 3×3, 4×4 |
| Diagonal Matrix | Aᵢⱼ = 0 if i ≠ j | [[2,0,0], [0,3,0], [0,0,4]] |
| Identity Matrix (I) | Diagonal with 1's | [[1,0,0], [0,1,0], [0,0,1]] |
| Zero Matrix | All zeros | [[0,0], [0,0]] |
| Symmetric Matrix | A = Aᵀ | [[1,2,3], [2,4,5], [3,5,6]] |
| Upper Triangular | Aᵢⱼ = 0 if i > j | [[1,2,3], [0,4,5], [0,0,6]] |
| Lower Triangular | Aᵢⱼ = 0 if i < j | [[1,0,0], [2,3,0], [4,5,6]] |
3. Matrix Operations
Transpose
Swap rows and columns: if A is m×n, then Aᵀ is n×m, with (Aᵀ)ᵢⱼ = Aⱼᵢ
Original Matrix A
Transposed Aᵀ
Properties
Matrix Multiplication
Critical for Neural Networks!
Example
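As an example (illustrative matrices of my own), NumPy's `@` operator implements this contraction:

```python
import numpy as np

# Shapes must agree: (m, n) @ (n, p) -> (m, p)
A = np.array([[1, 2, 3],
              [4, 5, 6]])      # 2x3
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])         # 3x2

C = A @ B                      # 2x2; C[i, j] = row i of A dot column j of B
# C == [[4, 5], [10, 11]]
```

Note that matrix multiplication is generally not commutative: `B @ A` here would be 3×3.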
Properties
Element-wise (Hadamard) Product
Denoted by ⊙ or ∗: (A ⊙ B)ᵢⱼ = Aᵢⱼ·Bᵢⱼ; both matrices must have the same shape
Used in: dropout, attention mechanisms
Matrix-Vector Multiplication
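A short NumPy sketch of both products (values are illustrative):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[10, 20],
              [30, 40]])

# Hadamard product: element-wise, requires identical shapes
H = A * B                # [[10, 40], [90, 160]]

# Matrix-vector product: each entry is a row of A dotted with v
v = np.array([1, 1])
Av = A @ v               # [1+2, 3+4] = [3, 7]
```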
4. Special Matrices
Identity Matrix (I)
Inverse Matrix (A⁻¹)
Example
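A minimal check with NumPy (illustrative matrix, chosen to be invertible):

```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])      # det = 4*6 - 7*2 = 10, so A is invertible

A_inv = np.linalg.inv(A)

# A @ A_inv recovers the identity, up to floating-point error
I = A @ A_inv
```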
Properties
Inverse exists when: det(A) ≠ 0, i.e., A is square and full-rank (non-singular)
Orthogonal Matrix (Q)
Properties
Example: Rotation Matrix
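A sketch of a 2D rotation by 45 degrees (an arbitrary angle of my choosing), checked against the orthogonality property QᵀQ = I:

```python
import numpy as np

theta = np.pi / 4                 # 45 degrees
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Orthogonal: Q^T Q = I, so Q^T acts as the inverse (rotation by -theta)
check = Q.T @ Q

# Rotations preserve lengths
v = np.array([3.0, 4.0])
norm_preserved = np.isclose(np.linalg.norm(Q @ v), np.linalg.norm(v))
```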
Positive Definite Matrix
Properties
5. Matrix Properties
Determinant (det(A) or |A|)
For a 2×2 matrix [[a, b], [c, d]]: det = ad - bc
For a 3×3 matrix [[a, b, c], [d, e, f], [g, h, i]]: det = a(ei - fh) - b(di - fg) + c(dh - eg) (cofactor expansion along the first row)
Properties
Geometric Meaning
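A sketch in NumPy, on matrices of my own choosing:

```python
import numpy as np

# 2x2 case: det = ad - bc
A = np.array([[3.0, 1.0],
              [2.0, 4.0]])
d = np.linalg.det(A)            # 3*4 - 1*2 = 10

# A zero determinant signals a singular (non-invertible) matrix
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])      # rows are linearly dependent
d_singular = np.linalg.det(S)   # 0, up to floating-point error
```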
Trace (tr(A))
Sum of the diagonal elements: tr(A) = Σᵢ Aᵢᵢ
Properties
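For example (illustrative matrices), including the cyclic property tr(AB) = tr(BA):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

t = np.trace(A)        # 1 + 4 = 5

# Cyclic property: tr(AB) = tr(BA), even though AB != BA in general
cyclic = np.trace(A @ B) == np.trace(B @ A)
```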
Rank
Maximum number of linearly independent rows (or columns)
Properties
Interpretation
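A quick illustration with NumPy (matrices are my own examples):

```python
import numpy as np

# Full rank: rows are independent
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])

# Rank-deficient: second row is 2x the first
B = np.array([[1.0, 2.0],
              [2.0, 4.0]])

r_full = np.linalg.matrix_rank(A)        # 2
r_deficient = np.linalg.matrix_rank(B)   # 1
```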
6. Eigenvalues & Eigenvectors
Definition
For a square matrix A: Av = λv, where v ≠ 0 is an eigenvector and λ is its eigenvalue
How to Find Eigenvalues
Solve the characteristic equation: det(A - λI) = 0
Example
Properties
Diagonalization
If A has n linearly independent eigenvectors: A = PDP⁻¹, where the columns of P are the eigenvectors and D is diagonal with the corresponding eigenvalues
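A sketch of both the defining equation and the diagonalization, on an illustrative 2×2 matrix (its eigenvalues work out to 5 and 2):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

w, P = np.linalg.eig(A)          # eigenvalues w, eigenvectors in columns of P

# Defining equation A v = lambda v, checked for the first pair
v0 = P[:, 0]
defining_ok = np.allclose(A @ v0, w[0] * v0)

# Diagonalization: A = P D P^-1
D = np.diag(w)
A_rebuilt = P @ D @ np.linalg.inv(P)
```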
Why it matters
7. Matrix Decompositions
LU Decomposition
A = LU, with L lower triangular and U upper triangular (in practice PA = LU, with a permutation matrix P for pivoting)
Used for: Solving linear systems efficiently
QR Decomposition
A = QR, with Q orthogonal and R upper triangular
Used for: Least squares, eigenvalue algorithms
Eigendecomposition
A = PDP⁻¹, with eigenvectors in the columns of P and eigenvalues on the diagonal of D
Requirements: A must be square and have n independent eigenvectors
Used for: PCA, understanding transformations
Singular Value Decomposition (SVD)
A = UΣVᵀ, where U and V are orthogonal and Σ holds the singular values σ₁ ≥ σ₂ ≥ … ≥ 0; it exists for any m×n matrix
⚡ SUPER IMPORTANT FOR ML!
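A quick NumPy sketch on an arbitrary 3×2 matrix of my own, verifying the factorization:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])      # any m x n matrix has an SVD

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Singular values come back sorted in descending order
descending = s[0] >= s[1] >= 0

# Reassemble: A = U @ diag(s) @ Vt
A_rebuilt = U @ np.diag(s) @ Vt
```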
Properties
Used in
Cholesky Decomposition
For symmetric positive definite matrices: A = LLᵀ, where L is lower triangular with positive diagonal entries
Used for: Solving linear systems, sampling from multivariate Gaussians
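A sketch on a small symmetric positive definite matrix of my own choosing:

```python
import numpy as np

# Symmetric positive definite (leading minors 4 and 8 are positive)
A = np.array([[4.0, 2.0],
              [2.0, 3.0]])

L = np.linalg.cholesky(A)       # lower triangular factor

A_rebuilt = L @ L.T             # recovers A
lower_triangular = np.allclose(L, np.tril(L))
```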
8. Norms
Vector Norms
Measure of vector "size" or "length"
| Norm | Formula | Name | Use Case |
|---|---|---|---|
| L0 | # of non-zero elements | L0-norm | Sparsity |
| L1 | Σ\|vᵢ\| | Manhattan | Sparsity, robustness |
| L2 | √(Σvᵢ²) | Euclidean | Most common |
| L∞ | max\|vᵢ\| | Max norm | Worst-case |
Example
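For instance (vector chosen for round numbers):

```python
import numpy as np

v = np.array([3.0, 0.0, -4.0])

l0 = np.count_nonzero(v)            # 2 non-zero entries
l1 = np.linalg.norm(v, 1)           # |3| + |0| + |-4| = 7
l2 = np.linalg.norm(v)              # sqrt(9 + 16) = 5
linf = np.linalg.norm(v, np.inf)    # max absolute entry = 4
```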
Matrix Norms
Frobenius Norm
‖A‖_F = √(Σᵢⱼ Aᵢⱼ²), the element-wise analogue of the L2 vector norm
Spectral Norm (L2)
‖A‖₂ = σ_max(A), the largest singular value of A
Regularization in ML
L1 Regularization (Lasso)
Encourages sparsity (many weights = 0)
L2 Regularization (Ridge)
Encourages small weights (weight decay)
9. Linear Transformations
A function T that satisfies: T(u + v) = T(u) + T(v) and T(c·v) = c·T(v)
Every linear transformation between finite-dimensional vector spaces can be represented as a matrix!
Common Transformations
Scaling
Scales x by sₓ, y by sᵧ
Rotation (2D)
Rotates counterclockwise
Shear
Shears horizontally
Reflection
Projection
Project vector v onto vector u: projᵤ(v) = ((u·v) / (u·u)) u
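A minimal sketch; `project` is a hypothetical helper name, not from the text:

```python
import numpy as np

def project(v, u):
    """Hypothetical helper: project v onto the line spanned by u."""
    return (u @ v) / (u @ u) * u

v = np.array([3.0, 4.0])
u = np.array([1.0, 0.0])

p = project(v, u)                  # [3, 0]: the component of v along u

# The residual v - p is orthogonal to u
residual_orthogonal = np.isclose((v - p) @ u, 0.0)
```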
10. Key Concepts for ML/DL
1. Linear Systems (Ax = b)
Solutions
Unique solution x = A⁻¹b if A is square and invertible; otherwise no solution or infinitely many, depending on whether b lies in the column space of A
2. Least Squares
Minimize ‖Ax - b‖₂²; the solution satisfies the normal equations AᵀAx = Aᵀb, so x = (AᵀA)⁻¹Aᵀb when AᵀA is invertible
This is how linear regression works!
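A sketch fitting a line with `np.linalg.lstsq` (toy data of my own, constructed to lie exactly on y = 1 + 2x):

```python
import numpy as np

# Toy points lying exactly on y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Design matrix: a column of ones (intercept) and the inputs
A = np.column_stack([np.ones_like(x), x])    # shape (4, 2)

# Minimize ||A w - y||^2; lstsq is more numerically stable than
# forming the normal equations explicitly
w, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
# w is approximately [1, 2]: intercept and slope
```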
3. Moore-Penrose Pseudo-Inverse (A⁺)
For non-square or singular matrices: A⁺ = VΣ⁺Uᵀ, computed from the SVD; when A has full column rank, A⁺ = (AᵀA)⁻¹Aᵀ
Used in
4. Principal Component Analysis (PCA)
Steps
1. Center the data (subtract each feature's mean)
2. Compute the covariance matrix
3. Find its eigenvalues and eigenvectors
4. Project the data onto the top-k eigenvectors (the principal components)
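The steps above can be sketched as follows (synthetic data and variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 100 points in 2D with much more variance along x
X = rng.normal(size=(100, 2)) * np.array([3.0, 0.5])

# 1. Center
Xc = X - X.mean(axis=0)

# 2. Covariance matrix (2x2)
C = np.cov(Xc, rowvar=False)

# 3. Eigendecompose; eigh suits symmetric matrices, eigenvalues ascending
evals, evecs = np.linalg.eigh(C)

# 4. Project onto the top principal component (largest eigenvalue is last)
pc1 = evecs[:, -1]
scores = Xc @ pc1
```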
Why it works:
5. Covariance Matrix
Properties
6. Gradient of Matrix Operations
Critical for backpropagation!
| Function | Gradient |
|---|---|
| f(x) = Ax | ∇f = Aᵀ |
| f(x) = xᵀAx | ∇f = (A + Aᵀ)x |
| f(x) = ‖Ax - b‖² | ∇f = 2Aᵀ(Ax - b) |
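The last of these gradients can be verified numerically with central finite differences (random data of my own):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
b = rng.normal(size=3)
x = rng.normal(size=3)

def f(x):
    return np.sum((A @ x - b) ** 2)          # ||Ax - b||^2

# Analytic gradient from the table
grad = 2 * A.T @ (A @ x - b)

# Central finite differences, one coordinate at a time
eps = 1e-6
numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(3)])

match = np.allclose(grad, numeric, atol=1e-4)
```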
7. Matrix Calculus Identities
8. Batch Matrix Operations in Neural Networks
Gradients
9. Orthogonality in Neural Networks
Why orthogonal matrices are nice
Orthogonal Initialization
10. Low-Rank Approximation
Using SVD for compression:
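A sketch: truncating the SVD of a matrix that is exactly rank 2, so the rank-2 approximation reconstructs it almost exactly (random factors are my own):

```python
import numpy as np

rng = np.random.default_rng(2)

# A 6x5 matrix of rank 2: product of 6x2 and 2x5 factors
A = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the top k singular triplets
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Since rank(A) = 2, the rank-2 truncation is exact up to rounding
err = np.linalg.norm(A - A_k)
```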
Used in
Quick Reference for Neural Networks
Forward Pass
Backward Pass (Chain Rule + Linear Algebra)
Common Matrix Dimensions in DL
Fully Connected Layer
Convolutional Layer (simplified)
Attention
Most Important for Interviews
Top 10 Must-Know Concepts
Practice Problems
Python Implementation
Core Formulas - TL;DR
Summary
You now have everything you need for:
Focus on: