
Calculus for Machine Learning

Complete reference guide covering derivatives, integrals, limits, gradients, and backpropagation for neural networks.

1. Derivatives (Differentiation)

The derivative measures the rate of change of a function. It's the foundation of backpropagation in neural networks.

Basic Derivative Rules

Function → Derivative
c (constant) → 0
x → 1
x^n → n·x^(n-1)
e^x → e^x
a^x → a^x · ln(a)
ln(x) → 1/x
log_a(x) → 1/(x·ln(a))
sin(x) → cos(x)
cos(x) → -sin(x)
tan(x) → sec²(x)
cot(x) → -csc²(x)
sec(x) → sec(x)·tan(x)
csc(x) → -csc(x)·cot(x)

Differentiation Rules

Sum Rule: d/dx[f + g] = f' + g'
Difference Rule: d/dx[f - g] = f' - g'
Product Rule: d/dx[f·g] = f'·g + f·g'
Quotient Rule: d/dx[f/g] = (f'·g - f·g')/g²
Chain Rule: d/dx[f(g(x))] = f'(g(x))·g'(x)
Power Rule: d/dx[x^n] = n·x^(n-1)

Examples

Example 1: d/dx[3x² + 5x - 7] = 6x + 5
Example 2: d/dx[x³·sin(x)] = 3x²·sin(x) + x³·cos(x)
Example 3: d/dx[sin(x²)] = cos(x²)·2x
Example 4: d/dx[(x² + 1)/(x - 1)] = [(2x)(x-1) - (x²+1)(1)]/(x-1)² = (x² - 2x - 1)/(x-1)²
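
These results can also be checked numerically. Below is a minimal Python sketch (assuming NumPy is available; the evaluation point x = 2 is chosen arbitrarily) comparing a central finite-difference estimate against the closed-form derivatives from Examples 1 and 3.

import numpy as np

def numerical_derivative(f, x, h=1e-5):
    # Central difference: (f(x+h) - f(x-h)) / (2h) approximates f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

x = 2.0
# Example 1: d/dx[3x² + 5x - 7] = 6x + 5
print(numerical_derivative(lambda t: 3*t**2 + 5*t - 7, x), 6*x + 5)
# Example 3: d/dx[sin(x²)] = cos(x²)·2x
print(numerical_derivative(lambda t: np.sin(t**2), x), np.cos(x**2) * 2*x)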

2. Integrals (Integration)

Integration is the reverse of differentiation. It finds the area under a curve and is used in probability distributions and optimization.

Basic Integral Rules

Function → Integral
k (constant) → kx + C
x^n (n ≠ -1) → x^(n+1)/(n+1) + C
1/x → ln|x| + C
e^x → e^x + C
a^x → a^x/ln(a) + C
sin(x) → -cos(x) + C
cos(x) → sin(x) + C
sec²(x) → tan(x) + C
csc²(x) → -cot(x) + C
sec(x)·tan(x) → sec(x) + C
1/√(1-x²) → arcsin(x) + C
1/(1+x²) → arctan(x) + C

Integration Techniques

1. Substitution: ∫ f(g(x))·g'(x)dx = ∫ f(u)du where u = g(x)
2. Integration by Parts: ∫ u·dv = uv - ∫ v·du
3. Partial Fractions: Break complex fractions into simpler ones
4. Trigonometric Substitution: Use for √(a²-x²), √(a²+x²), √(x²-a²)

Examples

Example 1: ∫ 3x² dx = x³ + C
Example 2: ∫ sin(2x) dx = -cos(2x)/2 + C
Example 3: ∫ x·e^x dx = xe^x - e^x + C
Example 4: ∫₀¹ x² dx = [x³/3]₀¹ = 1/3
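
Example 4 can be sanity-checked numerically with a simple trapezoidal sum. The short Python sketch below (NumPy assumed, grid resolution chosen arbitrarily) approximates ∫₀¹ x² dx and compares it with the exact value 1/3.

import numpy as np

xs = np.linspace(0.0, 1.0, 10001)   # grid over [0, 1]
ys = xs**2
# Trapezoidal rule: average adjacent heights times the step width
approx = np.sum((ys[1:] + ys[:-1]) / 2 * np.diff(xs))
print(approx, 1/3)                  # ≈ 0.3333 vs exact 1/3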

3. Limits

Limits describe the behavior of a function as its input approaches a particular point. They are essential for understanding derivatives and continuity.

Basic Limits

lim(x→a) c = c
lim(x→a) x = a
lim(x→a) x^n = a^n
lim(x→∞) 1/x = 0
lim(x→0) sin(x)/x = 1
lim(x→∞) (1 + 1/x)^x = e

L'Hôpital's Rule

For indeterminate forms (0/0 or ∞/∞), provided the limit on the right-hand side exists:

lim(x→a) f(x)/g(x) = lim(x→a) f'(x)/g'(x)

Example:

lim(x→0) sin(x)/x = lim(x→0) cos(x)/1 = 1
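
A quick numerical check makes the limit concrete. The Python sketch below (NumPy assumed) evaluates sin(x)/x at points approaching 0 and shows the values tending to 1.

import numpy as np

for x in [0.1, 0.01, 0.001, 0.0001]:
    print(x, np.sin(x) / x)   # values approach 1 as x approaches 0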

4. Chain Rule

⚡ CRITICAL FOR DEEP LEARNING

The chain rule is the mathematical foundation of backpropagation in neural networks. It allows us to compute gradients through composed functions.

Single Variable

d/dx[f(g(x))] = f'(g(x))·g'(x)

Multivariable (Partial Derivatives)

For z = f(u, v) where u and v each depend on x:

∂z/∂x = (∂z/∂u)·(∂u/∂x) + (∂z/∂v)·(∂v/∂x)

Neural Networks Example

Loss = L(ŷ, y) where ŷ = f(z) and z = Wx + b
∂L/∂W = (∂L/∂ŷ)·(∂ŷ/∂z)·(∂z/∂W) ← This is backpropagation!
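
A minimal NumPy sketch of this idea is shown below for a single neuron with a sigmoid activation and squared-error loss. The shapes, activation, and loss are assumptions chosen for illustration, not a full training loop.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 1))       # input vector
W = rng.normal(size=(1, 3))       # weight matrix
b = np.zeros((1, 1))              # bias
y = np.array([[1.0]])             # target

# Forward pass
z = W @ x + b                     # pre-activation z = Wx + b
y_hat = sigmoid(z)                # prediction ŷ = f(z)
L = (y_hat - y) ** 2              # squared-error loss

# Backward pass: ∂L/∂W = ∂L/∂ŷ · ∂ŷ/∂z · ∂z/∂W
dL_dyhat = 2 * (y_hat - y)             # derivative of the loss
dyhat_dz = y_hat * (1 - y_hat)         # derivative of the sigmoid
dL_dW = (dL_dyhat * dyhat_dz) @ x.T    # chain rule; ∂z/∂W contributes xᵀ
print(dL_dW)                           # gradient with the same shape as W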

5. Partial Derivatives

For functions with multiple variables, partial derivatives measure the rate of change with respect to one variable while keeping others constant.

∂f/∂x: Derivative with respect to x (treat y, z, ... as constants)
∂f/∂y: Derivative with respect to y (treat x, z, ... as constants)

Example

Given: f(x, y) = x²y + 3xy² - 5
Solution:
∂f/∂x = 2xy + 3y²
∂f/∂y = x² + 6xy
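
These partials can be verified numerically. The Python sketch below (NumPy assumed; the sample point x = 1, y = 2 is chosen arbitrarily) compares central finite differences with the closed-form answers.

import numpy as np

f = lambda x, y: x**2 * y + 3 * x * y**2 - 5
x, y, h = 1.0, 2.0, 1e-5

df_dx = (f(x + h, y) - f(x - h, y)) / (2 * h)   # vary x, hold y constant
df_dy = (f(x, y + h) - f(x, y - h)) / (2 * h)   # vary y, hold x constant
print(df_dx, 2*x*y + 3*y**2)                    # both ≈ 16
print(df_dy, x**2 + 6*x*y)                      # both ≈ 13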

6. Gradient

⚡ CORE OF MACHINE LEARNING

The gradient is a vector containing all partial derivatives. It points in the direction of steepest ascent and is used in gradient descent optimization.

Definition

∇f = [∂f/∂x₁, ∂f/∂x₂, ..., ∂f/∂xₙ]

Example

Given: f(x, y) = x² + y²
Gradient: ∇f = [2x, 2y]
← Points in direction of steepest ascent

Gradient Descent (How ML Models Learn)

θ_new = θ_old - α·∇L(θ)

where α = learning rate, L = loss function
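
The update rule can be demonstrated on f(x, y) = x² + y², whose gradient is [2x, 2y]. The Python sketch below (NumPy assumed; learning rate and iteration count chosen arbitrarily) drives the parameters toward the minimum at the origin.

import numpy as np

theta = np.array([3.0, -4.0])     # initial parameters
alpha = 0.1                       # learning rate

for step in range(100):
    grad = 2 * theta              # ∇f for f(x, y) = x² + y²
    theta = theta - alpha * grad  # θ_new = θ_old - α·∇f
print(theta)                      # very close to [0, 0]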

7. Common Derivatives for ML/DL

Activation Functions

Function → Derivative
Sigmoid: σ(x) = 1/(1 + e^(-x)) → σ(x)·(1 - σ(x))
Tanh: tanh(x) → 1 - tanh²(x)
ReLU: max(0, x) → 0 if x < 0, 1 if x > 0
Leaky ReLU → α if x < 0, 1 if x > 0
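
The derivatives in this table translate directly into code. Below is a minimal NumPy sketch; the Leaky ReLU slope α = 0.01 is an assumed default.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1 - s)            # σ(x)·(1 - σ(x))

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2  # 1 - tanh²(x)

def d_relu(x):
    return (x > 0).astype(float)  # 0 for x < 0, 1 for x > 0

def d_leaky_relu(x, alpha=0.01):
    return np.where(x > 0, 1.0, alpha)  # α for x < 0, 1 for x > 0

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(d_sigmoid(x), d_tanh(x), d_relu(x), d_leaky_relu(x))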

Loss Functions

Loss → Derivative (with respect to ŷ)
MSE: (ŷ - y)² → 2(ŷ - y)
Cross-Entropy: -y·log(ŷ) → -y/ŷ

8. Taylor Series

A Taylor series represents a sufficiently smooth function as an infinite sum of polynomial terms built from its derivatives at a point. Truncated Taylor series are used in numerical methods and approximations.

f(x) = f(a) + f'(a)(x-a) + f''(a)(x-a)²/2! + f'''(a)(x-a)³/3! + ...

Common Series (around x=0)

e^x = 1 + x + x²/2! + x³/3! + x⁴/4! + ...
sin(x) = x - x³/3! + x⁵/5! - x⁷/7! + ...
cos(x) = 1 - x²/2! + x⁴/4! - x⁶/6! + ...
ln(1+x) = x - x²/2 + x³/3 - x⁴/4 + ... (for |x| < 1)
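
Truncating one of these series gives a polynomial approximation. The Python sketch below (standard library only) shows the partial sums of the e^x series at x = 1 converging to e.

import math

x = 1.0
partial_sum = 0.0
for n in range(8):
    partial_sum += x**n / math.factorial(n)   # add the n-th term xⁿ/n!
    print(n, partial_sum)
print("exact:", math.exp(x))                  # partial sums approach e ≈ 2.71828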

9. Integration Tricks

U-Substitution

Problem: ∫ 2x·cos(x²) dx
Let u = x², du = 2x dx
= ∫ cos(u) du
= sin(u) + C
= sin(x²) + C

Integration by Parts (ILATE)

Order: Inverse, Log, Algebraic, Trig, Exponential

Problem: ∫ x·ln(x) dx
u = ln(x), dv = x dx
du = 1/x dx, v = x²/2
= (x²/2)·ln(x) - ∫ (x²/2)·(1/x) dx
= (x²/2)·ln(x) - ∫ (x/2) dx
= (x²/2)·ln(x) - x²/4 + C

10. Definite Integrals

Fundamental Theorem of Calculus

∫ₐᵇ f(x) dx = F(b) - F(a) where F'(x) = f(x)

Properties

∫ₐᵇ f(x) dx = -∫ᵇₐ f(x) dx
∫ₐᵇ [f(x) + g(x)] dx = ∫ₐᵇ f(x) dx + ∫ₐᵇ g(x) dx
∫ₐᵇ c·f(x) dx = c·∫ₐᵇ f(x) dx

11. Multivariable Calculus

Gradient

∇f = [∂f/∂x, ∂f/∂y, ∂f/∂z]

Direction of steepest ascent

Directional Derivative

D_u f = ∇f · u

Rate of change of f in the direction of the unit vector u

Hessian Matrix

H = [ ∂²f/∂x²    ∂²f/∂x∂y ]
    [ ∂²f/∂y∂x   ∂²f/∂y²  ]

Matrix of second partial derivatives (symmetric when the mixed partials are continuous)
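
A minimal sketch of forming the Hessian numerically is shown below for f(x, y) = x² + 3xy + y², whose exact Hessian is [[2, 3], [3, 2]]. NumPy, the test function, and the evaluation point are assumptions for illustration.

import numpy as np

def grad(p):
    x, y = p
    return np.array([2*x + 3*y, 3*x + 2*y])  # ∇f for f = x² + 3xy + y²

p = np.array([1.0, -1.0])
h = 1e-5
H = np.zeros((2, 2))
for j in range(2):
    e = np.zeros(2)
    e[j] = h
    # Column j of H: change of the gradient along coordinate direction j
    H[:, j] = (grad(p + e) - grad(p - e)) / (2 * h)
print(H)   # ≈ [[2, 3], [3, 2]], symmetric as expected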

12. Quick Reference for Neural Networks

Backpropagation Chain Rule

∂L/∂w = ∂L/∂ŷ · ∂ŷ/∂z · ∂z/∂w

Gradient of Common Operations

Matrix Multiplication: y = Wx (scalar loss L, upstream gradient ∂L/∂y)

∂L/∂W = (∂L/∂y)·xᵀ
∂L/∂x = Wᵀ·(∂L/∂y)

Element-wise: y = x ⊙ w

∂L/∂x = (∂L/∂y) ⊙ w
∂L/∂w = (∂L/∂y) ⊙ x
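
These rules can be verified with a gradient check. The NumPy sketch below uses an assumed scalar loss L = Σ y² and compares the analytic gradient of one weight entry against a finite-difference estimate; the shapes are chosen only for illustration.

import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(2, 3))
x = rng.normal(size=(3, 1))

def loss(W, x):
    y = W @ x
    return float(np.sum(y ** 2))   # assumed scalar loss L = Σ y²

# Analytic gradients via the backprop rules above
y = W @ x
dL_dy = 2 * y                      # upstream gradient ∂L/∂y
dL_dW = dL_dy @ x.T                # (∂L/∂y)·xᵀ
dL_dx = W.T @ dL_dy                # Wᵀ·(∂L/∂y)

# Finite-difference check of a single entry of ∂L/∂W
h = 1e-6
W_h = W.copy()
W_h[0, 1] += h
print(dL_dW[0, 1], (loss(W_h, x) - loss(W, x)) / h)   # should nearly match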

Quick Tips for Exams

✓ Always check for 0/0, ∞/∞ → Use L'Hôpital's Rule
✓ Integration: Try u-substitution first
✓ Product of functions: Use Product Rule or Integration by Parts
✓ Composed functions f(g(x)): always use the Chain Rule
✓ Don't forget +C in indefinite integrals
✓ Check your limits in definite integrals

Practice Problems

1. d/dx[x³·e^(2x)] = ?
2. ∫ x·cos(x) dx = ?
3. lim(x→0) (e^x - 1)/x = ?
4. ∇(x²y + y²z) = ?

Answers:
1. 3x²·e^(2x) + x³·2e^(2x) = e^(2x)(3x² + 2x³)
2. x·sin(x) + cos(x) + C
3. 1 [Use L'Hôpital's Rule]
4. [2xy, x² + 2yz, y²]

Pro Tips

For ML/DL

Focus on Chain Rule, Partial Derivatives, and Gradient - they're the foundation of backpropagation.

For Exams

Memorize basic derivatives and integrals tables. Practice identifying which rule to use.

Practice

Do problems daily - calculus needs muscle memory. Start simple, then increase complexity.