Calculus for Machine Learning

Complete reference guide covering derivatives, integrals, limits, gradients, and backpropagation for neural networks.

1. Derivatives (Differentiation)

The derivative measures the rate of change of a function. It's the foundation of backpropagation in neural networks.

Basic Derivative Rules

d/dx[c] = 0 (c constant)
d/dx[x] = 1
d/dx[x^n] = n·x^(n-1)
d/dx[e^x] = e^x
d/dx[a^x] = a^x·ln(a)
d/dx[ln(x)] = 1/x
d/dx[log_a(x)] = 1/(x·ln(a))
d/dx[sin(x)] = cos(x)
d/dx[cos(x)] = -sin(x)
d/dx[tan(x)] = sec²(x)
d/dx[cot(x)] = -csc²(x)
d/dx[sec(x)] = sec(x)·tan(x)
d/dx[csc(x)] = -csc(x)·cot(x)

Differentiation Rules

Sum Rule: d/dx[f + g] = f' + g'
Difference Rule: d/dx[f - g] = f' - g'
Product Rule: d/dx[f·g] = f'·g + f·g'
Quotient Rule: d/dx[f/g] = (f'·g - f·g')/g²
Chain Rule: d/dx[f(g(x))] = f'(g(x))·g'(x)
Power Rule: d/dx[x^n] = n·x^(n-1)

Examples

Example 1: d/dx[3x² + 5x - 7] = 6x + 5
Example 2: d/dx[x³·sin(x)] = 3x²·sin(x) + x³·cos(x)
Example 3: d/dx[sin(x²)] = cos(x²)·2x
Example 4: d/dx[(x² + 1)/(x - 1)] = [(2x)(x-1) - (x²+1)(1)]/(x-1)² = (x² - 2x - 1)/(x-1)²
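
Derivatives like these can be sanity-checked numerically with a central difference. A minimal Python sketch (the helper name and step size h are illustrative choices) verifying Example 1:

    # Central-difference approximation of f'(x); h is an arbitrary small step.
    def numerical_derivative(f, x, h=1e-5):
        return (f(x + h) - f(x - h)) / (2 * h)

    f = lambda x: 3 * x**2 + 5 * x - 7
    x = 2.0
    print(numerical_derivative(f, x))  # ~17.0
    print(6 * x + 5)                   # exact value from Example 1: 17.0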

2. Integrals (Integration)

Integration is the reverse of differentiation. It finds the area under a curve and is used in probability distributions and optimization.

Basic Integral Rules

∫ k dx = kx + C (k constant)
∫ x^n dx = x^(n+1)/(n+1) + C (n ≠ -1)
∫ 1/x dx = ln|x| + C
∫ e^x dx = e^x + C
∫ a^x dx = a^x/ln(a) + C
∫ sin(x) dx = -cos(x) + C
∫ cos(x) dx = sin(x) + C
∫ sec²(x) dx = tan(x) + C
∫ csc²(x) dx = -cot(x) + C
∫ sec(x)·tan(x) dx = sec(x) + C
∫ 1/√(1-x²) dx = arcsin(x) + C
∫ 1/(1+x²) dx = arctan(x) + C

Integration Techniques

1. Substitution: ∫ f(g(x))·g'(x)dx = ∫ f(u)du where u = g(x)
2. Integration by Parts: ∫ u·dv = uv - ∫ v·du
3. Partial Fractions: Break complex fractions into simpler ones
4. Trigonometric Substitution: Use for √(a²-x²), √(a²+x²), √(x²-a²)

Examples

Example 1: ∫ 3x² dx = x³ + C
Example 2: ∫ sin(2x) dx = -cos(2x)/2 + C
Example 3: ∫ x·e^x dx = xe^x - e^x + C
Example 4: ∫₀¹ x² dx = [x³/3]₀¹ = 1/3
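
Definite integrals such as Example 4 can be checked with a Riemann sum. A minimal sketch (the midpoint rule and the number of slices n are arbitrary choices):

    # Midpoint Riemann sum for the definite integral of f over [a, b].
    def midpoint_integral(f, a, b, n=100_000):
        h = (b - a) / n
        return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

    print(midpoint_integral(lambda x: x**2, 0.0, 1.0))  # ~0.333333 = 1/3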

3. Limits

Limits describe the behavior of a function as its input approaches a particular point. They are essential for understanding derivatives and continuity.

Basic Limits

lim(x→a) c = c
lim(x→a) x = a
lim(x→a) x^n = a^n
lim(x→∞) 1/x = 0
lim(x→0) sin(x)/x = 1
lim(x→∞) (1 + 1/x)^x = e

L'Hôpital's Rule

For indeterminate forms (0/0 or ∞/∞), provided the limit on the right exists:

lim(x→a) f(x)/g(x) = lim(x→a) f'(x)/g'(x)

Example:

lim(x→0) sin(x)/x = lim(x→0) cos(x)/1 = 1
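
As a numeric sanity check, evaluating sin(x)/x at shrinking values of x shows the ratio approaching 1:

    import math

    # sin(x)/x approaches 1 as x approaches 0.
    for x in (0.1, 0.01, 0.001, 1e-6):
        print(x, math.sin(x) / x)
    # Output approaches 1.0, matching L'Hôpital's result.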

4. Chain Rule

⚡ CRITICAL FOR DEEP LEARNING

The chain rule is the mathematical foundation of backpropagation in neural networks. It allows us to compute gradients through composed functions.

Single Variable

d/dx[f(g(x))] = f'(g(x))·g'(x)

Multivariable (Partial Derivatives)

For z = f(u, v) with u = u(x, y) and v = v(x, y):

∂z/∂x = (∂z/∂u)·(∂u/∂x) + (∂z/∂v)·(∂v/∂x)

Neural Networks Example

Loss = L(ŷ, y) where ŷ = f(Wx + b)
∂L/∂W = (∂L/∂ŷ)·(∂ŷ/∂W) ← This is backpropagation!
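
A minimal NumPy sketch of this chain for a single sigmoid unit with squared-error loss (the variable names, shapes, and random data are illustrative, not a framework API):

    import numpy as np

    # Forward pass: y_hat = sigmoid(Wx + b), L = (y_hat - y)^2.
    rng = np.random.default_rng(0)
    x = rng.normal(size=3)           # input vector
    y = 1.0                          # target
    W = rng.normal(size=(1, 3))
    b = np.zeros(1)

    z = W @ x + b                    # pre-activation
    y_hat = 1 / (1 + np.exp(-z))     # sigmoid activation

    # Backward pass: chain rule, one factor per forward step.
    dL_dyhat = 2 * (y_hat - y)       # d/dŷ of (ŷ - y)²
    dyhat_dz = y_hat * (1 - y_hat)   # sigmoid'(z)
    dL_dz = dL_dyhat * dyhat_dz
    dL_dW = np.outer(dL_dz, x)       # ∂z/∂W contributes x
    dL_db = dL_dz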

5. Partial Derivatives

For functions with multiple variables, partial derivatives measure the rate of change with respect to one variable while keeping others constant.

∂f/∂x: Derivative with respect to x (treat y, z, ... as constants)
∂f/∂y: Derivative with respect to y (treat x, z, ... as constants)

Example

Given: f(x, y) = x²y + 3xy² - 5
Solution:
∂f/∂x = 2xy + 3y²
∂f/∂y = x² + 6xy
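
These partials can be checked with central differences. A minimal sketch at the point (2, 1):

    # f(x, y) = x²y + 3xy² - 5 and its partials at (2, 1).
    def f(x, y):
        return x**2 * y + 3 * x * y**2 - 5

    h = 1e-5
    x, y = 2.0, 1.0
    df_dx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    df_dy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    print(df_dx, 2 * x * y + 3 * y**2)  # both ~7.0
    print(df_dy, x**2 + 6 * x * y)      # both ~16.0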

6. Gradient

⚡ CORE OF MACHINE LEARNING

The gradient is a vector containing all partial derivatives. It points in the direction of steepest ascent and is used in gradient descent optimization.

Definition

∇f = [∂f/∂x₁, ∂f/∂x₂, ..., ∂f/∂xₙ]

Example

Given: f(x, y) = x² + y²
Gradient: ∇f = [2x, 2y]
← Points in direction of steepest ascent

Gradient Descent (How ML Models Learn)

θ_new = θ_old - α·∇L(θ)

where α = learning rate, L = loss function
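
A minimal gradient-descent sketch on f(x, y) = x² + y² from the example above (the learning rate, starting point, and step count are arbitrary choices):

    # Gradient descent: theta_new = theta_old - alpha * grad(f).
    theta = [3.0, -4.0]   # starting point
    alpha = 0.1           # learning rate
    for _ in range(100):
        grad = [2 * theta[0], 2 * theta[1]]   # ∇f = [2x, 2y]
        theta = [theta[0] - alpha * grad[0],
                 theta[1] - alpha * grad[1]]
    print(theta)  # very close to the minimum at [0, 0]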

7. Common Derivatives for ML/DL

Activation Functions

Sigmoid: σ(x) = 1/(1+e^(-x)), derivative σ(x)·(1-σ(x))
Tanh: tanh(x), derivative 1 - tanh²(x)
ReLU: max(0, x), derivative 0 if x < 0 and 1 if x > 0 (undefined at x = 0)
Leaky ReLU: max(αx, x), derivative α if x < 0 and 1 if x > 0
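
These identities are easy to verify numerically. A short sketch checking the sigmoid derivative against a central difference (the test point and step size are arbitrary):

    import math

    # Check σ'(x) = σ(x)·(1 - σ(x)) numerically.
    def sigmoid(x):
        return 1 / (1 + math.exp(-x))

    x, h = 0.7, 1e-6
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    analytic = sigmoid(x) * (1 - sigmoid(x))
    print(numeric, analytic)  # both ~0.2217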

Loss Functions

MSE: L(ŷ) = (ŷ-y)², derivative dL/dŷ = 2(ŷ-y)
Cross-Entropy: L(ŷ) = -y·log(ŷ), derivative dL/dŷ = -y/ŷ

8. Taylor Series

A Taylor series represents a function near a point as an infinite sum of polynomial terms built from its derivatives. Used in numerical methods and approximations.

f(x) = f(a) + f'(a)(x-a) + f''(a)(x-a)²/2! + f'''(a)(x-a)³/3! + ...

Common Series (around x=0)

e^x = 1 + x + x²/2! + x³/3! + x⁴/4! + ...
sin(x) = x - x³/3! + x⁵/5! - x⁷/7! + ...
cos(x) = 1 - x²/2! + x⁴/4! - x⁶/6! + ...
ln(1+x) = x - x²/2 + x³/3 - x⁴/4 + ... (for |x| < 1)
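
Truncating a series gives a polynomial approximation. The sketch below sums the first ten terms of the e^x series at x = 1 and compares against math.exp (the number of terms is an arbitrary choice):

    import math

    # Partial sum of e^x = Σ xⁿ/n! at x = 1.
    x = 1.0
    approx = 0.0
    for n in range(10):
        approx += x**n / math.factorial(n)
    print(approx)       # 2.7182815...
    print(math.exp(x))  # 2.7182818...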

9. Integration Tricks

U-Substitution

Problem: ∫ 2x·cos(x²) dx
Let u = x², du = 2x dx
= ∫ cos(u) du
= sin(u) + C
= sin(x²) + C

Integration by Parts (ILATE)

Choose u by this priority: Inverse trig, Logarithmic, Algebraic, Trigonometric, Exponential.

Problem: ∫ x·ln(x) dx
u = ln(x), dv = x dx
du = 1/x dx, v = x²/2
= (x²/2)·ln(x) - ∫ (x²/2)·(1/x) dx
= (x²/2)·ln(x) - ∫ (x/2) dx
= (x²/2)·ln(x) - x²/4 + C
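
If SymPy is installed, both worked examples can be verified symbolically. A sketch, assuming the sympy package is available:

    import sympy as sp

    # Symbolic checks of the u-substitution and by-parts examples above.
    x = sp.symbols('x')
    print(sp.integrate(2 * x * sp.cos(x**2), x))  # sin(x**2)
    print(sp.integrate(x * sp.log(x), x))         # x**2*log(x)/2 - x**2/4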

10. Definite Integrals

Fundamental Theorem of Calculus

∫ₐᵇ f(x) dx = F(b) - F(a) where F'(x) = f(x)

Properties

∫ₐᵇ f(x) dx = -∫ᵇₐ f(x) dx
∫ₐᵇ [f(x) + g(x)] dx = ∫ₐᵇ f(x) dx + ∫ₐᵇ g(x) dx
∫ₐᵇ c·f(x) dx = c·∫ₐᵇ f(x) dx

11. Multivariable Calculus

Gradient

∇f = [∂f/∂x, ∂f/∂y, ∂f/∂z]

Direction of steepest ascent

Directional Derivative

D_u f = ∇f · u

Rate of change in direction u

Hessian Matrix

H = [∂²f/∂x²    ∂²f/∂x∂y]
    [∂²f/∂y∂x   ∂²f/∂y²]

Matrix of second partial derivatives; symmetric when the mixed partials are continuous.
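
Hessian entries can be approximated with second-order finite differences. The sketch below reuses f(x, y) = x²y + 3xy² - 5 from Section 5, whose analytic Hessian entries are fxx = 2y, fxy = 2x + 6y, fyy = 6x (the step size h is an arbitrary choice):

    # Finite-difference Hessian of f at (2, 1); analytic: [[2, 10], [10, 12]].
    def f(x, y):
        return x**2 * y + 3 * x * y**2 - 5

    h = 1e-4
    x, y = 2.0, 1.0
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    print([[fxx, fxy], [fxy, fyy]])  # ~[[2, 10], [10, 12]]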

12. Quick Reference for Neural Networks

Backpropagation Chain Rule

∂L/∂w = ∂L/∂ŷ · ∂ŷ/∂z · ∂z/∂w

Gradient of Common Operations

Matrix Multiplication: y = Wx (with upstream gradient ∂L/∂y from the chain rule)

∂L/∂W = (∂L/∂y)·x^T
∂L/∂x = W^T·(∂L/∂y)

Element-wise: y = x ⊙ w

∂L/∂x = (∂L/∂y) ⊙ w
∂L/∂w = (∂L/∂y) ⊙ x
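
A minimal NumPy gradient check for the matrix-multiplication case, using L = sum(y) so the upstream gradient ∂L/∂y is a vector of ones (all names and sizes are illustrative):

    import numpy as np

    # Analytic gradients of L = sum(Wx) vs. a finite-difference check.
    rng = np.random.default_rng(0)
    W = rng.normal(size=(2, 3))
    x = rng.normal(size=3)

    dL_dy = np.ones(2)          # upstream gradient of L = sum(y)
    dL_dW = np.outer(dL_dy, x)  # (∂L/∂y)·x^T
    dL_dx = W.T @ dL_dy         # W^T·(∂L/∂y)

    # Perturb one weight and compare the change in L.
    h = 1e-6
    W2 = W.copy()
    W2[0, 1] += h
    numeric = ((W2 @ x).sum() - (W @ x).sum()) / h
    print(numeric, dL_dW[0, 1])  # the two values agree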

Quick Tips for Exams

✓ Always check for 0/0, ∞/∞ → Use L'Hôpital's Rule
✓ Integration: Try u-substitution first
✓ Product of functions: Use Product Rule or Integration by Parts
✓ Chain functions: Always use Chain Rule
✓ Don't forget +C in indefinite integrals
✓ Check your limits in definite integrals

Practice Problems

1. d/dx[x³·e^(2x)] = ?
2. ∫ x·cos(x) dx = ?
3. lim(x→0) (e^x - 1)/x = ?
4. ∇(x²y + y²z) = ?
Answers:
1. 3x²·e^(2x) + x³·2e^(2x) = e^(2x)(3x² + 2x³)
2. x·sin(x) + cos(x) + C
3. 1 [Use L'Hôpital's Rule]
4. [2xy, x² + 2yz, y²]

Pro Tips

For ML/DL

Focus on Chain Rule, Partial Derivatives, and Gradient - they're the foundation of backpropagation.

For Exams

Memorize basic derivatives and integrals tables. Practice identifying which rule to use.

Practice

Do problems daily - calculus needs muscle memory. Start simple, then increase complexity.