
Calculus for Machine Learning

Complete reference guide covering derivatives, integrals, limits, gradients, and backpropagation for neural networks.

1. Derivatives (Differentiation)

The derivative measures the rate of change of a function. It's the foundation of backpropagation in neural networks.

Basic Derivative Rules

Function → Derivative
c (constant) → 0
x → 1
x^n → n·x^(n-1)
e^x → e^x
a^x → a^x · ln(a)
ln(x) → 1/x
log_a(x) → 1/(x·ln(a))
sin(x) → cos(x)
cos(x) → -sin(x)
tan(x) → sec²(x)
cot(x) → -csc²(x)
sec(x) → sec(x)·tan(x)
csc(x) → -csc(x)·cot(x)

Differentiation Rules

Sum Rule: d/dx[f + g] = f' + g'
Difference Rule: d/dx[f - g] = f' - g'
Product Rule: d/dx[f·g] = f'·g + f·g'
Quotient Rule: d/dx[f/g] = (f'·g - f·g')/g²
Chain Rule: d/dx[f(g(x))] = f'(g(x))·g'(x)
Power Rule: d/dx[x^n] = n·x^(n-1)

Examples

Example 1: d/dx[3x² + 5x - 7] = 6x + 5
Example 2: d/dx[x³·sin(x)] = 3x²·sin(x) + x³·cos(x)
Example 3: d/dx[sin(x²)] = cos(x²)·2x
Example 4: d/dx[(x² + 1)/(x - 1)] = [(2x)(x-1) - (x²+1)(1)]/(x-1)² = (x² - 2x - 1)/(x-1)²
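
These results can also be checked numerically. Below is a minimal Python sketch (assuming NumPy is available; the evaluation point x = 2 is chosen arbitrarily) comparing a central finite-difference estimate against the closed-form derivatives from Examples 1 and 3.

import numpy as np

def numerical_derivative(f, x, h=1e-5):
    # Central difference: (f(x+h) - f(x-h)) / (2h) approximates f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

x = 2.0
# Example 1: d/dx[3x² + 5x - 7] = 6x + 5
print(numerical_derivative(lambda t: 3*t**2 + 5*t - 7, x), 6*x + 5)
# Example 3: d/dx[sin(x²)] = cos(x²)·2x
print(numerical_derivative(lambda t: np.sin(t**2), x), np.cos(x**2) * 2*x)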

2. Integrals (Integration)

Integration is the reverse of differentiation. It finds the area under a curve and is used in probability distributions and optimization.

Basic Integral Rules

Function → Integral
k (constant) → kx + C
x^n (n ≠ -1) → x^(n+1)/(n+1) + C
1/x → ln|x| + C
e^x → e^x + C
a^x → a^x/ln(a) + C
sin(x) → -cos(x) + C
cos(x) → sin(x) + C
sec²(x) → tan(x) + C
csc²(x) → -cot(x) + C
sec(x)·tan(x) → sec(x) + C
1/√(1-x²) → arcsin(x) + C
1/(1+x²) → arctan(x) + C

Integration Techniques

1. Substitution: ∫ f(g(x))·g'(x)dx = ∫ f(u)du where u = g(x)
2. Integration by Parts: ∫ u·dv = uv - ∫ v·du
3. Partial Fractions: Break complex fractions into simpler ones
4. Trigonometric Substitution: Use for √(a²-x²), √(a²+x²), √(x²-a²)

Examples

Example 1: ∫ 3x² dx = x³ + C
Example 2: ∫ sin(2x) dx = -cos(2x)/2 + C
Example 3: ∫ x·e^x dx = xe^x - e^x + C
Example 4: ∫₀¹ x² dx = [x³/3]₀¹ = 1/3
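
Example 4 can be sanity-checked numerically with a simple trapezoidal sum. The short Python sketch below (NumPy assumed, grid resolution chosen arbitrarily) approximates ∫₀¹ x² dx and compares it with the exact value 1/3.

import numpy as np

xs = np.linspace(0.0, 1.0, 10001)   # grid over [0, 1]
ys = xs**2
# Trapezoidal rule: average adjacent heights times the step width
approx = np.sum((ys[1:] + ys[:-1]) / 2 * np.diff(xs))
print(approx, 1/3)                  # ≈ 0.3333 vs exact 1/3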

3. Limits

Limits describe the behavior of a function as its input approaches a particular point. They are essential for understanding derivatives and continuity.

Basic Limits

lim(x→a) c = c
lim(x→a) x = a
lim(x→a) x^n = a^n
lim(x→∞) 1/x = 0
lim(x→0) sin(x)/x = 1
lim(x→∞) (1 + 1/x)^x = e

L'Hôpital's Rule

For indeterminate forms (0/0 or ∞/∞), provided the limit on the right-hand side exists:

lim(x→a) f(x)/g(x) = lim(x→a) f'(x)/g'(x)

Example:

lim(x→0) sin(x)/x = lim(x→0) cos(x)/1 = 1
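
A quick numerical check makes the limit concrete. The Python sketch below (NumPy assumed) evaluates sin(x)/x at points approaching 0 and shows the values tending to 1.

import numpy as np

for x in [0.1, 0.01, 0.001, 0.0001]:
    print(x, np.sin(x) / x)   # values approach 1 as x approaches 0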

4. Chain Rule

⚡ CRITICAL FOR DEEP LEARNING

The chain rule is the mathematical foundation of backpropagation in neural networks. It allows us to compute gradients through composed functions.

Single Variable

d/dx[f(g(x))] = f'(g(x))·g'(x)

Multivariable (Partial Derivatives)

For z = f(u, v) where u and v each depend on x:

∂z/∂x = (∂z/∂u)·(∂u/∂x) + (∂z/∂v)·(∂v/∂x)

Neural Networks Example

Loss = L(ŷ, y) where ŷ = f(z) and z = Wx + b
∂L/∂W = (∂L/∂ŷ)·(∂ŷ/∂z)·(∂z/∂W) ← This is backpropagation!
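
A minimal NumPy sketch of this idea is shown below for a single neuron with a sigmoid activation and squared-error loss. The shapes, activation, and loss are assumptions chosen for illustration, not a full training loop.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 1))       # input vector
W = rng.normal(size=(1, 3))       # weight matrix
b = np.zeros((1, 1))              # bias
y = np.array([[1.0]])             # target

# Forward pass
z = W @ x + b                     # pre-activation z = Wx + b
y_hat = sigmoid(z)                # prediction ŷ = f(z)
L = (y_hat - y) ** 2              # squared-error loss

# Backward pass: ∂L/∂W = ∂L/∂ŷ · ∂ŷ/∂z · ∂z/∂W
dL_dyhat = 2 * (y_hat - y)             # derivative of the loss
dyhat_dz = y_hat * (1 - y_hat)         # derivative of the sigmoid
dL_dW = (dL_dyhat * dyhat_dz) @ x.T    # chain rule; ∂z/∂W contributes xᵀ
print(dL_dW)                           # gradient with the same shape as W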

5. Partial Derivatives

For functions with multiple variables, partial derivatives measure the rate of change with respect to one variable while keeping others constant.

∂f/∂x: Derivative with respect to x (treat y, z, ... as constants)
∂f/∂y: Derivative with respect to y (treat x, z, ... as constants)

Example

Given: f(x, y) = x²y + 3xy² - 5
Solution:
∂f/∂x = 2xy + 3y²
∂f/∂y = x² + 6xy
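
These partials can be verified numerically. The Python sketch below (NumPy assumed; the sample point x = 1, y = 2 is chosen arbitrarily) compares central finite differences with the closed-form answers.

import numpy as np

f = lambda x, y: x**2 * y + 3 * x * y**2 - 5
x, y, h = 1.0, 2.0, 1e-5

df_dx = (f(x + h, y) - f(x - h, y)) / (2 * h)   # vary x, hold y constant
df_dy = (f(x, y + h) - f(x, y - h)) / (2 * h)   # vary y, hold x constant
print(df_dx, 2*x*y + 3*y**2)                    # both ≈ 16
print(df_dy, x**2 + 6*x*y)                      # both ≈ 13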

6. Gradient

⚡ CORE OF MACHINE LEARNING

The gradient is a vector containing all partial derivatives. It points in the direction of steepest ascent and is used in gradient descent optimization.

Definition

∇f = [∂f/∂x₁, ∂f/∂x₂, ..., ∂f/∂xₙ]

Example

Given: f(x, y) = x² + y²
Gradient: ∇f = [2x, 2y]
← Points in direction of steepest ascent

Gradient Descent (How ML Models Learn)

θ_new = θ_old - α·∇L(θ)

where α = learning rate, L = loss function
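
The update rule can be demonstrated on f(x, y) = x² + y², whose gradient is [2x, 2y]. The Python sketch below (NumPy assumed; learning rate and iteration count chosen arbitrarily) drives the parameters toward the minimum at the origin.

import numpy as np

theta = np.array([3.0, -4.0])     # initial parameters
alpha = 0.1                       # learning rate

for step in range(100):
    grad = 2 * theta              # ∇f for f(x, y) = x² + y²
    theta = theta - alpha * grad  # θ_new = θ_old - α·∇f
print(theta)                      # very close to [0, 0]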

7. Common Derivatives for ML/DL

Activation Functions

Function → Derivative
Sigmoid: σ(x) = 1/(1 + e^(-x)) → σ(x)·(1 - σ(x))
Tanh: tanh(x) → 1 - tanh²(x)
ReLU: max(0, x) → 0 if x < 0, 1 if x > 0
Leaky ReLU → α if x < 0, 1 if x > 0
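
The derivatives in this table translate directly into code. Below is a minimal NumPy sketch; the Leaky ReLU slope α = 0.01 is an assumed default.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1 - s)            # σ(x)·(1 - σ(x))

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2  # 1 - tanh²(x)

def d_relu(x):
    return (x > 0).astype(float)  # 0 for x < 0, 1 for x > 0

def d_leaky_relu(x, alpha=0.01):
    return np.where(x > 0, 1.0, alpha)  # α for x < 0, 1 for x > 0

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(d_sigmoid(x), d_tanh(x), d_relu(x), d_leaky_relu(x))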

Loss Functions

Loss → Derivative (with respect to ŷ)
MSE: (ŷ - y)² → 2(ŷ - y)
Cross-Entropy: -y·log(ŷ) → -y/ŷ

8. Taylor Series

A Taylor series represents a sufficiently smooth function as an infinite sum of polynomial terms built from its derivatives at a point. Truncated Taylor series are used in numerical methods and approximations.

f(x) = f(a) + f'(a)(x-a) + f''(a)(x-a)²/2! + f'''(a)(x-a)³/3! + ...

Common Series (around x=0)

e^x = 1 + x + x²/2! + x³/3! + x⁴/4! + ...
sin(x) = x - x³/3! + x⁵/5! - x⁷/7! + ...
cos(x) = 1 - x²/2! + x⁴/4! - x⁶/6! + ...
ln(1+x) = x - x²/2 + x³/3 - x⁴/4 + ... (for |x| < 1)
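
Truncating one of these series gives a polynomial approximation. The Python sketch below (standard library only) shows the partial sums of the e^x series at x = 1 converging to e.

import math

x = 1.0
partial_sum = 0.0
for n in range(8):
    partial_sum += x**n / math.factorial(n)   # add the n-th term xⁿ/n!
    print(n, partial_sum)
print("exact:", math.exp(x))                  # partial sums approach e ≈ 2.71828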

9. Integration Tricks

U-Substitution

Problem: ∫ 2x·cos(x²) dx
Let u = x², du = 2x dx
= ∫ cos(u) du
= sin(u) + C
= sin(x²) + C

Integration by Parts (ILATE)

Order: Inverse, Log, Algebraic, Trig, Exponential

Problem: ∫ x·ln(x) dx
u = ln(x), dv = x dx
du = 1/x dx, v = x²/2
= (x²/2)·ln(x) - ∫ (x²/2)·(1/x) dx
= (x²/2)·ln(x) - ∫ (x/2) dx
= (x²/2)·ln(x) - x²/4 + C

10. Definite Integrals

Fundamental Theorem of Calculus

∫ₐᵇ f(x) dx = F(b) - F(a) where F'(x) = f(x)

Properties

∫ₐᵇ f(x) dx = -∫ᵇₐ f(x) dx
∫ₐᵇ [f(x) + g(x)] dx = ∫ₐᵇ f(x) dx + ∫ₐᵇ g(x) dx
∫ₐᵇ c·f(x) dx = c·∫ₐᵇ f(x) dx

11. Multivariable Calculus

Gradient

∇f = [∂f/∂x, ∂f/∂y, ∂f/∂z]

Direction of steepest ascent

Directional Derivative

D_u f = ∇f · u

Rate of change of f in the direction of the unit vector u

Hessian Matrix

H = [ ∂²f/∂x²    ∂²f/∂x∂y ]
    [ ∂²f/∂y∂x   ∂²f/∂y²  ]

Matrix of second partial derivatives (symmetric when the mixed partials are continuous)
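
A minimal sketch of forming the Hessian numerically is shown below for f(x, y) = x² + 3xy + y², whose exact Hessian is [[2, 3], [3, 2]]. NumPy, the test function, and the evaluation point are assumptions for illustration.

import numpy as np

def grad(p):
    x, y = p
    return np.array([2*x + 3*y, 3*x + 2*y])  # ∇f for f = x² + 3xy + y²

p = np.array([1.0, -1.0])
h = 1e-5
H = np.zeros((2, 2))
for j in range(2):
    e = np.zeros(2)
    e[j] = h
    # Column j of H: change of the gradient along coordinate direction j
    H[:, j] = (grad(p + e) - grad(p - e)) / (2 * h)
print(H)   # ≈ [[2, 3], [3, 2]], symmetric as expected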

12. Quick Reference for Neural Networks

Backpropagation Chain Rule

∂L/∂w = ∂L/∂ŷ · ∂ŷ/∂z · ∂z/∂w

Gradient of Common Operations

Matrix Multiplication: y = Wx (scalar loss L, upstream gradient ∂L/∂y)

∂L/∂W = (∂L/∂y)·xᵀ
∂L/∂x = Wᵀ·(∂L/∂y)

Element-wise: y = x ⊙ w

∂L/∂x = (∂L/∂y) ⊙ w
∂L/∂w = (∂L/∂y) ⊙ x
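
These rules can be verified with a gradient check. The NumPy sketch below uses an assumed scalar loss L = Σ y² and compares the analytic gradient of one weight entry against a finite-difference estimate; the shapes are chosen only for illustration.

import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(2, 3))
x = rng.normal(size=(3, 1))

def loss(W, x):
    y = W @ x
    return float(np.sum(y ** 2))   # assumed scalar loss L = Σ y²

# Analytic gradients via the backprop rules above
y = W @ x
dL_dy = 2 * y                      # upstream gradient ∂L/∂y
dL_dW = dL_dy @ x.T                # (∂L/∂y)·xᵀ
dL_dx = W.T @ dL_dy                # Wᵀ·(∂L/∂y)

# Finite-difference check of a single entry of ∂L/∂W
h = 1e-6
W_h = W.copy()
W_h[0, 1] += h
print(dL_dW[0, 1], (loss(W_h, x) - loss(W, x)) / h)   # should nearly match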

Quick Tips for Exams

✓ Always check for 0/0, ∞/∞ → Use L'Hôpital's Rule
✓ Integration: Try u-substitution first
✓ Product of functions: Use Product Rule or Integration by Parts
✓ Composed functions f(g(x)): always use the Chain Rule
✓ Don't forget +C in indefinite integrals
✓ Check your limits in definite integrals

Practice Problems

1. d/dx[x³·e^(2x)] = ?
2. ∫ x·cos(x) dx = ?
3. lim(x→0) (e^x - 1)/x = ?
4. ∇(x²y + y²z) = ?

Answers:
1. 3x²·e^(2x) + x³·2e^(2x) = e^(2x)(3x² + 2x³)
2. x·sin(x) + cos(x) + C
3. 1 [Use L'Hôpital's Rule]
4. [2xy, x² + 2yz, y²]

Pro Tips

For ML/DL

Focus on Chain Rule, Partial Derivatives, and Gradient - they're the foundation of backpropagation.

For Exams

Memorize basic derivatives and integrals tables. Practice identifying which rule to use.

Practice

Do problems daily - calculus needs muscle memory. Start simple, then increase complexity.