Back to Library
DBMS Recovery & Optimization
PART 5 OF 5 - FINAL

Recovery, Optimization & Complete Reference

Master backup & recovery, query optimization, and get complete reference materials including all formulas, decision trees, and problem-solving patterns.

← Part 4Part 5 - Complete

Contents

Backup & Recovery

1. Failure Types

Failure TypeCauseData LossRecovery Method
TransactionLogic error, abortVolatile onlyRollback using log
System CrashPower failureRAM lostRedo/Undo using log
Disk FailureHardware failurePermanent lossRestore from backup

2. Log-Based Recovery

Log Record Format

<Ti, start> → Transaction starts
<Ti, X, old_value, new_value> → Update record
<Ti, commit> → Transaction commits
<Ti, abort> → Transaction aborts

Example Log

<T1, start>
<T1, A, 1000, 950>    -- A changed from 1000 to 950
<T1, B, 2000, 2050>   -- B changed from 2000 to 2050
<T1, commit>

Write-Ahead Logging (WAL)

  • Rule 1: Log record must be written BEFORE database update
  • Rule 2: All log records of transaction must be written BEFORE commit
  • Ensures: Can undo incomplete transactions & redo committed transactions

3. UNDO & REDO Operations

UNDO

Purpose: Reverse effects of incomplete transaction

When: Transaction started but not committed before crash

Log: <T1, A, 1000, 950>
Recovery: A = 1000 (restore old value)

REDO

Purpose: Reapply effects of committed transaction

When: Transaction committed but changes not written to disk

Log: <T1, A, 1000, 950>
Recovery: A = 950 (reapply new value)

Recovery Algorithm

  1. Step 1: Scan log backward
    • • Find transactions that committed (have <Ti, commit>)
    • • Find transactions that didn't commit (no <Ti, commit>)
  2. Step 2: UNDO Phase (backward)
    • • For uncommitted transactions
    • • Restore old_values
  3. Step 3: REDO Phase (forward)
    • • For committed transactions
    • • Reapply new_values

4. Checkpointing

Purpose: Reduce recovery time by marking a point where all updates are written to disk

Process:

  1. 1. Stop accepting new transactions
  2. 2. Complete all active transactions
  3. 3. Write all buffers to disk
  4. 4. Write <checkpoint> to log
  5. 5. Resume normal operation

Benefits

  • • Recovery starts from last checkpoint
  • • No need to scan entire log
  • • Only process transactions active at or after checkpoint

Query Optimization

5. Query Processing & Optimization

Cost Estimation

Primary Cost: Number of disk block accesses

Secondary: CPU time (usually ignored)

Selection Cost

MethodCostWhen to Use
Linear SearchbrNo index, small table
Binary Search⌈log₂(br)⌉Sorted file
Primary Indexheight + 1Index on key
Secondary Indexheight + matchesIndex on non-key

Join Algorithms

Join TypeCostBest When
Nested-Loopbr + (nr × bs)Small tables
Block Nested-Loopbr + (br × bs)Medium tables
Hash Join≈ 3(br + bs)Large tables, equi-join
Sort-Mergesort + (br + bs)Sorted data

Optimization Techniques

1. Selection Push-Down

BAD: σcondition(R ⋈ S) | GOOD: σcondition(R) ⋈ S

2. Projection Push-Down

Project early to reduce data size

3. Join Ordering

Start with join that produces smallest result

6. Complete Formula Reference Sheet

Disk & Storage

Total Disk Access Time
Time = Seek Time + Rotational Latency + Transfer Time
Transfer Time
Transfer Time = Data Size / Data Transfer Rate
Average Rotational Latency
Latency = 1 / (2 × RPM) × 60 seconds
MTBF
MTBF(array) = MTBF(one disk) / Number of disks

Indexing

B-Tree Order
key_size × (p-1) + pointer_size × p ≤ block_size
B-Tree Height
height = ⌈logp(n)⌉
Max/Min Keys per Node
max = p-1, min = ⌈(p-1)/2⌉
Bitmap Index Size
size = (rows × distinct_values) / 8 bytes

Functional Dependencies

Number of Super Keys
count = 2(n-k) where n=total attrs, k=candidate key size
Lossless Decomposition Test
R1 ∩ R2 → R1 OR R1 ∩ R2 → R2

Query Optimization Costs

Linear Search
Cost = br (blocks in relation)
Nested-Loop Join
Cost = br + (nr × bs)
Hash Join
Cost ≈ 3(br + bs)

Transactions

Deadlock-Free Resources
min = Σ(max_requirement) - (n - 1)
Schedule Hierarchy
Strict ⊂ Cascadeless ⊂ Recoverable

7. Decision Trees

Which SQL Query to Use?

Finding specific records? → SELECT ... WHERE

Counting/Aggregating?
  ├─ Overall count → COUNT(*)
  └─ Per group → COUNT(*) + GROUP BY

Combining results?
  ├─ All from both → UNION
  ├─ Common only → INTERSECT
  └─ In A not B → EXCEPT

Matching pattern? → LIKE '%pattern%'

Data from multiple tables?
  ├─ Same column names → NATURAL JOIN or USING
  ├─ Different columns → JOIN ... ON
  └─ Same table → Self-join with aliases

"Has NOT done..." → EXCEPT or NOT EXISTS

"More than some" → > ANY (subquery)

"More than all" → > ALL (subquery)

Complex multi-step? → WITH (CTE)

Which Normal Form?

All values atomic? ──NO→ ❌ Not 1NF
        YES
        │
Single attribute key? ──YES→ Skip to 3NF check
        NO
        │
Partial dependency exists?
  ├─ YES → ❌ Not 2NF
  └─ NO → ✅ In 2NF
        │
Transitive dependency exists?
  ├─ YES → ❌ Not 3NF
  └─ NO → Check each FD α→β:
           - α not superkey AND
           - β not prime?
      ├─ YES → ❌ Not 3NF
      └─ NO → ✅ In 3NF
        │
Every FD has LHS = superkey?
  ├─ YES → ✅ In BCNF
  └─ NO → ❌ Not BCNF (but in 3NF)

Is Schedule Serializable?

Draw precedence graph
  (Edge Ti→Tj if conflicting ops where Ti first)
        │
        ├─ Cycle exists?
        │   ├─ YES → ❌ NOT conflict serializable
        │   └─ NO → ✅ Conflict serializable
        │
        └─ Check for blind writes
            ├─ Blind write exists → ❌ NOT view serializable
            └─ No blind writes → May be view serializable

8. Common Mistakes & Traps

❌ Using = NULL instead of IS NULL

WHERE phone IS NULL (correct)

❌ Confusing WHERE and HAVING

WHERE filters rows, HAVING filters groups

❌ Thinking 3NF = BCNF

3NF allows prime→prime, BCNF doesn't

❌ Wrong decomposition test

Test R1 ∩ R2 → R1 or R2 (not R1 ∪ R2)

❌ Sparse index on unordered file

Sparse ONLY works on ordered files

❌ Thinking 2PL prevents deadlock

2PL guarantees serializability, NOT deadlock-free

❌ Missing candidate keys

Check all attribute combinations systematically

❌ Wrong B-tree order formula

key_size × (p-1) + pointer × p ≤ block

❌ Confusing recoverable and cascadeless

All cascadeless → recoverable (not reverse)

❌ Always using index for queries

Use index only for <10% of rows

9. Assignment Question Patterns

Keywords in QuestionSolution Approach
"Find names where..."SELECT name FROM ... WHERE
"Count/Total number"COUNT() + GROUP BY
"Has NOT done..."EXCEPT or NOT EXISTS
"Maximum/Minimum in each"Subquery with GROUP BY + MAX/MIN
"Pattern matching"LIKE '%pattern%'
"Find order of B-tree"key_size×(p-1) + ptr×p ≤ block
"Is schedule serializable?"Draw precedence graph, check cycles
"Find candidate keys"Attrs not on RHS, compute closure
"Highest normal form?"Check partial, transitive, BCNF rules
"Lossless decomposition?"R1 ∩ R2 → R1 or R2?

Quick Reference Card

Key Reminders

  • ✓ Use IS NULL, never = NULL
  • ✓ WHERE filters rows, HAVING filters groups
  • ✓ Sparse index ONLY for ordered files
  • ✓ 2PL guarantees serializability, NOT deadlock-free
  • ✓ Strict ⊂ Cascadeless ⊂ Recoverable
  • ✓ Block size must be multiple of sector size

Most Important Formulas

  • • B-Tree Order: key×(p-1) + ptr×p ≤ block
  • • Bitmap: (rows × distinct) / 8 bytes
  • • MTBF: MTBF(one) / n
  • • Disk Time: Seek + Rotate + Transfer
  • • Super Keys: 2^(n-k)
  • • Deadlock-Free: Σ(max) - (n-1)

Complete Series