Contents
Backup & Recovery
1. Failure Types
| Failure Type | Cause | Data Loss | Recovery Method |
|---|---|---|---|
| Transaction | Logic error, abort | Volatile only | Rollback using log |
| System Crash | Power failure | RAM lost | Redo/Undo using log |
| Disk Failure | Hardware failure | Permanent loss | Restore from backup |
2. Log-Based Recovery
Log Record Format
Example Log
<T1, start>
<T1, A, 1000, 950> -- A changed from 1000 to 950
<T1, B, 2000, 2050> -- B changed from 2000 to 2050
<T1, commit>Write-Ahead Logging (WAL)
- Rule 1: Log record must be written BEFORE database update
- Rule 2: All log records of transaction must be written BEFORE commit
- Ensures: Can undo incomplete transactions & redo committed transactions
3. UNDO & REDO Operations
UNDO
Purpose: Reverse effects of incomplete transaction
When: Transaction started but not committed before crash
Log: <T1, A, 1000, 950>
Recovery: A = 1000 (restore old value)REDO
Purpose: Reapply effects of committed transaction
When: Transaction committed but changes not written to disk
Log: <T1, A, 1000, 950>
Recovery: A = 950 (reapply new value)Recovery Algorithm
- Step 1: Scan log backward
- • Find transactions that committed (have <Ti, commit>)
- • Find transactions that didn't commit (no <Ti, commit>)
- Step 2: UNDO Phase (backward)
- • For uncommitted transactions
- • Restore old_values
- Step 3: REDO Phase (forward)
- • For committed transactions
- • Reapply new_values
4. Checkpointing
Purpose: Reduce recovery time by marking a point where all updates are written to disk
Process:
- 1. Stop accepting new transactions
- 2. Complete all active transactions
- 3. Write all buffers to disk
- 4. Write <checkpoint> to log
- 5. Resume normal operation
Benefits
- • Recovery starts from last checkpoint
- • No need to scan entire log
- • Only process transactions active at or after checkpoint
Query Optimization
5. Query Processing & Optimization
Cost Estimation
Primary Cost: Number of disk block accesses
Secondary: CPU time (usually ignored)
Selection Cost
| Method | Cost | When to Use |
|---|---|---|
| Linear Search | br | No index, small table |
| Binary Search | ⌈log₂(br)⌉ | Sorted file |
| Primary Index | height + 1 | Index on key |
| Secondary Index | height + matches | Index on non-key |
Join Algorithms
| Join Type | Cost | Best When |
|---|---|---|
| Nested-Loop | br + (nr × bs) | Small tables |
| Block Nested-Loop | br + (br × bs) | Medium tables |
| Hash Join | ≈ 3(br + bs) | Large tables, equi-join |
| Sort-Merge | sort + (br + bs) | Sorted data |
Optimization Techniques
1. Selection Push-Down
BAD: σcondition(R ⋈ S) | GOOD: σcondition(R) ⋈ S2. Projection Push-Down
Project early to reduce data size3. Join Ordering
Start with join that produces smallest result6. Complete Formula Reference Sheet
Disk & Storage
Time = Seek Time + Rotational Latency + Transfer TimeTransfer Time = Data Size / Data Transfer RateLatency = 1 / (2 × RPM) × 60 secondsMTBF(array) = MTBF(one disk) / Number of disksIndexing
key_size × (p-1) + pointer_size × p ≤ block_sizeheight = ⌈logp(n)⌉max = p-1, min = ⌈(p-1)/2⌉size = (rows × distinct_values) / 8 bytesFunctional Dependencies
count = 2(n-k) where n=total attrs, k=candidate key sizeR1 ∩ R2 → R1 OR R1 ∩ R2 → R2Query Optimization Costs
Cost = br (blocks in relation)Cost = br + (nr × bs)Cost ≈ 3(br + bs)Transactions
min = Σ(max_requirement) - (n - 1)Strict ⊂ Cascadeless ⊂ Recoverable7. Decision Trees
Which SQL Query to Use?
Finding specific records? → SELECT ... WHERE
Counting/Aggregating?
├─ Overall count → COUNT(*)
└─ Per group → COUNT(*) + GROUP BY
Combining results?
├─ All from both → UNION
├─ Common only → INTERSECT
└─ In A not B → EXCEPT
Matching pattern? → LIKE '%pattern%'
Data from multiple tables?
├─ Same column names → NATURAL JOIN or USING
├─ Different columns → JOIN ... ON
└─ Same table → Self-join with aliases
"Has NOT done..." → EXCEPT or NOT EXISTS
"More than some" → > ANY (subquery)
"More than all" → > ALL (subquery)
Complex multi-step? → WITH (CTE)Which Normal Form?
All values atomic? ──NO→ ❌ Not 1NF
YES
│
Single attribute key? ──YES→ Skip to 3NF check
NO
│
Partial dependency exists?
├─ YES → ❌ Not 2NF
└─ NO → ✅ In 2NF
│
Transitive dependency exists?
├─ YES → ❌ Not 3NF
└─ NO → Check each FD α→β:
- α not superkey AND
- β not prime?
├─ YES → ❌ Not 3NF
└─ NO → ✅ In 3NF
│
Every FD has LHS = superkey?
├─ YES → ✅ In BCNF
└─ NO → ❌ Not BCNF (but in 3NF)Is Schedule Serializable?
Draw precedence graph
(Edge Ti→Tj if conflicting ops where Ti first)
│
├─ Cycle exists?
│ ├─ YES → ❌ NOT conflict serializable
│ └─ NO → ✅ Conflict serializable
│
└─ Check for blind writes
├─ Blind write exists → ❌ NOT view serializable
└─ No blind writes → May be view serializable8. Common Mistakes & Traps
❌ Using = NULL instead of IS NULL
WHERE phone IS NULL (correct)❌ Confusing WHERE and HAVING
WHERE filters rows, HAVING filters groups❌ Thinking 3NF = BCNF
3NF allows prime→prime, BCNF doesn't❌ Wrong decomposition test
Test R1 ∩ R2 → R1 or R2 (not R1 ∪ R2)❌ Sparse index on unordered file
Sparse ONLY works on ordered files❌ Thinking 2PL prevents deadlock
2PL guarantees serializability, NOT deadlock-free❌ Missing candidate keys
Check all attribute combinations systematically❌ Wrong B-tree order formula
key_size × (p-1) + pointer × p ≤ block❌ Confusing recoverable and cascadeless
All cascadeless → recoverable (not reverse)❌ Always using index for queries
Use index only for <10% of rows9. Assignment Question Patterns
| Keywords in Question | Solution Approach |
|---|---|
| "Find names where..." | SELECT name FROM ... WHERE |
| "Count/Total number" | COUNT() + GROUP BY |
| "Has NOT done..." | EXCEPT or NOT EXISTS |
| "Maximum/Minimum in each" | Subquery with GROUP BY + MAX/MIN |
| "Pattern matching" | LIKE '%pattern%' |
| "Find order of B-tree" | key_size×(p-1) + ptr×p ≤ block |
| "Is schedule serializable?" | Draw precedence graph, check cycles |
| "Find candidate keys" | Attrs not on RHS, compute closure |
| "Highest normal form?" | Check partial, transitive, BCNF rules |
| "Lossless decomposition?" | R1 ∩ R2 → R1 or R2? |
Quick Reference Card
Key Reminders
- ✓ Use IS NULL, never = NULL
- ✓ WHERE filters rows, HAVING filters groups
- ✓ Sparse index ONLY for ordered files
- ✓ 2PL guarantees serializability, NOT deadlock-free
- ✓ Strict ⊂ Cascadeless ⊂ Recoverable
- ✓ Block size must be multiple of sector size
Most Important Formulas
- • B-Tree Order: key×(p-1) + ptr×p ≤ block
- • Bitmap: (rows × distinct) / 8 bytes
- • MTBF: MTBF(one) / n
- • Disk Time: Seek + Rotate + Transfer
- • Super Keys: 2^(n-k)
- • Deadlock-Free: Σ(max) - (n-1)

