Contents
Indexing & Hashing
1. Index Types
| Index Type | Key Order | Data Order | Dense/Sparse |
|---|---|---|---|
| Primary | Ordered | Ordered (same) | Can be both |
| Clustering | Ordered | Ordered (different key) | Dense |
| Secondary | Ordered | Unordered | Always dense |
Key Rules
- • Primary Index: Search key = Ordering key of file
- • Secondary Index: Search key ≠ Ordering key
- • Dense Index: Entry for every search key
- • Sparse Index: Entry for some keys (only for ordered files)
2. Dense vs Sparse Index
Dense Index
Index: [10] [20] [30] [40] [50]
↓ ↓ ↓ ↓ ↓
Data: [10] [20] [30] [40] [50]- • Entry for every record
- • Faster search
- • More space required
- • Works on any file
Sparse Index
Index: [10] [30] [50]
↓ ↓ ↓
Data: [10][20] [30][40] [50][60]- • Entry for some records
- • Slower search
- • Less space required
- • Only for ordered files
3. Bitmap Index
Formula
Size (bytes) = (Number of rows × Number of distinct values) / 8Example Calculation
Given:
- • Attribute: Grade (values: A, B, C, D, F)
- • Distinct values = 5
- • Bitmap size = 200 bytes
Find: Number of rows
When to Use Bitmap Index
- • Low cardinality (few distinct values)
- • Read-heavy workloads
- • Data warehousing
- • Gender, Status, Category columns
SQL Syntax
CREATE BITMAP INDEX index_name
ON table_name(column_name);
-- Example
CREATE BITMAP INDEX team_name
ON Player(team);4. B-Trees & B⁺-Trees
Order Formula
key_size × (p - 1) + pointer_size × p ≤ block_sizeSolve for p (order)
Example: Finding Order
Given:
- • Key size = 8 bytes
- • Pointer size = 12 bytes
- • Block size = 512 bytes
Solution:
B-Tree Properties (Order p)
| Property | Formula | Example (p=17) |
|---|---|---|
| Max keys per node | p - 1 | 16 |
| Max children per node | p | 17 |
| Min keys (non-root) | ⌈(p-1)/2⌉ | 8 |
| Min children (non-root) | ⌈p/2⌉ | 9 |
Height Calculation
Height = ⌈logp(N)⌉where N = number of records, p = order
Example:
B⁺-Tree vs B-Tree
- • B⁺-Tree: All data in leaf nodes, internal nodes for navigation only
- • B-Tree: Data in all nodes
- • B⁺-Tree: Leaf nodes are linked (good for range queries)
- • Same formulas apply for both
5. Hashing
Hash Function Properties
- • Maps key → bucket number
- • Should distribute evenly
- • Minimize collisions
- • Deterministic (same input → same output)
Example Hash Functions
h₁(n) = n mod 7Simple modulo operation
h₂(n) = n² + 1 mod 7Quadratic hashing
h₃(n) = 3n + 3 mod 7Linear hashing
Extendable Hashing
- • Global Depth (d): Bits used from hash value
- • Local Depth: Bits for specific bucket
- • Bucket Split: When bucket overflows, increase depth
Example:
Transactions
6. ACID Properties
A - Atomicity
All or nothing. Either all operations complete or none do.
Transfer $100: Debit AND Credit both execute, or noneC - Consistency
Database must be consistent before and after transaction. May be inconsistent during execution.
Balance ≥ 0 before transaction AND after transactionI - Isolation
Concurrent transactions appear to execute sequentially. T1 doesn't see T2's partial changes.
Two users withdrawing from same account don't interfereD - Durability
Once committed, changes are permanent. Survives system crashes.
Power failure after COMMIT won't undo the transaction7. Serializability
Conflict Serializability
Method: Draw precedence graph and check for cycles
Steps:
- 1. Nodes = Transactions
- 2. Add edge Ti → Tj if they have conflicting operations where Ti comes first
- 3. If cycle exists → NOT serializable
- 4. No cycle → Serializable
Conflicting Operations
| Operation 1 | Operation 2 | Conflict? |
|---|---|---|
| r₁(A) | w₂(A) | ✗ Yes |
| w₁(A) | r₂(A) | ✗ Yes |
| w₁(A) | w₂(A) | ✗ Yes |
| r₁(A) | r₂(A) | ✓ No conflict |
Example
Schedule S:
w₁(A), r₂(A), w₂(B), r₁(B)Conflicts:
Graph:
T1 ⇄ T2 (cycle!)Result: NOT serializable
8. Two-Phase Locking (2PL)
Two Phases
1. Growing Phase
Acquire locks only. Cannot release any lock.
2. Shrinking Phase
Release locks only. Cannot acquire new locks.
Lock Types
| Lock | Purpose | Compatible with S? | Compatible with X? |
|---|---|---|---|
| Shared (S) | For reading | ✓ Yes | ✗ No |
| Exclusive (X) | For writing | ✗ No | ✗ No |
Key Points
- • ✅ Guarantees conflict serializability
- • ❌ Does NOT prevent deadlock
- • Strict 2PL: Release all locks at commit/abort
- • Rigorous 2PL: Release all locks at end only
9. Deadlock
Detection: Wait-For Graph
Draw graph where edge Ti → Tj means Ti waits for Tj
Prevention Methods
Wait-Die
Older transaction waits, younger aborts
Wound-Wait
Older forces younger to abort
Timeout
Abort if waiting too long
Deadlock-Free Protocols
- ✅ Timestamp Ordering Protocol
- ✅ Tree Protocol
- ❌ 2PL (can have deadlock)
10. Schedule Types
| Type | Read Uncommitted? | Write Uncommitted? | Cascading Rollback? | |
|---|---|---|---|---|
| Recoverable | Yes | Yes | Yes | |
| Cascadeless | Cascadeless | No | Yes | No |
| Strict | No | No | No |
Recoverable Schedule
If Ti reads from Tj, then Tj must commit before Ti commits.
w₁(A), r₂(A), commit₁, commit₂ ✅ RecoverableCascadeless Schedule
Transaction reads only committed data. No cascading rollbacks.
w₁(A), commit₁, r₂(A), commit₂ ✅ CascadelessStrict Schedule
Neither reads nor writes data until previous transaction commits/aborts.
w₁(A), commit₁, r₂(A), w₂(A), commit₂ ✅ StrictRelationship
Strict ⊂ Cascadeless ⊂ RecoverableSQL Transaction Control
COMMIT
BEGIN TRANSACTION;
UPDATE accounts
SET balance = balance - 1000
WHERE id = 1;
UPDATE accounts
SET balance = balance + 1000
WHERE id = 2;
COMMIT; -- Save all changesROLLBACK
BEGIN TRANSACTION;
UPDATE accounts
SET balance = balance - 1000
WHERE id = 1;
-- Error occurred!
ROLLBACK; -- Undo all changesSAVEPOINT
BEGIN TRANSACTION;
UPDATE accounts
SET balance = balance - 1000
WHERE id = 1;
SAVEPOINT sp1;
UPDATE accounts
SET balance = balance + 1000
WHERE id = 2;
SAVEPOINT sp2;
UPDATE accounts
SET balance = balance + 500
WHERE id = 3;
ROLLBACK TO sp1;
-- Undo changes after sp1
COMMIT;
-- Only first update is savedPractice Problems
Problem 1: Find order of B-tree. Key=8 bytes, Pointer=12 bytes, Block=512 bytes
Show Solution
8(p - 1) + 12p ≤ 512
8p - 8 + 12p ≤ 512
20p ≤ 520
p = 26
Order = 26Problem 2: Is schedule serializable? S: w₁(A), r₂(A), w₂(B), r₁(B)
Show Solution
Conflicts:
w₁(A) → r₂(A): T1 → T2
w₂(B) → r₁(B): T2 → T1
Graph: T1 ⇄ T2 (cycle!)
Result: NOT serializable ❌Problem 3: Calculate bitmap size. 500 rows, 4 distinct values
Show Solution
Size = (Rows × Distinct values) / 8
= (500 × 4) / 8
= 2000 / 8
= 250 bytesProblem 4: Is schedule recoverable? S: w₁(A), r₂(A), commit₂, commit₁
Show Solution
T2 reads from T1 (r₂(A) after w₁(A))
T2 commits BEFORE T1 commits
If T1 aborts after T2 commits, T2 read invalid data
Cannot undo T2 (already committed)
Result: NOT recoverable ❌Quick Formulas Reference
key_size × (p-1) + pointer_size × p ≤ block_sizeheight = ⌈logp(n)⌉size = (rows × distinct_values) / 8 bytesmax_keys = p - 1min_keys = ⌈(p-1)/2⌉min_resources = Σ(max_requirement) - (n - 1)Quick Tips
✓ Sparse index ONLY works on ordered files
✓ 2PL guarantees serializability but not deadlock-free
✓ Bitmap index best for low cardinality columns
✓ Strict ⊂ Cascadeless ⊂ Recoverable
✓ B⁺-tree: All data in leaf nodes, linked for range queries
✓ Timestamp ordering is deadlock-free
Almost There!
You've mastered indexing and transactions! One more part to go: Recovery, Query Optimization, and complete reference materials.

