DBMS Part 4: Indexing, Hashing & Transactions

Indexing & Hashing

1. Index Types
2. Dense vs Sparse
3. Bitmap Index
4. B-Trees & B⁺-Trees
5. Hashing

Transactions

6. ACID Properties
7. Serializability
8. Two-Phase Locking
9. Deadlock
10. Schedule Types

Indexing & Hashing

1. Index Types

Index Type	Key Order	Data Order	Dense/Sparse
Primary	Ordered	Ordered (same)	Can be both
Clustering	Ordered	Ordered (different key)	Dense
Secondary	Ordered	Unordered	Always dense

Key Rules

• Primary Index: Search key = Ordering key of file
• Secondary Index: Search key ≠ Ordering key
• Dense Index: Entry for every search key
• Sparse Index: Entry for some keys (only for ordered files)

2. Dense vs Sparse Index

Dense Index

Index:  [10] [20] [30] [40] [50]
         ↓    ↓    ↓    ↓    ↓
Data:   [10] [20] [30] [40] [50]

• Entry for every record
• Faster search
• More space required
• Works on any file

Sparse Index

Index:  [10]      [30]      [50]
         ↓         ↓         ↓
Data:   [10][20] [30][40] [50][60]

• Entry for some records
• Slower search
• Less space required
• Only for ordered files

3. Bitmap Index

Formula

Size (bytes) = (Number of rows × Number of distinct values) / 8

Example Calculation

Given:

• Attribute: Grade (values: A, B, C, D, F)
• Distinct values = 5
• Bitmap size = 200 bytes

Find: Number of rows

Rows = (Size × 8) / Distinct values

= (200 × 8) / 5

= 320 rows

When to Use Bitmap Index

• Low cardinality (few distinct values)
• Read-heavy workloads
• Data warehousing
• Gender, Status, Category columns

SQL Syntax

CREATE BITMAP INDEX index_name 
ON table_name(column_name);

-- Example
CREATE BITMAP INDEX team_name 
ON Player(team);

4. B-Trees & B⁺-Trees

Order Formula

key_size × (p - 1) + pointer_size × p ≤ block_size

Solve for p (order)

Example: Finding Order

Given:

• Key size = 8 bytes
• Pointer size = 12 bytes
• Block size = 512 bytes

Solution:

8(p - 1) + 12p ≤ 512

8p - 8 + 12p ≤ 512

20p ≤ 520

p = 26

B-Tree Properties (Order p)

Property	Formula	Example (p=17)
Max keys per node	p - 1	16
Max children per node	p	17
Min keys (non-root)	⌈(p-1)/2⌉	8
Min children (non-root)	⌈p/2⌉	9

Height Calculation

Height = ⌈log_p(N)⌉

where N = number of records, p = order

Example:

Records = 6,250,000, Order = 50

Height = ⌈log₅₀(6,250,000)⌉ = 4

Max nodes to access = 4

B⁺-Tree vs B-Tree

• B⁺-Tree: All data in leaf nodes, internal nodes for navigation only
• B-Tree: Data in all nodes
• B⁺-Tree: Leaf nodes are linked (good for range queries)
• Same formulas apply for both

5. Hashing

Hash Function Properties

• Maps key → bucket number
• Should distribute evenly
• Minimize collisions
• Deterministic (same input → same output)

Example Hash Functions

h₁(n) = n mod 7

Simple modulo operation

h₂(n) = n² + 1 mod 7

Quadratic hashing

h₃(n) = 3n + 3 mod 7

Linear hashing

Extendable Hashing

• Global Depth (d): Bits used from hash value
• Local Depth: Bits for specific bucket
• Bucket Split: When bucket overflows, increase depth

Example:

Hash: 1001 0110 (binary)

Global depth: 2

Use first 2 bits: 1001 0110

→ Bucket 2

Transactions

6. ACID Properties

A - Atomicity

All or nothing. Either all operations complete or none do.

Transfer $100: Debit AND Credit both execute, or none

C - Consistency

Database must be consistent before and after transaction. May be inconsistent during execution.

Balance ≥ 0 before transaction AND after transaction

I - Isolation

Concurrent transactions appear to execute sequentially. T1 doesn't see T2's partial changes.

Two users withdrawing from same account don't interfere

D - Durability

Once committed, changes are permanent. Survives system crashes.

Power failure after COMMIT won't undo the transaction

7. Serializability

Conflict Serializability

Method: Draw precedence graph and check for cycles

Steps:

1. Nodes = Transactions
2. Add edge T_i → T_j if they have conflicting operations where T_i comes first
3. If cycle exists → NOT serializable
4. No cycle → Serializable

Conflicting Operations

Operation 1	Operation 2	Conflict?
r₁(A)	w₂(A)	✗ Yes
w₁(A)	r₂(A)	✗ Yes
w₁(A)	w₂(A)	✗ Yes
r₁(A)	r₂(A)	✓ No conflict

Example

Schedule S:

w₁(A), r₂(A), w₂(B), r₁(B)

Conflicts:

w₁(A) → r₂(A): T1 → T2

w₂(B) → r₁(B): T2 → T1

Graph:

T1 ⇄ T2 (cycle!)

Result: NOT serializable

8. Two-Phase Locking (2PL)

Two Phases

1. Growing Phase

Acquire locks only. Cannot release any lock.

2. Shrinking Phase

Release locks only. Cannot acquire new locks.

Lock Types

Lock	Purpose	Compatible with S?	Compatible with X?
Shared (S)	For reading	✓ Yes	✗ No
Exclusive (X)	For writing	✗ No	✗ No

Key Points

• ✅ Guarantees conflict serializability
• ❌ Does NOT prevent deadlock
• Strict 2PL: Release all locks at commit/abort
• Rigorous 2PL: Release all locks at end only

9. Deadlock

Detection: Wait-For Graph

Draw graph where edge T_i → T_j means T_i waits for T_j

T1 waits for T2

T2 waits for T3

T3 waits for T1

Graph: T1 → T2 → T3 → T1 (cycle!)

→ Deadlock detected

Prevention Methods

Wait-Die

Older transaction waits, younger aborts

Wound-Wait

Older forces younger to abort

Timeout

Abort if waiting too long

Deadlock-Free Protocols

✅ Timestamp Ordering Protocol
✅ Tree Protocol
❌ 2PL (can have deadlock)

10. Schedule Types

Type	Read Uncommitted?	Write Uncommitted?	Cascading Rollback?
Recoverable	Yes	Yes	Yes
Cascadeless	Cascadeless	No	Yes	No
Strict	No	No	No

Recoverable Schedule

If T_i reads from T_j, then T_j must commit before T_i commits.

w₁(A), r₂(A), commit₁, commit₂ ✅ Recoverable

Cascadeless Schedule

Transaction reads only committed data. No cascading rollbacks.

w₁(A), commit₁, r₂(A), commit₂ ✅ Cascadeless

Strict Schedule

Neither reads nor writes data until previous transaction commits/aborts.

w₁(A), commit₁, r₂(A), w₂(A), commit₂ ✅ Strict

Relationship

Strict ⊂ Cascadeless ⊂ Recoverable

SQL Transaction Control

COMMIT

BEGIN TRANSACTION;
UPDATE accounts 
SET balance = balance - 1000 
WHERE id = 1;

UPDATE accounts 
SET balance = balance + 1000 
WHERE id = 2;

COMMIT; -- Save all changes

ROLLBACK

BEGIN TRANSACTION;
UPDATE accounts 
SET balance = balance - 1000 
WHERE id = 1;

-- Error occurred!
ROLLBACK; -- Undo all changes

SAVEPOINT

BEGIN TRANSACTION;

UPDATE accounts 
SET balance = balance - 1000 
WHERE id = 1;

SAVEPOINT sp1;

UPDATE accounts 
SET balance = balance + 1000 
WHERE id = 2;

SAVEPOINT sp2;

UPDATE accounts 
SET balance = balance + 500 
WHERE id = 3;

ROLLBACK TO sp1; 
-- Undo changes after sp1

COMMIT; 
-- Only first update is saved

Practice Problems

Problem 1: Find order of B-tree. Key=8 bytes, Pointer=12 bytes, Block=512 bytes

Show Solution

8(p - 1) + 12p ≤ 512
8p - 8 + 12p ≤ 512
20p ≤ 520
p = 26

Order = 26

Problem 2: Is schedule serializable? S: w₁(A), r₂(A), w₂(B), r₁(B)

Show Solution

Conflicts:
w₁(A) → r₂(A): T1 → T2
w₂(B) → r₁(B): T2 → T1

Graph: T1 ⇄ T2 (cycle!)

Result: NOT serializable ❌

Problem 3: Calculate bitmap size. 500 rows, 4 distinct values

Show Solution

Size = (Rows × Distinct values) / 8
     = (500 × 4) / 8
     = 2000 / 8
     = 250 bytes

Problem 4: Is schedule recoverable? S: w₁(A), r₂(A), commit₂, commit₁

Show Solution

T2 reads from T1 (r₂(A) after w₁(A))
T2 commits BEFORE T1 commits

If T1 aborts after T2 commits, T2 read invalid data
Cannot undo T2 (already committed)

Result: NOT recoverable ❌

Quick Formulas Reference

B-Tree Order

key_size × (p-1) + pointer_size × p ≤ block_size

B-Tree Height

height = ⌈log_p(n)⌉

Bitmap Size

size = (rows × distinct_values) / 8 bytes

Max Keys per Node

max_keys = p - 1

Min Keys per Node

min_keys = ⌈(p-1)/2⌉

Deadlock-Free Resource Allocation

min_resources = Σ(max_requirement) - (n - 1)

Quick Tips

✓ Sparse index ONLY works on ordered files

✓ 2PL guarantees serializability but not deadlock-free

✓ Bitmap index best for low cardinality columns

✓ Strict ⊂ Cascadeless ⊂ Recoverable

✓ B⁺-tree: All data in leaf nodes, linked for range queries

✓ Timestamp ordering is deadlock-free

Almost There!

You've mastered indexing and transactions! One more part to go: Recovery, Query Optimization, and complete reference materials.

Previous: Part 3 Final Part: Recovery & Optimization

Indexing & Transactions

Contents

Indexing & Hashing

1. Index Types

Key Rules

2. Dense vs Sparse Index

Dense Index

Sparse Index

3. Bitmap Index

Formula

Example Calculation

When to Use Bitmap Index

SQL Syntax

4. B-Trees & B⁺-Trees

Order Formula

Example: Finding Order

B-Tree Properties (Order p)

Height Calculation

B⁺-Tree vs B-Tree

5. Hashing

Hash Function Properties

Example Hash Functions

Extendable Hashing

Transactions

6. ACID Properties

A - Atomicity

C - Consistency

I - Isolation

D - Durability

7. Serializability

Conflict Serializability

Conflicting Operations

Example

8. Two-Phase Locking (2PL)

Two Phases

Lock Types

Key Points

9. Deadlock

Detection: Wait-For Graph

Prevention Methods

Deadlock-Free Protocols

10. Schedule Types

Recoverable Schedule

Cascadeless Schedule

Strict Schedule

Relationship

SQL Transaction Control

COMMIT

ROLLBACK

SAVEPOINT

Practice Problems

Quick Formulas Reference

Quick Tips

Almost There!