📦 What is Denormalization?
Denormalization is the process of intentionally introducing redundancy into a database by combining or duplicating data to improve read performance, even if it breaks strict normalization rules.
Think of it as the opposite of normalization: instead of optimizing for data integrity and minimal redundancy, you optimize for speed and simplicity, often at the cost of consistency and extra storage.
🧠 Why Denormalize?
- To reduce the number of joins
- To improve read speed
- To simplify query logic
- To better support reporting, analytics, and caching
🔄 Normalization vs Denormalization
| Feature | Normalization | Denormalization |
|---|---|---|
| Goal | Remove redundancy, ensure consistency | Optimize performance, reduce joins |
| Data Structure | More tables with relations | Fewer tables, more redundancy |
| Reads | Slower (many joins) | Faster (fewer joins) |
| Writes | Simpler | More complex (need to update in many places) |
| Storage | Efficient | Requires more space |
🧱 Example
Normalized:

```sql
users  (user_id, name)
orders (order_id, user_id, amount)
```
To get the user's name for an order:

```sql
SELECT o.order_id, u.name, o.amount
FROM orders o
JOIN users u ON o.user_id = u.user_id;
```

Denormalized:

```sql
orders (order_id, user_id, user_name, amount)
```
Now the query is:

```sql
SELECT order_id, user_name, amount FROM orders;
```

✅ Faster
❌ But if a user's name changes, you must update every one of their orders too
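The write penalty looks like this in practice. A single rename now touches two tables, and wrapping both statements in a transaction keeps the copies consistent (a minimal sketch; user_id 42 and the new name are placeholders):

```sql
BEGIN;
-- Update the source of truth...
UPDATE users  SET name = 'Ada King' WHERE user_id = 42;
-- ...and every denormalized copy of the name
UPDATE orders SET user_name = 'Ada King' WHERE user_id = 42;
COMMIT; -- both changes land together, or neither does
```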
🔧 Common Denormalization Techniques
| Technique | Example |
|---|---|
| Duplicate columns | Store user_name in orders |
| Precompute aggregates | Save total sales in a product record |
| Embed nested documents (NoSQL) | MongoDB: store user info inside order document |
| Materialized views | Store a query result as a table (sketched below) |
| Star schema | In data warehousing — facts + denormalized dimensions |
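To make the materialized-view row concrete, here is what it might look like over the example schema above. This is PostgreSQL syntax; the view name order_report is illustrative, and other databases expose the feature differently:

```sql
-- Run the join once and persist the result
CREATE MATERIALIZED VIEW order_report AS
SELECT o.order_id, u.name AS user_name, o.amount
FROM orders o
JOIN users u ON o.user_id = u.user_id;

-- Reads become a plain scan, with no join at query time
SELECT order_id, user_name, amount FROM order_report;

-- Trade-off: the view is stale until explicitly refreshed
REFRESH MATERIALIZED VIEW order_report;
```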
✅ When to Use Denormalization
Use it when:
- You have read-heavy workloads
- You’re building analytics or dashboards
- You’re working in a NoSQL or document-based system
- Joins are expensive (especially on large datasets)
- You control data writes carefully to avoid inconsistencies (one approach is sketched below)
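One way to control writes carefully is to let the database propagate changes itself, so application code cannot forget a copy. A sketch using a PostgreSQL trigger on the example schema (the function and trigger names are made up for illustration):

```sql
-- Whenever a user's name changes, fan the change out to their orders
CREATE OR REPLACE FUNCTION sync_user_name() RETURNS trigger AS $$
BEGIN
  UPDATE orders SET user_name = NEW.name WHERE user_id = NEW.user_id;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_name_sync
AFTER UPDATE OF name ON users
FOR EACH ROW
EXECUTE FUNCTION sync_user_name();
```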
⚠️ Risks and Trade-offs
| Risk | Description |
|---|---|
| ❌ Inconsistency | One field changes, but copies don’t get updated |
| 🧩 Complex writes | Need to update multiple places atomically |
| 💾 Storage cost | More disk usage due to duplication |
| 🔄 Data drift | Copies silently diverge when sync between the original and duplicated data fails (a reconciliation check is sketched below) |
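Because drift creeps in silently, a periodic reconciliation query is a common safeguard. Against the example schema it could look like this (a sketch; how you repair mismatches is up to you):

```sql
-- Find orders whose copied name no longer matches the source of truth
SELECT o.order_id, o.user_id, o.user_name AS copied_name, u.name AS current_name
FROM orders o
JOIN users u ON o.user_id = u.user_id
WHERE o.user_name <> u.name;
```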
💬 Summary
Denormalization is a performance optimization technique where you duplicate or pre-join data to make reads faster — especially in read-heavy systems, NoSQL setups, or analytics platforms.