System design interviews have evolved dramatically. In 2026, interviewers aren't just looking for a rough architecture sketch on a whiteboard — they want to see that you understand trade-offs, failure modes, and modern tooling at depth. Two pillars that appear in virtually every real-time system design question are caching and event streaming.
Whether you're designing a live leaderboard for a gaming platform, a real-time fraud detection engine, or a ride-sharing dispatch system — you'll need a deep, practical understanding of both. This guide walks you through everything from first principles to production-grade patterns, with code examples and interview-ready talking points throughout.
Who Is This For?
This blog is written for:
- Beginners entering their first system design interviews
- Intermediate engineers looking to sharpen their knowledge of distributed systems
- Senior engineers preparing for staff-level design rounds
No matter where you are in your journey, you'll walk away with actionable knowledge you can use immediately.
The Real-Time Systems Challenge
Before we dive into solutions, let's frame the problem.
A real-time system is one where data must be processed, transmitted, and acted upon with minimal latency — typically under a few hundred milliseconds. Examples include:
- Live sports scores updated as events happen
- Stock price tickers reflecting market changes instantly
- Chat applications delivering messages without perceptible delay
- Fraud detection flagging suspicious transactions before they clear
The core challenge is this: traditional request-response architectures fall apart under real-time constraints because they are synchronous, stateless per request, and not designed for fan-out (broadcasting one event to many consumers).
This is where caching and event streams come in.
Part 1: Caching in Real-Time Systems
What Is Caching?
Caching is the practice of storing frequently accessed or computationally expensive data in a fast-access layer so that future requests can be served faster — without hitting slower backing stores like databases or external APIs.
Think of it like this: your database is a library with millions of books. Your cache is the small shelf of books you keep on your desk. Fetching from your desk takes milliseconds; going to the library takes seconds.
Cache Placement Strategies
There are several places you can insert a cache in your architecture:
```
Client → [CDN / Edge Cache] → [API Gateway] → [App Server + In-Memory Cache] → [Distributed Cache (Redis)] → [Database]
```
| Layer | Technology | Use Case |
|---|---|---|
| CDN / Edge | Cloudflare, Fastly | Static assets, HTML, public API responses |
| API Gateway | Kong, AWS API GW | Rate limiting, response caching |
| In-process | Caffeine (Java), node-cache | Ultra-low latency, local hot data |
| Distributed | Redis, Dragonfly | Shared state across service instances |
| Database | Query cache, materialized views | Reduce expensive query repetition |
Cache Eviction Policies
When your cache is full, something has to go. Your eviction policy determines what:
- LRU (Least Recently Used) — Evict the item that hasn't been accessed in the longest time. Best for temporal locality patterns.
- LFU (Least Frequently Used) — Evict the least popular items. Better when popularity is stable.
- TTL (Time-To-Live) — Expire items after a fixed duration. Essential for data correctness in real-time systems.
- FIFO — Evict in insertion order. Simple but rarely optimal.
For real-time systems, TTL + LRU is the most common combination.
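To make the TTL + LRU combination concrete, here is a minimal in-process sketch — purely illustrative and not thread-safe; in production you would rely on Redis's `maxmemory-policy allkeys-lru` plus per-key TTLs instead:

```python
import time
from collections import OrderedDict

class LruTtlCache:
    """Tiny illustrative cache combining LRU eviction with per-entry TTL."""

    def __init__(self, max_items: int, ttl_seconds: float):
        self.max_items = max_items
        self.ttl = ttl_seconds
        # Maps key -> (expiry timestamp, value); order tracks recency
        self._data: OrderedDict = OrderedDict()

    def get(self, key: str):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:  # TTL expired — treat as a miss
            del self._data[key]
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: str, value) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = (time.monotonic() + self.ttl, value)
        if len(self._data) > self.max_items:
            self._data.popitem(last=False)  # evict least recently used
```

TTL bounds staleness (correctness), while LRU bounds memory (capacity) — the two policies solve different problems, which is why they are usually combined.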
Cache Invalidation: The Hard Problem
"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton
Cache invalidation is what keeps engineers up at night. The three main strategies are:
1. Cache-Aside (Lazy Loading)
The application is responsible for loading data into the cache.
```python
import redis
import json

cache = redis.Redis(host='localhost', port=6379, decode_responses=True)

def get_user_profile(user_id: str) -> dict:
    cache_key = f"user:profile:{user_id}"

    # 1. Try cache first
    cached = cache.get(cache_key)
    if cached:
        return json.loads(cached)

    # 2. Cache miss — fetch from DB
    profile = db.query("SELECT * FROM users WHERE id = %s", user_id)

    # 3. Populate cache with TTL
    cache.setex(cache_key, 300, json.dumps(profile))  # 5 min TTL
    return profile
```
Pros: Only caches what's actually needed. Tolerates cache failures gracefully.
Cons: First request always hits the database (cold start). Risk of stale data between TTL windows.
2. Write-Through
Data is written to the cache and the database simultaneously.
```python
def update_user_profile(user_id: str, data: dict) -> None:
    cache_key = f"user:profile:{user_id}"

    # Write to DB first
    db.execute("UPDATE users SET ... WHERE id = %s", user_id)

    # Immediately update cache
    cache.setex(cache_key, 300, json.dumps(data))
```
Pros: Cache is always fresh. No stale reads after writes.
Cons: Higher write latency. Caches data that may never be read.
3. Write-Behind (Write-Back)
Writes go to the cache first; a background process flushes to the database asynchronously.
Pros: Very low write latency.
Cons: Risk of data loss if cache crashes before flush. Complex to implement correctly.
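Since write-behind is the one strategy the guide doesn't show in code, here is a minimal sketch of the idea — all names are illustrative, and a real implementation would add batching, retries, and crash recovery:

```python
import queue
import threading

class WriteBehindCache:
    """Illustrative write-behind cache: writes land in memory instantly
    and a background thread flushes them to the backing store later."""

    def __init__(self, flush_fn):
        self._cache: dict = {}
        self._pending: queue.Queue = queue.Queue()
        self._flush_fn = flush_fn  # e.g. a DB upsert — an assumption here
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def set(self, key: str, value) -> None:
        self._cache[key] = value          # fast path: memory only
        self._pending.put((key, value))   # enqueue for async persistence

    def get(self, key: str):
        return self._cache.get(key)

    def _drain(self) -> None:
        while True:
            key, value = self._pending.get()
            self._flush_fn(key, value)  # real systems batch and retry here
            self._pending.task_done()
```

Note the failure mode the cons list warns about: anything sitting in `_pending` when the process dies is lost, which is why write-behind is usually paired with a durable queue.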
Handling Cache Failure Patterns
Real-time systems can suffer catastrophic failures if caching isn't designed defensively.
Cache Stampede (Thundering Herd)
When a hot cache key expires, thousands of requests simultaneously miss the cache and hammer the database.
Solution: Probabilistic early expiration + mutex locks
```python
import threading

_locks: dict = {}

def get_with_lock(cache_key: str, fetch_fn, ttl: int = 300):
    value = cache.get(cache_key)
    if value:
        return json.loads(value)

    # Only one thread should regenerate
    lock = _locks.setdefault(cache_key, threading.Lock())
    with lock:
        # Double-check after acquiring lock — another thread may
        # have repopulated the cache while we waited
        value = cache.get(cache_key)
        if value:
            return json.loads(value)
        result = fetch_fn()
        cache.setex(cache_key, ttl, json.dumps(result))
        return result
```

Note that a `threading.Lock` only protects a single process; across multiple app servers you'd use a distributed lock (e.g. Redis `SET NX`).
Cache Penetration
Requests for data that doesn't exist bypass the cache entirely and flood the database.
Solution: Cache null results + Bloom filters
```python
def get_product(product_id: str) -> dict | None:
    cache_key = f"product:{product_id}"
    cached = cache.get(cache_key)
    if cached == "NULL":
        return None  # Cached miss — don't hit DB
    if cached:
        return json.loads(cached)

    product = db.query("SELECT * FROM products WHERE id = %s", product_id)
    if product is None:
        cache.setex(cache_key, 60, "NULL")  # Cache the miss for 1 min
        return None

    cache.setex(cache_key, 300, json.dumps(product))
    return product
```
Cache Avalanche
Many cache keys expire at the same time, causing a wave of DB requests.
Solution: Jitter on TTL values
```python
import random

def set_with_jitter(key: str, value, base_ttl: int = 300):
    jitter = random.randint(0, 60)  # Add 0–60s of randomness
    cache.setex(key, base_ttl + jitter, json.dumps(value))
```
Part 2: Event Streaming with Kafka
What Is an Event Stream?
An event stream is an ordered, immutable log of events published by producers and consumed by one or more consumers — potentially at different rates and at different times.
Unlike a traditional message queue where messages are deleted after consumption, an event stream retains events for a configurable retention period. This enables:
- Replay: Re-process historical events
- Fan-out: Multiple consumers read the same events independently
- Decoupling: Producers don't need to know about consumers
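All three properties fall out of one data structure: an append-only list of events plus a per-consumer offset. A toy sketch (all names illustrative) makes the contrast with a delete-on-consume queue obvious:

```python
class EventLog:
    """Toy append-only event log: events are retained after being read,
    and each consumer tracks its own offset, so consumers read
    independently (fan-out) and can rewind (replay)."""

    def __init__(self):
        self._events: list = []
        self._offsets: dict = {}  # consumer name -> next offset to read

    def append(self, event: dict) -> int:
        self._events.append(event)
        return len(self._events) - 1  # the new event's offset

    def poll(self, consumer: str):
        offset = self._offsets.get(consumer, 0)
        if offset >= len(self._events):
            return None  # consumer is caught up
        self._offsets[consumer] = offset + 1
        return self._events[offset]

    def seek(self, consumer: str, offset: int) -> None:
        self._offsets[consumer] = offset  # rewind to replay history
```

Producers only ever call `append`; they never know who is polling — which is exactly the decoupling property described above.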
Apache Kafka is the dominant event streaming platform in 2026, used by Uber, Netflix, LinkedIn, and thousands of others.
Core Kafka Concepts
```
                  ┌─────────────┐
Producers ──────▶ │    Topic    │ ──────▶ Consumer Group A
                  │ Partition 0 │ ──────▶ Consumer Group B
                  │ Partition 1 │
                  │ Partition 2 │
                  └─────────────┘
                   Kafka Broker
```
| Concept | Description |
|---|---|
| Topic | A named, ordered log of events (like a database table) |
| Partition | A topic is split into partitions for parallelism |
| Offset | The position of a message within a partition |
| Producer | Publishes events to a topic |
| Consumer | Reads events from a topic |
| Consumer Group | A set of consumers sharing the workload of a topic |
| Broker | A Kafka server that stores and serves partitions |
Producing Events
```python
from confluent_kafka import Producer
import json

producer = Producer({'bootstrap.servers': 'kafka:9092'})

def publish_order_event(order: dict) -> None:
    event = {
        "event_type": "ORDER_PLACED",
        "order_id": order["id"],
        "user_id": order["user_id"],
        "amount": order["amount"],
        "timestamp": order["created_at"]
    }
    producer.produce(
        topic='order-events',
        key=str(order["user_id"]),  # Partition by user for ordering
        value=json.dumps(event).encode('utf-8'),
        callback=delivery_report
    )
    producer.flush()

def delivery_report(err, msg):
    if err:
        print(f'Message delivery failed: {err}')
    else:
        print(f'Delivered to {msg.topic()} [{msg.partition()}] @ offset {msg.offset()}')
```
Consuming Events
```python
from confluent_kafka import Consumer
import json

consumer = Consumer({
    'bootstrap.servers': 'kafka:9092',
    'group.id': 'fraud-detection-service',
    'auto.offset.reset': 'earliest',
    'enable.auto.commit': False  # Manual commit for at-least-once guarantee
})
consumer.subscribe(['order-events'])

def process_events():
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            handle_kafka_error(msg.error())
            continue
        event = json.loads(msg.value().decode('utf-8'))
        try:
            run_fraud_check(event)
            consumer.commit(msg)  # Only commit after successful processing
        except Exception as e:
            log_and_alert(e, event)
            # Do NOT commit — the message will be reprocessed
```
Kafka Delivery Guarantees
One of the most important topics in any system design interview involving Kafka:
| Guarantee | Description | Use Case |
|---|---|---|
| At-most-once | Messages may be lost, never duplicated | Metrics, logging |
| At-least-once | Messages may be duplicated, never lost | Payments (with idempotency) |
| Exactly-once | No loss, no duplication | Financial transactions |
Exactly-once requires Kafka's transactional API and is more complex to implement:
```python
producer = Producer({
    'bootstrap.servers': 'kafka:9092',
    'transactional.id': 'payment-producer-1',
    'enable.idempotence': True
})
producer.init_transactions()

try:
    producer.begin_transaction()
    producer.produce('payment-events', key=payment_id, value=payload)
    producer.commit_transaction()
except Exception:
    producer.abort_transaction()
    raise
```
Part 3: Combining Caching and Event Streams
Here's where it gets interesting. In practice, the most powerful real-time architectures combine both layers. Let's look at a concrete example.
Real-World Example: Live Leaderboard System
Requirements:
- 10 million active users in a gaming platform
- Scores update in real-time as players finish matches
- Leaderboard must reflect changes within 2 seconds
- Support querying top-100 and a user's rank
Architecture:
```
Game Server ──▶ Kafka (score-events) ──▶ Score Processor ──▶ Redis Sorted Set
                                                                    ▲
                                                             Leaderboard API
                                                                    ▲
                                                                 Clients
```
Step 1: Publish score events
```python
# Game server publishes when a match ends
def on_match_complete(match_result: dict):
    producer.produce(
        topic='score-events',
        key=match_result['player_id'],
        value=json.dumps({
            "player_id": match_result['player_id'],
            "delta_score": match_result['score_gained'],
            "match_id": match_result['match_id'],
            "timestamp": match_result['ended_at']
        })
    )
```
Step 2: Score processor updates Redis
```python
import redis

r = redis.Redis(host='redis', port=6379)

def handle_score_event(event: dict):
    player_id = event['player_id']
    delta = event['delta_score']

    # Redis Sorted Set: ZINCRBY atomically increments a member's score
    new_score = r.zincrby('leaderboard:global', delta, player_id)

    # Invalidate any cached rank pages that may now be stale
    r.delete('leaderboard:top100:cached')
    print(f"Player {player_id} new score: {new_score}")
```
Step 3: Leaderboard API queries Redis
```python
from fastapi import FastAPI
import json

app = FastAPI()

@app.get("/leaderboard/top100")
def get_top_100():
    cache_key = 'leaderboard:top100:cached'

    # Check cache first (short TTL since scores change frequently)
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)

    # ZREVRANGE returns members sorted highest to lowest
    top_players = r.zrevrange('leaderboard:global', 0, 99, withscores=True)
    result = [
        {"rank": i + 1, "player_id": player.decode(), "score": int(score)}
        for i, (player, score) in enumerate(top_players)
    ]

    # Cache for 2 seconds — acceptable staleness for a leaderboard
    r.setex(cache_key, 2, json.dumps(result))
    return result

@app.get("/leaderboard/rank/{player_id}")
def get_player_rank(player_id: str):
    # ZREVRANK returns a 0-indexed rank (highest score = rank 0)
    rank = r.zrevrank('leaderboard:global', player_id)
    score = r.zscore('leaderboard:global', player_id)
    if rank is None:
        return {"error": "Player not found"}
    return {
        "player_id": player_id,
        "rank": rank + 1,
        "score": int(score)
    }
```
This architecture handles millions of updates per second because:
- Kafka decouples the game servers from score processing
- Redis Sorted Sets provide O(log N) rank lookups
- The short cache TTL on top-100 reduces Redis load while keeping data fresh
Part 4: Partitioning and Scaling
Kafka Partition Strategy
Partitioning is how Kafka achieves parallelism. The number of partitions limits your maximum consumer parallelism:
```
Topic: "order-events" with 6 partitions

Consumer Group A: 6 consumers → each reads 1 partition (max throughput)
Consumer Group B: 3 consumers → each reads 2 partitions
Consumer Group C: 8 consumers → 2 consumers are idle (wasteful)
```
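The assignment arithmetic above can be sketched in a few lines. This is a simplified round-robin spread — Kafka's actual assignors (range, round-robin, cooperative-sticky) are more involved — but it shows why consumer count beyond partition count buys nothing:

```python
def assign_partitions(partitions: int, consumers: list) -> dict:
    """Simplified round-robin partition assignment: each partition goes
    to exactly one consumer in the group; extra consumers sit idle."""
    assignment = {c: [] for c in consumers}
    for p in range(partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment
```

For 6 partitions and 3 consumers each member owns 2 partitions; for 6 partitions and 8 consumers, 2 members are left with nothing to read.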
Choosing a partition key matters enormously:
```python
# ✅ Good: partition by user_id for user-level ordering
producer.produce('events', key=user_id, value=payload)

# ❌ Bad: partition by timestamp — hot partitions when traffic spikes
producer.produce('events', key=str(datetime.now()), value=payload)

# ❌ Bad: no key — round-robin loses ordering guarantees
producer.produce('events', value=payload)
```
Redis Cluster for Horizontal Scaling
A single Redis node maxes out around 100k ops/sec. For higher throughput, use Redis Cluster:
```
┌──────────────────────────────────┐
│          Redis Cluster           │
│                                  │
│  Node 1: Slots 0–5460            │
│  Node 2: Slots 5461–10922        │
│  Node 3: Slots 10923–16383       │
│  (+ replica for each node)       │
└──────────────────────────────────┘
```
Redis Cluster shards data across 16,384 hash slots (not classic consistent hashing): each key is hashed with CRC16 and taken modulo 16384 to find its slot, and each slot is owned by exactly one node. This enables:
- Horizontal scaling by adding nodes
- Automatic rebalancing
- High availability with replicas
```python
import json
from redis.cluster import RedisCluster, ClusterNode

rc = RedisCluster(
    startup_nodes=[
        ClusterNode("redis-node-1", 6379),
        ClusterNode("redis-node-2", 6379),
        ClusterNode("redis-node-3", 6379),
    ],
    decode_responses=True
)

# Use a {hashtag} to force related keys into the same slot when needed
rc.set("{user:123}:profile", json.dumps(profile))
rc.set("{user:123}:preferences", json.dumps(prefs))
```
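The hashtag mechanism is simple enough to compute yourself. This sketch assumes Python's `binascii.crc_hqx` (CRC-16/XModem) matches Redis's CRC16 — which it does in my understanding, but treat the exact slot numbers as illustrative:

```python
import binascii

def hash_slot(key: str) -> int:
    """Compute a key's Redis Cluster slot. If the key contains a
    non-empty {hashtag}, only the tag is hashed — that's what pins
    related keys to the same slot (and therefore the same node)."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # non-empty tag found
            key = key[start + 1:end]
    return binascii.crc_hqx(key.encode(), 0) % 16384
```

Without the hashtag, `user:123:profile` and `user:123:preferences` would almost certainly hash to different slots, making multi-key operations on them impossible.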
Part 5: Common Interview Questions & How to Answer Them
"How would you design a real-time notification system for 100 million users?"
Framework to use:
- Clarify requirements: Push vs pull? What types of notifications? Guaranteed delivery?
- Identify bottlenecks: Fan-out at scale, connection management, delivery acknowledgment
- Choose the right tools: Kafka for ingestion, Redis for session lookup, WebSockets or SSE for delivery
- Discuss trade-offs: WebSockets require persistent connections (stateful) vs SSE (easier to scale, unidirectional)
Sample answer structure:
```
Client ←── WebSocket Connection ←── WebSocket Server (scaled horizontally)
                                              ▲
                                       Redis Pub/Sub
                               (server-to-server fan-out)
                                              ▲
                            Notification Processor (Kafka consumer)
                                              ▲
                                         Kafka Topic
                                  (notification-events)
                                              ▲
                                          Producers
                               (social graph service, etc.)
```
"How do you handle backpressure in event-driven systems?"
Backpressure occurs when producers generate events faster than consumers can process them. Strategies:
- Kafka's consumer lag monitoring: Alert when lag exceeds threshold, scale consumers
- Rate limiting at the producer: Token bucket or leaky bucket algorithms
- Circuit breakers: Temporarily stop publishing when downstream is overwhelmed
- Dead Letter Queues (DLQ): Route unprocessable messages aside without blocking
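The producer-side rate limiting mentioned above is usually a token bucket. Here is a minimal sketch (illustrative names; production systems would use a shared limiter such as one backed by Redis):

```python
import time

class TokenBucket:
    """Minimal token-bucket limiter: a producer asks for a token before
    publishing; a refusal is the backpressure signal to wait or shed load."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should wait, drop, or divert to a DLQ
```

The capacity parameter absorbs short bursts while the rate parameter bounds sustained throughput — the two knobs interviewers usually expect you to name.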
Best Practices
Caching Best Practices
- Always set a TTL — never cache indefinitely in a real-time system
- Use cache-aside for reads, write-through for critical writes
- Monitor cache hit rate — below 80% usually signals a design problem
- Never store sensitive data (passwords, PII) in distributed caches without encryption
- Size your cache for your hot working set, not your entire dataset
- Use connection pooling — don't open a new Redis connection per request
Kafka Best Practices
- Choose partition count carefully — it's very difficult to change later
- Set `min.insync.replicas=2` for production topics to prevent data loss
- Use consumer groups for scaling, not multiple topics
- Implement idempotent consumers — assume at-least-once delivery
- Monitor consumer lag with tools like Burrow or Kafka UI
- Use schema registry (Confluent Schema Registry or AWS Glue) to manage Avro/Protobuf schemas
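The "idempotent consumers" advice above boils down to deduplicating on a stable event ID before applying side effects. A sketch (names illustrative; in production the seen-ID set would live in Redis or a database, with a TTL):

```python
processed_ids: set = set()  # in production: a persistent, shared store

def handle_once(event: dict, side_effect) -> bool:
    """Idempotent consumer sketch: skip events we've already processed,
    so at-least-once redelivery never double-applies an effect."""
    event_id = event["event_id"]
    if event_id in processed_ids:
        return False  # duplicate delivery — safely ignored
    side_effect(event)
    processed_ids.add(event_id)
    return True
```

With this in place, the at-least-once consumer shown earlier can safely reprocess uncommitted messages after a crash.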
Common Mistakes to Avoid
Caching Mistakes
❌ Caching mutable, rapidly-changing data with long TTLs
Users see stale prices, scores, or statuses. Use short TTLs or event-driven invalidation.
❌ Not accounting for cache cold start
After a deploy or cache flush, your database gets hammered. Use cache warming scripts.
❌ Storing entire objects when only a field changes
Use Redis hashes (HSET, HGET) to update individual fields atomically.
❌ Ignoring cache memory limits
Caches with no eviction policy will OOM. Always configure maxmemory and maxmemory-policy.
Kafka Mistakes
❌ Using a single partition for ordering
One partition = one consumer = no parallelism. Design your key strategy thoughtfully.
❌ Committing offsets before processing
If the processor crashes after commit but before work completes, you'll lose events.
❌ Not handling consumer rebalances
During scaling, Kafka reassigns partitions. Implement on_revoke handlers to commit cleanly.
❌ Ignoring message size limits
Kafka's default max message size is 1MB. Sending large payloads causes broker errors.
🚀 Pro Tips
- Use Redis Streams instead of Pub/Sub when you need persistence and consumer groups. Pub/Sub drops messages if no one is listening; Streams don't.
- Combine Kafka + Redis for the CQRS pattern: Kafka handles the write side (command log), Redis handles the read side (materialized view). This gives you both durability and speed.
- In interviews, always mention monitoring — Prometheus + Grafana for Redis metrics (`redis-exporter`), Kafka consumer lag dashboards. Interviewers love candidates who think about observability.
- Dragonfly DB is emerging as a Redis-compatible alternative with significantly better multi-threaded performance. Worth mentioning as a modern option in senior-level interviews.
- For exactly-once semantics across Kafka + database, use the Transactional Outbox pattern: write your event to a database table in the same transaction as your business data, then have a separate poller publish it to Kafka. This eliminates the dual-write problem.
- Compression matters at scale: enable Kafka producer compression (`compression.type=snappy`) to reduce network I/O and broker storage costs by 3–5x for JSON payloads.
- Benchmark your cache serialization: JSON is readable but slow. Consider MessagePack or Protocol Buffers for high-throughput caches — they can be 2–4x faster to serialize/deserialize.
📌 Key Takeaways
- Caching and event streaming are complementary, not competing — use caching for low-latency reads, event streams for reliable, decoupled writes.
- Cache invalidation is the hardest part — choose the right strategy (cache-aside, write-through, write-behind) based on your consistency requirements, and always add jitter to TTLs.
- Kafka's delivery guarantees matter — understand at-most-once, at-least-once, and exactly-once, and design your consumers to be idempotent regardless.
- Redis Sorted Sets are purpose-built for real-time leaderboards — `ZINCRBY`, `ZREVRANK`, and `ZREVRANGE` are your best friends.
- Partition strategy in Kafka determines your scalability ceiling — choose keys that distribute load evenly and preserve the ordering guarantees your business logic needs.
- Design for failure: cache stampedes, avalanches, and penetration are predictable — handle them proactively with locks, null caching, and TTL jitter.
- Always discuss observability in interviews — consumer lag, cache hit rates, p99 latency, and error rates are what keep these systems healthy in production.
Conclusion
Designing scalable real-time systems is one of the most complex and rewarding challenges in backend engineering. In 2026, the bar for system design interviews has risen — but so have the tools available to us.
By mastering caching patterns (cache-aside, write-through, TTL jitter, stampede prevention) and event streaming fundamentals (Kafka partitioning, consumer groups, delivery guarantees), you arm yourself with the building blocks to tackle any real-time design problem thrown at you.
The key mindset shift is to think in trade-offs: consistency vs availability, throughput vs latency, simplicity vs fault-tolerance. Interviewers want to see that thinking out loud, backed by concrete knowledge of how these systems behave under load.
Go build something. Set up a local Kafka cluster with Docker Compose. Write a leaderboard with Redis Sorted Sets. Break it. Fix it. That hands-on intuition is what separates candidates who've memorized architectures from engineers who can design them.
Good luck — and happy designing. 🎯