System design interviews have evolved dramatically. In 2026, interviewers aren't just looking for a rough architecture sketch on a whiteboard — they want to see that you understand trade-offs, failure modes, and modern tooling at depth. Two pillars that appear in virtually every real-time system design question are caching and event streaming.
Whether you're designing a live leaderboard for a gaming platform, a real-time fraud detection engine, or a ride-sharing dispatch system — you'll need a deep, practical understanding of both. This guide walks you through everything from first principles to production-grade patterns, with code examples and interview-ready talking points throughout.
Who Is This For?
This blog is written for:
- Beginners entering their first system design interviews
- Intermediate engineers looking to sharpen their knowledge of distributed systems
- Senior engineers preparing for staff-level design rounds
No matter where you are in your journey, you'll walk away with actionable knowledge you can use immediately.
The Real-Time Systems Challenge
Before we dive into solutions, let's frame the problem.
A real-time system is one where data must be processed, transmitted, and acted upon with minimal latency — typically under a few hundred milliseconds. Examples include:
- Live sports scores updated as events happen
- Stock price tickers reflecting market changes instantly
- Chat applications delivering messages without perceptible delay
- Fraud detection flagging suspicious transactions before they clear
The core challenge is this: traditional request-response architectures fall apart under real-time constraints because they are synchronous, stateless per request, and not designed for fan-out (broadcasting one event to many consumers).
This is where caching and event streams come in.
Part 1: Caching in Real-Time Systems
What Is Caching?
Caching is the practice of storing frequently accessed or computationally expensive data in a fast-access layer so that future requests can be served faster — without hitting slower backing stores like databases or external APIs.
Think of it like this: your database is a library with millions of books. Your cache is the small shelf of books you keep on your desk. Fetching from your desk takes milliseconds; going to the library takes seconds.
Cache Placement Strategies
There are several places you can insert a cache in your architecture:
```
Client → [CDN / Edge Cache] → [API Gateway] → [App Server + In-Memory Cache] → [Distributed Cache (Redis)] → [Database]
```
| Layer | Technology | Use Case |
|---|---|---|
| CDN / Edge | Cloudflare, Fastly | Static assets, HTML, public API responses |
| API Gateway | Kong, AWS API GW | Rate limiting, response caching |
| In-process | Caffeine (Java), node-cache | Ultra-low latency, local hot data |
| Distributed | Redis, Dragonfly | Shared state across service instances |
| Database | Query cache, materialized views | Reduce expensive query repetition |
Cache Eviction Policies
When your cache is full, something has to go. Your eviction policy determines what:
- LRU (Least Recently Used) — Evict the item that hasn't been accessed in the longest time. Best for temporal locality patterns.
- LFU (Least Frequently Used) — Evict the least popular items. Better when popularity is stable.
- TTL (Time-To-Live) — Expire items after a fixed duration. Essential for data correctness in real-time systems.
- FIFO — Evict in insertion order. Simple but rarely optimal.
For real-time systems, TTL + LRU is the most common combination.
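To make the TTL + LRU combination concrete, here is a minimal in-process sketch — purely illustrative and not thread-safe; in production you would rely on Redis's `maxmemory-policy allkeys-lru` plus per-key TTLs instead:

```python
import time
from collections import OrderedDict

class LruTtlCache:
    """Tiny illustrative cache combining LRU eviction with per-entry TTL."""

    def __init__(self, max_items: int, ttl_seconds: float):
        self.max_items = max_items
        self.ttl = ttl_seconds
        # Maps key -> (expiry timestamp, value); order tracks recency
        self._data: OrderedDict = OrderedDict()

    def get(self, key: str):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:  # TTL expired — treat as a miss
            del self._data[key]
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: str, value) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = (time.monotonic() + self.ttl, value)
        if len(self._data) > self.max_items:
            self._data.popitem(last=False)  # evict least recently used
```

TTL bounds staleness (correctness), while LRU bounds memory (capacity) — the two policies solve different problems, which is why they are usually combined.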
Cache Invalidation: The Hard Problem
"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton
Cache invalidation is what keeps engineers up at night. The three main strategies are:
1. Cache-Aside (Lazy Loading)
The application is responsible for loading data into the cache.
```python
import redis
import json

cache = redis.Redis(host='localhost', port=6379, decode_responses=True)

def get_user_profile(user_id: str) -> dict:
    cache_key = f"user:profile:{user_id}"

    # 1. Try cache first
    cached = cache.get(cache_key)
    if cached:
        return json.loads(cached)

    # 2. Cache miss — fetch from DB
    profile = db.query("SELECT * FROM users WHERE id = %s", user_id)

    # 3. Populate cache with TTL
    cache.setex(cache_key, 300, json.dumps(profile))  # 5 min TTL
    return profile
```
Pros: Only caches what's actually needed. Tolerates cache failures gracefully.
Cons: First request always hits the database (cold start). Risk of stale data between TTL windows.
2. Write-Through
Data is written to the cache and the database simultaneously.
```python
def update_user_profile(user_id: str, data: dict) -> None:
    cache_key = f"user:profile:{user_id}"

    # Write to DB first
    db.execute("UPDATE users SET ... WHERE id = %s", user_id)

    # Immediately update cache
    cache.setex(cache_key, 300, json.dumps(data))
```
Pros: Cache is always fresh. No stale reads after writes.
Cons: Higher write latency. Caches data that may never be read.
3. Write-Behind (Write-Back)
Writes go to the cache first; a background process flushes to the database asynchronously.
Pros: Very low write latency.
Cons: Risk of data loss if cache crashes before flush. Complex to implement correctly.
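Since write-behind is the one strategy the guide doesn't show in code, here is a minimal sketch of the idea — all names are illustrative, and a real implementation would add batching, retries, and crash recovery:

```python
import queue
import threading

class WriteBehindCache:
    """Illustrative write-behind cache: writes land in memory instantly
    and a background thread flushes them to the backing store later."""

    def __init__(self, flush_fn):
        self._cache: dict = {}
        self._pending: queue.Queue = queue.Queue()
        self._flush_fn = flush_fn  # e.g. a DB upsert — an assumption here
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def set(self, key: str, value) -> None:
        self._cache[key] = value          # fast path: memory only
        self._pending.put((key, value))   # enqueue for async persistence

    def get(self, key: str):
        return self._cache.get(key)

    def _drain(self) -> None:
        while True:
            key, value = self._pending.get()
            self._flush_fn(key, value)  # real systems batch and retry here
            self._pending.task_done()
```

Note the failure mode the cons list warns about: anything sitting in `_pending` when the process dies is lost, which is why write-behind is usually paired with a durable queue.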
Handling Cache Failure Patterns
Real-time systems can suffer catastrophic failures if caching isn't designed defensively.
Cache Stampede (Thundering Herd)
When a hot cache key expires, thousands of requests simultaneously miss the cache and hammer the database.
Solution: Probabilistic early expiration + mutex locks
```python
import threading

_locks: dict = {}

def get_with_lock(cache_key: str, fetch_fn, ttl: int = 300):
    value = cache.get(cache_key)
    if value:
        return json.loads(value)

    # Only one thread should regenerate
    lock = _locks.setdefault(cache_key, threading.Lock())
    with lock:
        # Double-check after acquiring lock — another thread may
        # have repopulated the cache while we waited
        value = cache.get(cache_key)
        if value:
            return json.loads(value)
        result = fetch_fn()
        cache.setex(cache_key, ttl, json.dumps(result))
        return result
```

Note that a `threading.Lock` only protects a single process; across multiple app servers you'd use a distributed lock (e.g. Redis `SET NX`).
Cache Penetration
Requests for data that doesn't exist bypass the cache entirely and flood the database.
Solution: Cache null results + Bloom filters
```python
def get_product(product_id: str) -> dict | None:
    cache_key = f"product:{product_id}"
    cached = cache.get(cache_key)
    if cached == "NULL":
        return None  # Cached miss — don't hit DB
    if cached:
        return json.loads(cached)

    product = db.query("SELECT * FROM products WHERE id = %s", product_id)
    if product is None:
        cache.setex(cache_key, 60, "NULL")  # Cache the miss for 1 min
        return None

    cache.setex(cache_key, 300, json.dumps(product))
    return product
```
Cache Avalanche
Many cache keys expire at the same time, causing a wave of DB requests.
Solution: Jitter on TTL values
```python
import random

def set_with_jitter(key: str, value, base_ttl: int = 300):
    jitter = random.randint(0, 60)  # Add 0–60s of randomness
    cache.setex(key, base_ttl + jitter, json.dumps(value))
```
Part 2: Event Streaming with Kafka
What Is an Event Stream?
An event stream is an ordered, immutable log of events published by producers and consumed by one or more consumers — potentially at different rates and at different times.
Unlike a traditional message queue where messages are deleted after consumption, an event stream retains events for a configurable retention period. This enables:
- Replay: Re-process historical events
- Fan-out: Multiple consumers read the same events independently
- Decoupling: Producers don't need to know about consumers
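All three properties fall out of one data structure: an append-only list of events plus a per-consumer offset. A toy sketch (all names illustrative) makes the contrast with a delete-on-consume queue obvious:

```python
class EventLog:
    """Toy append-only event log: events are retained after being read,
    and each consumer tracks its own offset, so consumers read
    independently (fan-out) and can rewind (replay)."""

    def __init__(self):
        self._events: list = []
        self._offsets: dict = {}  # consumer name -> next offset to read

    def append(self, event: dict) -> int:
        self._events.append(event)
        return len(self._events) - 1  # the new event's offset

    def poll(self, consumer: str):
        offset = self._offsets.get(consumer, 0)
        if offset >= len(self._events):
            return None  # consumer is caught up
        self._offsets[consumer] = offset + 1
        return self._events[offset]

    def seek(self, consumer: str, offset: int) -> None:
        self._offsets[consumer] = offset  # rewind to replay history
```

Producers only ever call `append`; they never know who is polling — which is exactly the decoupling property described above.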
Apache Kafka is the dominant event streaming platform in 2026, used by Uber, Netflix, LinkedIn, and thousands of others.
Core Kafka Concepts
```
                  ┌─────────────┐
Producers ──────▶ │    Topic    │ ──────▶ Consumer Group A
                  │ Partition 0 │ ──────▶ Consumer Group B
                  │ Partition 1 │
                  │ Partition 2 │
                  └─────────────┘
                   Kafka Broker
```
| Concept | Description |
|---|---|
| Topic | A named, ordered log of events (like a database table) |
| Partition | A topic is split into partitions for parallelism |
| Offset | The position of a message within a partition |
| Producer | Publishes events to a topic |
| Consumer | Reads events from a topic |
| Consumer Group | A set of consumers sharing the workload of a topic |
| Broker | A Kafka server that stores and serves partitions |
Producing Events
```python
from confluent_kafka import Producer
import json

producer = Producer({'bootstrap.servers': 'kafka:9092'})

def publish_order_event(order: dict) -> None:
    event = {
        "event_type": "ORDER_PLACED",
        "order_id": order["id"],
        "user_id": order["user_id"],
        "amount": order["amount"],
        "timestamp": order["created_at"]
    }
    producer.produce(
        topic='order-events',
        key=str(order["user_id"]),  # Partition by user for ordering
        value=json.dumps(event).encode('utf-8'),
        callback=delivery_report
    )
    producer.flush()

def delivery_report(err, msg):
    if err:
        print(f'Message delivery failed: {err}')
    else:
        print(f'Delivered to {msg.topic()} [{msg.partition()}] @ offset {msg.offset()}')
```
Consuming Events
```python
from confluent_kafka import Consumer
import json

consumer = Consumer({
    'bootstrap.servers': 'kafka:9092',
    'group.id': 'fraud-detection-service',
    'auto.offset.reset': 'earliest',
    'enable.auto.commit': False  # Manual commit for at-least-once guarantee
})
consumer.subscribe(['order-events'])

def process_events():
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            handle_kafka_error(msg.error())
            continue
        event = json.loads(msg.value().decode('utf-8'))
        try:
            run_fraud_check(event)
            consumer.commit(msg)  # Only commit after successful processing
        except Exception as e:
            log_and_alert(e, event)
            # Do NOT commit — the message will be reprocessed
```
Kafka Delivery Guarantees
One of the most important topics in any system design interview involving Kafka:
| Guarantee | Description | Use Case |
|---|---|---|
| At-most-once | Messages may be lost, never duplicated | Metrics, logging |
| At-least-once | Messages may be duplicated, never lost | Payments (with idempotency) |
| Exactly-once | No loss, no duplication | Financial transactions |
Exactly-once requires Kafka's transactional API and is more complex to implement:
```python
producer = Producer({
    'bootstrap.servers': 'kafka:9092',
    'transactional.id': 'payment-producer-1',
    'enable.idempotence': True
})
producer.init_transactions()

try:
    producer.begin_transaction()
    producer.produce('payment-events', key=payment_id, value=payload)
    producer.commit_transaction()
except Exception:
    producer.abort_transaction()
    raise
```
Part 3: Combining Caching and Event Streams
Here's where it gets interesting. In practice, the most powerful real-time architectures combine both layers. Let's look at a concrete example.
Real-World Example: Live Leaderboard System
Requirements:
- 10 million active users in a gaming platform
- Scores update in real-time as players finish matches
- Leaderboard must reflect changes within 2 seconds
- Support querying top-100 and a user's rank
Architecture:
```
Game Server ──▶ Kafka (score-events) ──▶ Score Processor ──▶ Redis Sorted Set
                                                                    ▲
                                                             Leaderboard API
                                                                    ▲
                                                                 Clients
```
Step 1: Publish score events
```python
# Game server publishes when a match ends
def on_match_complete(match_result: dict):
    producer.produce(
        topic='score-events',
        key=match_result['player_id'],
        value=json.dumps({
            "player_id": match_result['player_id'],
            "delta_score": match_result['score_gained'],
            "match_id": match_result['match_id'],
            "timestamp": match_result['ended_at']
        })
    )
```
Step 2: Score processor updates Redis
```python
import redis

r = redis.Redis(host='redis', port=6379)

def handle_score_event(event: dict):
    player_id = event['player_id']
    delta = event['delta_score']

    # Redis Sorted Set: ZINCRBY atomically increments a member's score
    new_score = r.zincrby('leaderboard:global', delta, player_id)

    # Invalidate any cached rank pages that may now be stale
    r.delete('leaderboard:top100:cached')
    print(f"Player {player_id} new score: {new_score}")
```
Step 3: Leaderboard API queries Redis
```python
from fastapi import FastAPI
import json

app = FastAPI()

@app.get("/leaderboard/top100")
def get_top_100():
    cache_key = 'leaderboard:top100:cached'

    # Check cache first (short TTL since scores change frequently)
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)

    # ZREVRANGE returns members sorted highest to lowest
    top_players = r.zrevrange('leaderboard:global', 0, 99, withscores=True)
    result = [
        {"rank": i + 1, "player_id": player.decode(), "score": int(score)}
        for i, (player, score) in enumerate(top_players)
    ]

    # Cache for 2 seconds — acceptable staleness for a leaderboard
    r.setex(cache_key, 2, json.dumps(result))
    return result

@app.get("/leaderboard/rank/{player_id}")
def get_player_rank(player_id: str):
    # ZREVRANK returns a 0-indexed rank (highest score = rank 0)
    rank = r.zrevrank('leaderboard:global', player_id)
    score = r.zscore('leaderboard:global', player_id)
    if rank is None:
        return {"error": "Player not found"}
    return {
        "player_id": player_id,
        "rank": rank + 1,
        "score": int(score)
    }
```
This architecture handles millions of updates per second because:
- Kafka decouples the game servers from score processing
- Redis Sorted Sets provide O(log N) rank lookups
- The short cache TTL on top-100 reduces Redis load while keeping data fresh
Part 4: Partitioning and Scaling
Kafka Partition Strategy
Partitioning is how Kafka achieves parallelism. The number of partitions limits your maximum consumer parallelism:
```
Topic: "order-events" with 6 partitions

Consumer Group A: 6 consumers → each reads 1 partition (max throughput)
Consumer Group B: 3 consumers → each reads 2 partitions
Consumer Group C: 8 consumers → 2 consumers are idle (wasteful)
```
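The assignment arithmetic above can be sketched in a few lines. This is a simplified round-robin spread — Kafka's actual assignors (range, round-robin, cooperative-sticky) are more involved — but it shows why consumer count beyond partition count buys nothing:

```python
def assign_partitions(partitions: int, consumers: list) -> dict:
    """Simplified round-robin partition assignment: each partition goes
    to exactly one consumer in the group; extra consumers sit idle."""
    assignment = {c: [] for c in consumers}
    for p in range(partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment
```

For 6 partitions and 3 consumers each member owns 2 partitions; for 6 partitions and 8 consumers, 2 members are left with nothing to read.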
Choosing a partition key matters enormously:
```python
# ✅ Good: partition by user_id for user-level ordering
producer.produce('events', key=user_id, value=payload)

# ❌ Bad: partition by timestamp — hot partitions when traffic spikes
producer.produce('events', key=str(datetime.now()), value=payload)

# ❌ Bad: no key — round-robin loses ordering guarantees
producer.produce('events', value=payload)
```
Redis Cluster for Horizontal Scaling
A single Redis node maxes out around 100k ops/sec. For higher throughput, use Redis Cluster:
```
┌──────────────────────────────────┐
│          Redis Cluster           │
│                                  │
│  Node 1: Slots 0–5460            │
│  Node 2: Slots 5461–10922        │
│  Node 3: Slots 10923–16383       │
│  (+ replica for each node)       │
└──────────────────────────────────┘
```
Redis Cluster shards data across 16,384 hash slots (not classic consistent hashing): each key is hashed with CRC16 and taken modulo 16384 to find its slot, and each slot is owned by exactly one node. This enables:
- Horizontal scaling by adding nodes
- Automatic rebalancing
- High availability with replicas
```python
import json
from redis.cluster import RedisCluster, ClusterNode

rc = RedisCluster(
    startup_nodes=[
        ClusterNode("redis-node-1", 6379),
        ClusterNode("redis-node-2", 6379),
        ClusterNode("redis-node-3", 6379),
    ],
    decode_responses=True
)

# Use a {hashtag} to force related keys into the same slot when needed
rc.set("{user:123}:profile", json.dumps(profile))
rc.set("{user:123}:preferences", json.dumps(prefs))
```
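The hashtag mechanism is simple enough to compute yourself. This sketch assumes Python's `binascii.crc_hqx` (CRC-16/XModem) matches Redis's CRC16 — which it does in my understanding, but treat the exact slot numbers as illustrative:

```python
import binascii

def hash_slot(key: str) -> int:
    """Compute a key's Redis Cluster slot. If the key contains a
    non-empty {hashtag}, only the tag is hashed — that's what pins
    related keys to the same slot (and therefore the same node)."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # non-empty tag found
            key = key[start + 1:end]
    return binascii.crc_hqx(key.encode(), 0) % 16384
```

Without the hashtag, `user:123:profile` and `user:123:preferences` would almost certainly hash to different slots, making multi-key operations on them impossible.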
Part 5: Common Interview Questions & How to Answer Them
"How would you design a real-time notification system for 100 million users?"
Framework to use:
- Clarify requirements: Push vs pull? What types of notifications? Guaranteed delivery?
- Identify bottlenecks: Fan-out at scale, connection management, delivery acknowledgment
- Choose the right tools: Kafka for ingestion, Redis for session lookup, WebSockets or SSE for delivery
- Discuss trade-offs: WebSockets require persistent connections (stateful) vs SSE (easier to scale, unidirectional)
Sample answer structure:
```
Client ←── WebSocket Connection ←── WebSocket Server (scaled horizontally)
                                              ▲
                                       Redis Pub/Sub
                               (server-to-server fan-out)
                                              ▲
                            Notification Processor (Kafka consumer)
                                              ▲
                                         Kafka Topic
                                  (notification-events)
                                              ▲
                                          Producers
                               (social graph service, etc.)
```
"How do you handle backpressure in event-driven systems?"
Backpressure occurs when producers generate events faster than consumers can process them. Strategies:
- Kafka's consumer lag monitoring: Alert when lag exceeds threshold, scale consumers
- Rate limiting at the producer: Token bucket or leaky bucket algorithms
- Circuit breakers: Temporarily stop publishing when downstream is overwhelmed
- Dead Letter Queues (DLQ): Route unprocessable messages aside without blocking
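The producer-side rate limiting mentioned above is usually a token bucket. Here is a minimal sketch (illustrative names; production systems would use a shared limiter such as one backed by Redis):

```python
import time

class TokenBucket:
    """Minimal token-bucket limiter: a producer asks for a token before
    publishing; a refusal is the backpressure signal to wait or shed load."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should wait, drop, or divert to a DLQ
```

The capacity parameter absorbs short bursts while the rate parameter bounds sustained throughput — the two knobs interviewers usually expect you to name.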
Best Practices
Caching Best Practices
- Always set a TTL — never cache indefinitely in a real-time system
- Use cache-aside for reads, write-through for critical writes
- Monitor cache hit rate — below 80% usually signals a design problem
- Never store sensitive data (passwords, PII) in distributed caches without encryption
- Size your cache for your hot working set, not your entire dataset
- Use connection pooling — don't open a new Redis connection per request
Kafka Best Practices
- Choose partition count carefully — it's very difficult to change later
- Set `min.insync.replicas=2` for production topics to prevent data loss
- Use consumer groups for scaling, not multiple topics
- Implement idempotent consumers — assume at-least-once delivery
- Monitor consumer lag with tools like Burrow or Kafka UI
- Use schema registry (Confluent Schema Registry or AWS Glue) to manage Avro/Protobuf schemas
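The "idempotent consumers" advice above boils down to deduplicating on a stable event ID before applying side effects. A sketch (names illustrative; in production the seen-ID set would live in Redis or a database, with a TTL):

```python
processed_ids: set = set()  # in production: a persistent, shared store

def handle_once(event: dict, side_effect) -> bool:
    """Idempotent consumer sketch: skip events we've already processed,
    so at-least-once redelivery never double-applies an effect."""
    event_id = event["event_id"]
    if event_id in processed_ids:
        return False  # duplicate delivery — safely ignored
    side_effect(event)
    processed_ids.add(event_id)
    return True
```

With this in place, the at-least-once consumer shown earlier can safely reprocess uncommitted messages after a crash.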
Common Mistakes to Avoid
Caching Mistakes
❌ Caching mutable, rapidly-changing data with long TTLs
Users see stale prices, scores, or statuses. Use short TTLs or event-driven invalidation.
❌ Not accounting for cache cold start
After a deploy or cache flush, your database gets hammered. Use cache warming scripts.
❌ Storing entire objects when only a field changes
Use Redis hashes (HSET, HGET) to update individual fields atomically.
❌ Ignoring cache memory limits
Caches with no eviction policy will OOM. Always configure maxmemory and maxmemory-policy.
Kafka Mistakes
❌ Using a single partition for ordering
One partition = one consumer = no parallelism. Design your key strategy thoughtfully.
❌ Committing offsets before processing
If the processor crashes after commit but before work completes, you'll lose events.
❌ Not handling consumer rebalances
During scaling, Kafka reassigns partitions. Implement on_revoke handlers to commit cleanly.
❌ Ignoring message size limits
Kafka's default max message size is 1MB. Sending large payloads causes broker errors.
🚀 Pro Tips
- Use Redis Streams instead of Pub/Sub when you need persistence and consumer groups. Pub/Sub drops messages if no one is listening; Streams don't.
- Combine Kafka + Redis for the CQRS pattern: Kafka handles the write side (command log), Redis handles the read side (materialized view). This gives you both durability and speed.
- In interviews, always mention monitoring — Prometheus + Grafana for Redis metrics (`redis-exporter`), Kafka consumer lag dashboards. Interviewers love candidates who think about observability.
- Dragonfly DB is emerging as a Redis-compatible alternative with significantly better multi-threaded performance. Worth mentioning as a modern option in senior-level interviews.
- For exactly-once semantics across Kafka + database, use the Transactional Outbox pattern: write your event to a database table in the same transaction as your business data, then have a separate poller publish it to Kafka. This eliminates the dual-write problem.
- Compression matters at scale: enable Kafka producer compression (`compression.type=snappy`) to reduce network I/O and broker storage costs by 3–5x for JSON payloads.
- Benchmark your cache serialization: JSON is readable but slow. Consider MessagePack or Protocol Buffers for high-throughput caches — they can be 2–4x faster to serialize/deserialize.
📌 Key Takeaways
- Caching and event streaming are complementary, not competing — use caching for low-latency reads, event streams for reliable, decoupled writes.
- Cache invalidation is the hardest part — choose the right strategy (cache-aside, write-through, write-behind) based on your consistency requirements, and always add jitter to TTLs.
- Kafka's delivery guarantees matter — understand at-most-once, at-least-once, and exactly-once, and design your consumers to be idempotent regardless.
- Redis Sorted Sets are purpose-built for real-time leaderboards — `ZINCRBY`, `ZREVRANK`, and `ZREVRANGE` are your best friends.
- Partition strategy in Kafka determines your scalability ceiling — choose keys that distribute load evenly and preserve the ordering guarantees your business logic needs.
- Design for failure: cache stampedes, avalanches, and penetration are predictable — handle them proactively with locks, null caching, and TTL jitter.
- Always discuss observability in interviews — consumer lag, cache hit rates, p99 latency, and error rates are what keep these systems healthy in production.
Conclusion
Designing scalable real-time systems is one of the most complex and rewarding challenges in backend engineering. In 2026, the bar for system design interviews has risen — but so have the tools available to us.
By mastering caching patterns (cache-aside, write-through, TTL jitter, stampede prevention) and event streaming fundamentals (Kafka partitioning, consumer groups, delivery guarantees), you arm yourself with the building blocks to tackle any real-time design problem thrown at you.
The key mindset shift is to think in trade-offs: consistency vs availability, throughput vs latency, simplicity vs fault-tolerance. Interviewers want to see that thinking out loud, backed by concrete knowledge of how these systems behave under load.
Go build something. Set up a local Kafka cluster with Docker Compose. Write a leaderboard with Redis Sorted Sets. Break it. Fix it. That hands-on intuition is what separates candidates who've memorized architectures from engineers who can design them.
Good luck — and happy designing. 🎯