System Design · Real-Time Systems · Caching · Event Streaming · Kafka · Redis · Scalability · Distributed Systems · Interview Prep · Backend Engineering

System Design Interview 2026: Designing Scalable Real-Time Systems with Caching and Event Streams

Master system design interviews in 2026 by learning how to architect scalable real-time systems using caching strategies, event streaming with Kafka, and modern distributed patterns. Practical examples, code snippets, and interview-ready insights included.

April 1, 2026 · 16 min read · Niraj Kumar

System design interviews have evolved dramatically. In 2026, interviewers aren't just looking for a rough architecture sketch on a whiteboard — they want to see that you understand trade-offs, failure modes, and modern tooling at depth. Two pillars that appear in virtually every real-time system design question are caching and event streaming.

Whether you're designing a live leaderboard for a gaming platform, a real-time fraud detection engine, or a ride-sharing dispatch system — you'll need a deep, practical understanding of both. This guide walks you through everything from first principles to production-grade patterns, with code examples and interview-ready talking points throughout.


Who Is This For?

This blog is written for:

  • Beginners entering their first system design interviews
  • Intermediate engineers looking to sharpen their knowledge of distributed systems
  • Senior engineers preparing for staff-level design rounds

No matter where you are in your journey, you'll walk away with actionable knowledge you can use immediately.


The Real-Time Systems Challenge

Before we dive into solutions, let's frame the problem.

A real-time system is one where data must be processed, transmitted, and acted upon with minimal latency — typically under a few hundred milliseconds. Examples include:

  • Live sports scores updated as events happen
  • Stock price tickers reflecting market changes instantly
  • Chat applications delivering messages without perceptible delay
  • Fraud detection flagging suspicious transactions before they clear

The core challenge is this: traditional request-response architectures fall apart under real-time constraints because they are synchronous, stateless per request, and not designed for fan-out (broadcasting one event to many consumers).

This is where caching and event streams come in.


Part 1: Caching in Real-Time Systems

What Is Caching?

Caching is the practice of storing frequently accessed or computationally expensive data in a fast-access layer so that future requests can be served faster — without hitting slower backing stores like databases or external APIs.

Think of it like this: your database is a library with millions of books. Your cache is the small shelf of books you keep on your desk. Fetching from your desk takes milliseconds; going to the library takes seconds.

Cache Placement Strategies

There are several places you can insert a cache in your architecture:

Client → [CDN / Edge Cache] → [API Gateway] → [App Server + In-Memory Cache] → [Distributed Cache (Redis)] → [Database]
Layer        | Technology                       | Use Case
CDN / Edge   | Cloudflare, Fastly               | Static assets, HTML, public API responses
API Gateway  | Kong, AWS API GW                 | Rate limiting, response caching
In-process   | Caffeine (Java), node-cache      | Ultra-low latency, local hot data
Distributed  | Redis, Dragonfly                 | Shared state across service instances
Database     | Query cache, materialized views  | Reduce expensive query repetition

Cache Eviction Policies

When your cache is full, something has to go. Your eviction policy determines what:

  • LRU (Least Recently Used) — Evict the item that hasn't been accessed in the longest time. Best for temporal locality patterns.
  • LFU (Least Frequently Used) — Evict the least popular items. Better when popularity is stable.
  • TTL (Time-To-Live) — Expire items after a fixed duration. Essential for data correctness in real-time systems.
  • FIFO — Evict in insertion order. Simple but rarely optimal.

For real-time systems, TTL + LRU is the most common combination.
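To build intuition for how LRU and TTL compose, here is a minimal in-process sketch using only the standard library (a teaching toy, not a replacement for Redis or Caffeine):

```python
import time
from collections import OrderedDict

class TTLLRUCache:
    """Toy cache: evicts the least-recently-used entry when full,
    and treats entries older than ttl_seconds as expired."""

    def __init__(self, capacity: int = 128, ttl_seconds: float = 300.0):
        self.capacity = capacity
        self.ttl = ttl_seconds
        self._data = OrderedDict()  # key -> (value, inserted_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, inserted_at = entry
        if time.monotonic() - inserted_at > self.ttl:
            del self._data[key]      # TTL expiry
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = (value, time.monotonic())
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the LRU entry
```

The Redis equivalent is configuring `maxmemory-policy allkeys-lru` and setting per-key TTLs with `SETEX`.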

Cache Invalidation: The Hard Problem

"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton

Cache invalidation is what keeps engineers up at night. The three main strategies are:

1. Cache-Aside (Lazy Loading)

The application is responsible for loading data into the cache.

import redis
import json

cache = redis.Redis(host='localhost', port=6379, decode_responses=True)

def get_user_profile(user_id: str) -> dict:
    cache_key = f"user:profile:{user_id}"
    
    # 1. Try cache first
    cached = cache.get(cache_key)
    if cached:
        return json.loads(cached)
    
    # 2. Cache miss — fetch from DB
    profile = db.query("SELECT * FROM users WHERE id = %s", user_id)
    
    # 3. Populate cache with TTL
    cache.setex(cache_key, 300, json.dumps(profile))  # 5 min TTL
    
    return profile

Pros: Only caches what's actually needed. Tolerates cache failures gracefully.
Cons: First request always hits the database (cold start). Risk of stale data between TTL windows.

2. Write-Through

Data is written to the cache and the database simultaneously.

def update_user_profile(user_id: str, data: dict) -> None:
    cache_key = f"user:profile:{user_id}"
    
    # Write to DB first
    db.execute("UPDATE users SET ... WHERE id = %s", user_id)
    
    # Immediately update cache
    cache.setex(cache_key, 300, json.dumps(data))

Pros: Cache is always fresh. No stale reads after writes.
Cons: Higher write latency. Caches data that may never be read.

3. Write-Behind (Write-Back)

Writes go to the cache first; a background process flushes to the database asynchronously.

Pros: Very low write latency.
Cons: Risk of data loss if cache crashes before flush. Complex to implement correctly.
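A minimal write-behind sketch, with plain dicts standing in for the cache and database (illustrative only; a real implementation needs a durable pending queue and retries):

```python
import json
from collections import deque

cache_store = {}          # stands in for Redis
db_store = {}             # stands in for the database
pending_writes = deque()  # writes accepted but not yet flushed to the DB

def write_behind_set(key: str, value: dict) -> None:
    # Fast path: write to the cache and enqueue the DB write
    cache_store[key] = json.dumps(value)
    pending_writes.append((key, value))

def flush_to_db(batch_size: int = 100) -> int:
    # Background worker: drain the queue into the database in batches
    flushed = 0
    while pending_writes and flushed < batch_size:
        key, value = pending_writes.popleft()
        db_store[key] = value  # in real life: an UPSERT, with retry on failure
        flushed += 1
    return flushed
```

The window between `write_behind_set` and `flush_to_db` is exactly where the data-loss risk lives: if the process dies with items still in `pending_writes`, those writes never reach the database.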

Handling Cache Failure Patterns

Real-time systems can suffer catastrophic failures if caching isn't designed defensively.

Cache Stampede (Thundering Herd)

When a hot cache key expires, thousands of requests simultaneously miss the cache and hammer the database.

Solution: Probabilistic early expiration + mutex locks

import threading

_locks = {}  # NOTE: per-process only; across multiple service instances, use a distributed lock (e.g. Redis SET NX EX)

def get_with_lock(cache_key: str, fetch_fn, ttl: int = 300):
    value = cache.get(cache_key)
    if value:
        return json.loads(value)
    
    # Only one thread should regenerate
    lock = _locks.setdefault(cache_key, threading.Lock())
    with lock:
        # Double-check after acquiring lock
        value = cache.get(cache_key)
        if value:
            return json.loads(value)
        
        result = fetch_fn()
        cache.setex(cache_key, ttl, json.dumps(result))
        return result

Cache Penetration

Requests for data that doesn't exist bypass the cache entirely and flood the database.

Solution: Cache null results + Bloom filters

def get_product(product_id: str) -> dict | None:
    cache_key = f"product:{product_id}"
    
    cached = cache.get(cache_key)
    if cached == "NULL":
        return None  # Cached miss — don't hit DB
    if cached:
        return json.loads(cached)
    
    product = db.query("SELECT * FROM products WHERE id = %s", product_id)
    
    if product is None:
        cache.setex(cache_key, 60, "NULL")  # Cache the miss for 1 min
        return None
    
    cache.setex(cache_key, 300, json.dumps(product))
    return product

Cache Avalanche

Many cache keys expire at the same time, causing a wave of DB requests.

Solution: Jitter on TTL values

import random

def set_with_jitter(key: str, value, base_ttl: int = 300):
    jitter = random.randint(0, 60)  # Add 0–60s of randomness
    cache.setex(key, base_ttl + jitter, json.dumps(value))

Part 2: Event Streaming with Kafka

What Is an Event Stream?

An event stream is an ordered, immutable log of events published by producers and consumed by one or more consumers — potentially at different rates and at different times.

Unlike a traditional message queue where messages are deleted after consumption, an event stream retains events for a configurable retention period. This enables:

  • Replay: Re-process historical events
  • Fan-out: Multiple consumers read the same events independently
  • Decoupling: Producers don't need to know about consumers
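The three properties above can be illustrated with a toy in-memory log (a teaching sketch, not Kafka itself):

```python
class EventLog:
    """Toy append-only log: events are retained after being read,
    and each consumer group tracks its own offset."""

    def __init__(self):
        self.events = []
        self.offsets = {}  # consumer group -> next offset to read

    def publish(self, event) -> int:
        self.events.append(event)
        return len(self.events) - 1  # the new event's offset

    def poll(self, group: str, max_events: int = 10):
        start = self.offsets.get(group, 0)
        batch = self.events[start:start + max_events]
        self.offsets[group] = start + len(batch)
        return batch

    def replay(self, group: str, from_offset: int = 0) -> None:
        # Rewind the group's offset to re-process history
        self.offsets[group] = from_offset
```

Because reads only advance a per-group offset rather than deleting events, two groups can consume the same stream independently, and any group can rewind and replay.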

Apache Kafka is the dominant event streaming platform in 2026, used by Uber, Netflix, LinkedIn, and thousands of others.

Core Kafka Concepts

                    ┌─────────────┐
   Producers ──────▶│   Topic     │──────▶ Consumer Group A
                    │  Partition 0│──────▶ Consumer Group B
                    │  Partition 1│
                    │  Partition 2│
                    └─────────────┘
                      Kafka Broker
Concept        | Description
Topic          | A named, ordered log of events (like a database table)
Partition      | A topic is split into partitions for parallelism
Offset         | The position of a message within a partition
Producer       | Publishes events to a topic
Consumer       | Reads events from a topic
Consumer Group | A set of consumers sharing the workload of a topic
Broker         | A Kafka server that stores and serves partitions

Producing Events

from confluent_kafka import Producer
import json

producer = Producer({'bootstrap.servers': 'kafka:9092'})

def publish_order_event(order: dict) -> None:
    event = {
        "event_type": "ORDER_PLACED",
        "order_id": order["id"],
        "user_id": order["user_id"],
        "amount": order["amount"],
        "timestamp": order["created_at"]
    }
    
    producer.produce(
        topic='order-events',
        key=str(order["user_id"]),   # Partition by user for ordering
        value=json.dumps(event).encode('utf-8'),
        callback=delivery_report
    )
    producer.flush()  # blocks until delivery; fine for demos, but high-throughput producers batch and call poll() instead

def delivery_report(err, msg):
    if err:
        print(f'Message delivery failed: {err}')
    else:
        print(f'Delivered to {msg.topic()} [{msg.partition()}] @ offset {msg.offset()}')

Consuming Events

from confluent_kafka import Consumer
import json

consumer = Consumer({
    'bootstrap.servers': 'kafka:9092',
    'group.id': 'fraud-detection-service',
    'auto.offset.reset': 'earliest',
    'enable.auto.commit': False  # Manual commit for at-least-once guarantee
})

consumer.subscribe(['order-events'])

def process_events():
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            handle_kafka_error(msg.error())
            continue
        
        event = json.loads(msg.value().decode('utf-8'))
        
        try:
            run_fraud_check(event)
            consumer.commit(msg)  # Only commit after successful processing
        except Exception as e:
            log_and_alert(e, event)
            # Do NOT commit — message will be reprocessed

Kafka Delivery Guarantees

One of the most important topics in any system design interview involving Kafka:

Guarantee     | Description                             | Use Case
At-most-once  | Messages may be lost, never duplicated  | Metrics, logging
At-least-once | Messages may be duplicated, never lost  | Payments (with idempotency)
Exactly-once  | No loss, no duplication                 | Financial transactions

Exactly-once requires Kafka's transactional API and is more complex to implement:

producer = Producer({
    'bootstrap.servers': 'kafka:9092',
    'transactional.id': 'payment-producer-1',
    'enable.idempotence': True
})

producer.init_transactions()

try:
    producer.begin_transaction()
    producer.produce('payment-events', key=payment_id, value=payload)
    producer.commit_transaction()
except Exception as e:
    producer.abort_transaction()
    raise

Part 3: Combining Caching and Event Streams

Here's where it gets interesting. In practice, the most powerful real-time architectures combine both layers. Let's look at a concrete example.

Real-World Example: Live Leaderboard System

Requirements:

  • 10 million active users in a gaming platform
  • Scores update in real-time as players finish matches
  • Leaderboard must reflect changes within 2 seconds
  • Support querying top-100 and a user's rank

Architecture:

Game Server ──▶ Kafka (score-events) ──▶ Score Processor ──▶ Redis Sorted Set
                                                                    ▲
                                                                    │ reads
                                                            Leaderboard API
                                                                    ▲
                                                                    │
                                                                Clients

Step 1: Publish score events

# Game server publishes when a match ends
def on_match_complete(match_result: dict):
    producer.produce(
        topic='score-events',
        key=match_result['player_id'],
        value=json.dumps({
            "player_id": match_result['player_id'],
            "delta_score": match_result['score_gained'],
            "match_id": match_result['match_id'],
            "timestamp": match_result['ended_at']
        })
    )

Step 2: Score processor updates Redis

import redis

r = redis.Redis(host='redis', port=6379)

def handle_score_event(event: dict):
    player_id = event['player_id']
    delta = event['delta_score']
    
    # Redis Sorted Set: ZINCRBY atomically increments a member's score
    new_score = r.zincrby('leaderboard:global', delta, player_id)
    
    # Invalidate any cached rank pages that may now be stale
    r.delete('leaderboard:top100:cached')
    
    print(f"Player {player_id} new score: {new_score}")

Step 3: Leaderboard API queries Redis

from fastapi import FastAPI

app = FastAPI()

@app.get("/leaderboard/top100")
def get_top_100():
    cache_key = 'leaderboard:top100:cached'
    
    # Check cache first (short TTL since scores change frequently)
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)
    
    # ZREVRANGE returns members sorted highest to lowest
    top_players = r.zrevrange('leaderboard:global', 0, 99, withscores=True)
    
    result = [
        {"rank": i + 1, "player_id": player.decode(), "score": int(score)}
        for i, (player, score) in enumerate(top_players)
    ]
    
    # Cache for 2 seconds — acceptable staleness for leaderboard
    r.setex(cache_key, 2, json.dumps(result))
    return result

@app.get("/leaderboard/rank/{player_id}")
def get_player_rank(player_id: str):
    # ZREVRANK returns 0-indexed rank (highest score = rank 0)
    rank = r.zrevrank('leaderboard:global', player_id)
    score = r.zscore('leaderboard:global', player_id)
    
    if rank is None:
        return {"error": "Player not found"}
    
    return {
        "player_id": player_id,
        "rank": rank + 1,
        "score": int(score)
    }

This architecture scales to millions of updates per second (sharding the sorted set across a Redis Cluster once a single node's throughput is exceeded) because:

  • Kafka decouples the game servers from score processing
  • Redis Sorted Sets provide O(log N) rank lookups
  • The short cache TTL on top-100 reduces Redis load while keeping data fresh

Part 4: Partitioning and Scaling

Kafka Partition Strategy

Partitioning is how Kafka achieves parallelism. The number of partitions limits your maximum consumer parallelism:

Topic: "order-events" with 6 partitions
Consumer Group A: 6 consumers → each reads 1 partition (max throughput)
Consumer Group B: 3 consumers → each reads 2 partitions
Consumer Group C: 8 consumers → 2 consumers are idle (wasteful)

Choosing a partition key matters enormously:

# ✅ Good: partition by user_id for user-level ordering
producer.produce('events', key=user_id, value=payload)

# ❌ Bad: partition by timestamp — hot partitions when traffic spikes
producer.produce('events', key=str(datetime.now()), value=payload)

# ❌ Bad: no key — round-robin loses ordering guarantees
producer.produce('events', value=payload)

Redis Cluster for Horizontal Scaling

A single Redis node maxes out around 100k ops/sec. For higher throughput, use Redis Cluster:

          ┌──────────────────────────────────┐
          │         Redis Cluster            │
          │                                  │
          │  Node 1: Slots 0–5460            │
          │  Node 2: Slots 5461–10922        │
          │  Node 3: Slots 10923–16383       │
          │  (+ replica for each node)       │
          └──────────────────────────────────┘

Redis Cluster shards keys across 16,384 fixed hash slots: each key is hashed (CRC16 mod 16384) to a slot, and each slot is owned by exactly one node. This enables:

  • Horizontal scaling by adding nodes
  • Automatic rebalancing
  • High availability with replicas

from redis.cluster import RedisCluster, ClusterNode

rc = RedisCluster(
    startup_nodes=[
        ClusterNode("redis-node-1", 6379),
        ClusterNode("redis-node-2", 6379),
        ClusterNode("redis-node-3", 6379),
    ],
    decode_responses=True
)

# Use {hashtag} to force keys to the same slot when needed
rc.set("{user:123}:profile", json.dumps(profile))
rc.set("{user:123}:preferences", json.dumps(prefs))
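Slot assignment is simple enough to reimplement for illustration: Redis hashes the key (or just the {hashtag} portion, when present) with CRC16 and takes the result modulo 16384:

```python
def crc16(data: bytes) -> int:
    """CRC-16/XMODEM, the variant Redis Cluster uses for key hashing."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of 16,384 slots, honoring {hashtag} syntax."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # non-empty tag: hash only the tag
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384
```

Both keys in the example above therefore map to the same slot, which is what makes multi-key operations on them possible in a cluster.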

Part 5: Common Interview Questions & How to Answer Them

"How would you design a real-time notification system for 100 million users?"

Framework to use:

  1. Clarify requirements: Push vs pull? What types of notifications? Guaranteed delivery?
  2. Identify bottlenecks: Fan-out at scale, connection management, delivery acknowledgment
  3. Choose the right tools: Kafka for ingestion, Redis for session lookup, WebSockets or SSE for delivery
  4. Discuss trade-offs: WebSockets require persistent connections (stateful) vs SSE (easier to scale, unidirectional)

Sample answer structure:

Client ←── WebSocket Connection ←── WebSocket Server (scaled horizontally)
                                           │
                                     Redis Pub/Sub
                                     (server-to-server fan-out)
                                           │
                              Notification Processor (Kafka consumer)
                                           │
                                    Kafka Topic
                                    (notification-events)
                                           │
                                     Producers
                              (social graph service, etc.)

"How do you handle backpressure in event-driven systems?"

Backpressure occurs when producers generate events faster than consumers can process them. Strategies:

  • Kafka's consumer lag monitoring: Alert when lag exceeds threshold, scale consumers
  • Rate limiting at the producer: Token bucket or leaky bucket algorithms
  • Circuit breakers: Temporarily stop publishing when downstream is overwhelmed
  • Dead Letter Queues (DLQ): Route unprocessable messages aside without blocking
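The token bucket mentioned above fits in a few lines (illustrative; real deployments often push rate limiting into the client library or a proxy):

```python
import time

class TokenBucket:
    """Allow at most `rate` events per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should back off, buffer, or drop the event
```

A producer would call `bucket.allow()` before `producer.produce(...)` and shed or buffer events when it returns False.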

Best Practices

Caching Best Practices

  • Always set a TTL — never cache indefinitely in a real-time system
  • Use cache-aside for reads, write-through for critical writes
  • Monitor cache hit rate — below 80% usually signals a design problem
  • Never store sensitive data (passwords, PII) in distributed caches without encryption
  • Size your cache for your hot working set, not your entire dataset
  • Use connection pooling — don't open a new Redis connection per request

Kafka Best Practices

  • Choose partition count carefully — it's very difficult to change later
  • Set min.insync.replicas=2 for production topics to prevent data loss
  • Use consumer groups for scaling, not multiple topics
  • Implement idempotent consumers — assume at-least-once delivery
  • Monitor consumer lag with tools like Burrow or Kafka UI
  • Use schema registry (Confluent Schema Registry or AWS Glue) to manage Avro/Protobuf schemas
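"Implement idempotent consumers" usually means deduplicating on a stable event ID before doing the work. A sketch with an in-memory set standing in for the dedupe store (in production, typically Redis SET NX with a TTL, or a unique constraint in the database):

```python
_processed_ids = set()  # stand-in for a shared, persistent dedupe store

def handle_event_idempotently(event: dict, process_fn) -> bool:
    """Process an event at most once per event_id.
    Safe under at-least-once delivery: redeliveries become no-ops."""
    event_id = event["event_id"]
    if event_id in _processed_ids:
        return False  # duplicate delivery: skip silently
    process_fn(event)
    _processed_ids.add(event_id)  # mark done only after processing succeeds
    return True
```

Marking the ID only after `process_fn` succeeds means a crash mid-processing leads to a retry rather than a lost event, which is exactly the trade-off at-least-once delivery asks you to make.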

Common Mistakes to Avoid

Caching Mistakes

Caching mutable, rapidly-changing data with long TTLs
Users see stale prices, scores, or statuses. Use short TTLs or event-driven invalidation.

Not accounting for cache cold start
After a deploy or cache flush, your database gets hammered. Use cache warming scripts.

Storing entire objects when only a field changes
Use Redis hashes (HSET, HGET) to update individual fields atomically.

Ignoring cache memory limits
Caches with no eviction policy will OOM. Always configure maxmemory and maxmemory-policy.
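A matching redis.conf fragment (the size is illustrative; tune it to your hot working set):

```
maxmemory 4gb
maxmemory-policy allkeys-lru
```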

Kafka Mistakes

Using a single partition for ordering
One partition = one consumer = no parallelism. Design your key strategy thoughtfully.

Committing offsets before processing
If the processor crashes after commit but before work completes, you'll lose events.

Not handling consumer rebalances
During scaling, Kafka reassigns partitions. Implement on_revoke handlers to commit cleanly.

Ignoring message size limits
Kafka's default max message size is 1MB. Sending large payloads causes broker errors.


🚀 Pro Tips

  • Use Redis Streams instead of Pub/Sub when you need persistence and consumer groups. Pub/Sub drops messages if no one is listening; Streams don't.

  • Combine Kafka + Redis for the CQRS pattern: Kafka handles the write side (command log), Redis handles the read side (materialized view). This gives you both durability and speed.

  • In interviews, always mention monitoring — Prometheus + Grafana for Redis metrics (redis-exporter), Kafka consumer lag dashboards. Interviewers love candidates who think about observability.

  • Dragonfly DB is emerging as a Redis-compatible alternative with significantly better multi-threaded performance. Worth mentioning as a modern alternative in senior-level interviews.

  • For exactly-once semantics across Kafka + database, use the Transactional Outbox Pattern: write your event to a database table in the same transaction as your business data, then a separate poller publishes it to Kafka. This eliminates the dual-write problem.

  • Compression matters at scale: Enable Kafka producer compression (compression.type=snappy) to reduce network I/O and broker storage costs by 3–5x for JSON payloads.

  • Benchmark your cache serialization: JSON is readable but slow. Consider MessagePack or Protocol Buffers for high-throughput caches — they can be 2–4x faster to serialize/deserialize.
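The transactional outbox tip above can be sketched end to end with SQLite; the table names and columns here are made up for illustration:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, amount REAL)")
conn.execute("""CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    topic TEXT, payload TEXT, published INTEGER DEFAULT 0)""")

def place_order(order_id: str, amount: float) -> None:
    # Business write and event write share one transaction:
    # either both commit or neither does, eliminating the dual-write problem.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, amount))
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("order-events",
             json.dumps({"event_type": "ORDER_PLACED",
                         "order_id": order_id, "amount": amount})))

def poll_outbox(batch_size: int = 100) -> list:
    # A separate relay would publish these rows to Kafka, then mark them published.
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 "
        "ORDER BY id LIMIT ?", (batch_size,)).fetchall()
    for row_id, _, _ in rows:
        conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    conn.commit()
    return [{"topic": t, "payload": json.loads(p)} for _, t, p in rows]
```

In a real system the relay publishes each row to Kafka before marking it published, so a crash between the two steps causes a duplicate publish, not a lost one: another reason consumers must be idempotent.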


📌 Key Takeaways

  • Caching and event streaming are complementary, not competing — use caching for low-latency reads, event streams for reliable, decoupled writes.

  • Cache invalidation is the hardest part — choose the right strategy (cache-aside, write-through, write-behind) based on your consistency requirements, and always add jitter to TTLs.

  • Kafka's delivery guarantees matter — understand at-most-once, at-least-once, and exactly-once, and design your consumers to be idempotent regardless.

  • Redis Sorted Sets are purpose-built for real-time leaderboards: ZINCRBY, ZREVRANK, and ZREVRANGE are your best friends.

  • Partition strategy in Kafka determines your scalability ceiling — choose keys that distribute load evenly and preserve the ordering guarantees your business logic needs.

  • Design for failure: Cache stampedes, avalanches, and penetration are predictable — handle them proactively with locks, null caching, and TTL jitter.

  • Always discuss observability in interviews — consumer lag, cache hit rates, p99 latency, and error rates are what keep these systems healthy in production.


Conclusion

Designing scalable real-time systems is one of the most complex and rewarding challenges in backend engineering. In 2026, the bar for system design interviews has risen — but so have the tools available to us.

By mastering caching patterns (cache-aside, write-through, TTL jitter, stampede prevention) and event streaming fundamentals (Kafka partitioning, consumer groups, delivery guarantees), you arm yourself with the building blocks to tackle any real-time design problem thrown at you.

The key mindset shift is to think in trade-offs: consistency vs availability, throughput vs latency, simplicity vs fault-tolerance. Interviewers want to see that thinking out loud, backed by concrete knowledge of how these systems behave under load.

Go build something. Set up a local Kafka cluster with Docker Compose. Write a leaderboard with Redis Sorted Sets. Break it. Fix it. That hands-on intuition is what separates candidates who've memorized architectures from engineers who can design them.

Good luck — and happy designing. 🎯


Written by

Niraj Kumar

Software Developer — building scalable systems for businesses.