System Design Concepts

Rate Limiting

  • A control mechanism used to manage and restrict the number of requests a client can make to a server or service within a specific time frame.
  • It helps in preventing DoS attacks, brute-force attacks, and abuse.
  • Example use case: a public API endpoint limiting each IP to 1,000 requests per hour.

Types of Rate Limiting Algorithms

Token Bucket
  • Imagine a bucket that holds tokens. Tokens are added to the bucket at a fixed rate (e.g. 10 tokens per second). Each incoming request consumes a token. If there are enough tokens, the request is allowed; if not, it is discarded or queued, depending on the implementation.
  • Key parameters
    • Token rate - rate at which tokens are added per second
    • Bucket Capacity - Maximum number of tokens in a bucket
    • Cost per request - Number of tokens each request consumes
  • Pros - Simple to implement, supports bursts
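A minimal Python sketch of the token bucket algorithm; the class and parameter names are illustrative, not taken from any particular library.

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum tokens the bucket can hold
        self.tokens = capacity        # start with a full bucket
        self.last_refill = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost       # consume tokens for this request
            return True
        return False                  # not enough tokens: reject (or queue)

bucket = TokenBucket(rate=10, capacity=20)
print(bucket.allow())  # True while tokens remain
```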
Leaky Bucket
  • Imagine a bucket with a small hole at the bottom. Incoming requests are added to the bucket. The bucket leaks at a fixed rate (i.e., requests are processed at a fixed rate). If the bucket overflows (i.e., too many incoming requests), excess requests are dropped.
  • Key parameters
    • Bucket size - maximum number of requests that can be held
    • Leak rate - rate at which requests are processed
  • Pros - Smooth, constant output rate (unlike the token bucket, which allows bursts). The strict processing rate helps prevent system overload.
  • Cons - Less flexible for applications requiring temporary high throughput
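A rough sketch of the leaky bucket as a counter (the "meter" variant); a queue-based variant would instead enqueue requests and drain them at the leak rate. Names are illustrative.

```python
import time

class LeakyBucket:
    def __init__(self, leak_rate: float, bucket_size: int):
        self.leak_rate = leak_rate      # requests drained per second
        self.bucket_size = bucket_size  # maximum requests the bucket can hold
        self.level = 0.0                # current fill level
        self.last_leak = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain the bucket at a constant rate since the last check.
        self.level = max(0.0, self.level - (now - self.last_leak) * self.leak_rate)
        self.last_leak = now
        if self.level + 1 <= self.bucket_size:
            self.level += 1             # request fits in the bucket
            return True
        return False                    # bucket would overflow: drop the request

limiter = LeakyBucket(leak_rate=5, bucket_size=10)
print(limiter.allow())
```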
Fixed Window Counter
  • A simple and commonly used rate limiting algorithm that tracks how many requests have occurred in a fixed time interval (e.g. 1 minute). If the count exceeds the configured limit within that window, further requests are rejected until the next window starts.
  • Each window starts with a counter at 0. For every request, the counter is incremented and the request is processed; once the counter exceeds the limit, further requests are denied until the start of the next window.
  • Key Parameters
    • Window size - duration of each window
    • Limit per window - maximum number of requests allowed per window.
  • Pros - Simple to implement. Low overhead (only one counter per user per window).
  • Cons - Can allow traffic spikes around window boundaries: a burst at the end of one window and another right after it resets can together exceed the intended rate.
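A minimal fixed window counter sketch, assuming an in-process counter keyed by client and window; a production version would expire old windows and typically keep the counters in a shared store such as Redis.

```python
import time
from collections import defaultdict

class FixedWindowCounter:
    def __init__(self, limit: int, window_size: float = 60.0):
        self.limit = limit                 # max requests per window
        self.window_size = window_size     # window duration in seconds
        self.counters = defaultdict(int)   # (client_id, window index) -> count

    def allow(self, client_id: str) -> bool:
        window_index = int(time.time() // self.window_size)
        key = (client_id, window_index)
        if self.counters[key] < self.limit:
            self.counters[key] += 1
            return True
        return False  # limit reached; denied until the next window starts

limiter = FixedWindowCounter(limit=100, window_size=60)
print(limiter.allow("10.0.0.1"))
```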
Sliding Window Log
  • This algorithm keeps a log of individual request timestamps and ensures that the number of requests within a sliding time window (e.g. the last 1 minute) does not exceed the configured limit.
  • When a request arrives, remove all logged timestamps older than the sliding window, then count how many remain. If the count is below the configured limit, allow the request and add its timestamp to the log; otherwise reject it.
  • Key Parameters
    • Sliding Window Time - time duration of the window.
    • Limit per window - maximum number of requests allowed per window.
  • Pros - Very accurate.
  • Cons - Memory intensive, since a timestamp must be stored for every request from every client. Scaling is challenging, especially in a distributed environment.
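A sketch of the sliding window log, keeping an in-memory deque of request timestamps per client (illustrative only; a real deployment would keep these in a shared store).

```python
import time
from collections import deque

class SlidingWindowLog:
    def __init__(self, limit: int, window: float = 60.0):
        self.limit = limit      # max requests per sliding window
        self.window = window    # window duration in seconds
        self.logs = {}          # client_id -> deque of request timestamps

    def allow(self, client_id: str) -> bool:
        now = time.monotonic()
        timestamps = self.logs.setdefault(client_id, deque())
        # Drop timestamps that have fallen out of the sliding window.
        while timestamps and timestamps[0] <= now - self.window:
            timestamps.popleft()
        if len(timestamps) < self.limit:
            timestamps.append(now)  # record this request
            return True
        return False

limiter = SlidingWindowLog(limit=100, window=60)
print(limiter.allow("10.0.0.1"))
```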
Sliding Window Counter
  • This is a hybrid approach combining fixed window counter and sliding window log.
  • Track the request count for the current window and the previous window. To estimate the total request count in the sliding window, take the current window's count plus a weighted portion of the previous window's count. If the estimate exceeds the limit, reject the request; otherwise process it and increment the counter.
  • total = current_count + previous_count × (1 - fraction_of_current_window_elapsed). For example, if the window is 60 seconds and 45 seconds have elapsed in the current window, we take 25% of the previous window’s count and add it to the current count.
  • Pros - more accurate than fixed window and more memory efficient than sliding window log.
  • Cons - slight approximation error.
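A sketch of the sliding window counter applying the weighted-sum formula above (single client, in-process, illustrative).

```python
import time

class SlidingWindowCounter:
    def __init__(self, limit: int, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.current_window = -1    # index of the current fixed window
        self.current_count = 0
        self.previous_count = 0

    def allow(self) -> bool:
        now = time.time()
        window_index = int(now // self.window)
        if window_index != self.current_window:
            # Roll forward; if more than one window has passed, the previous count is 0.
            self.previous_count = (self.current_count
                                   if window_index == self.current_window + 1 else 0)
            self.current_count = 0
            self.current_window = window_index
        elapsed_fraction = (now % self.window) / self.window
        # Weighted sum: current count plus the tail of the previous window.
        estimated = self.current_count + self.previous_count * (1 - elapsed_fraction)
        if estimated < self.limit:
            self.current_count += 1
            return True
        return False

limiter = SlidingWindowCounter(limit=100, window=60)
print(limiter.allow())
```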

Caching

  • Caching is a technique of storing frequently accessed data in a faster storage medium so that future requests for that data can be served quickly.
  • It improves performance and reduces time taken to retrieve data.
  • Each cache lookup results in either a hit or a miss, depending on whether the requested data is present.
  • Cache hit ratio is used to measure the effectiveness of the cache. Cache Hit Ratio = Number of Cache Hits / (Cache Hits + Cache Misses).
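  • For example, 900 hits and 100 misses give a hit ratio of 900 / (900 + 100) = 0.9, i.e. 90%.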

Benefits of Caching

  • Improved performance
  • Reduces load on backend systems
  • Improved user experience
  • Cost efficiency

Tradeoffs of Caching

  • Stale data - cached data can become outdated if not refreshed properly. Requires careful cache eviction policy design.
  • Caching introduces complexity in ensuring data consistency, cache invalidation, and handling edge cases like race conditions.

Types of Cache

  • CPU Cache
  • Memory Cache
  • Disk Cache
  • Application Cache
  • Browser Cache
  • Database Cache
  • CDN Cache
  • DNS Cache

Caching Strategies

Cache-Aside
  • The application checks the cache. On a cache miss, it fetches data from the source, stores it in the cache and returns it.
  • Easy to implement.
  • First access is slow.
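A cache-aside sketch, assuming a plain dict as the cache and a stub function standing in for the database.

```python
cache = {}  # in practice this would be Redis, Memcached, an in-process LRU, etc.

def fetch_from_db(key):
    # Stand-in for a real database query.
    return f"value-for-{key}"

def get(key):
    if key in cache:                # cache hit
        return cache[key]
    value = fetch_from_db(key)      # cache miss: fetch from the source
    cache[key] = value              # populate the cache for next time
    return value

print(get("user:42"))  # miss, loaded from the "database"
print(get("user:42"))  # hit, served from the cache
```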
Write-Through
  • Every write to the backend also updates the cache immediately. Reads are served from cache.
  • Cache remains in sync with the backend. Read performance is fast.
  • Writes to the backend are slower due to extra cache operation.
Write-Behind (Write Back)
  • Writes are made to cache first, then backend is updated asynchronously.
  • Writes are faster.
  • Risk of data loss on crash before updating to the backend.
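A rough sketch contrasting the write-through and write-behind strategies above, with dicts standing in for the cache and backend; in a real write-behind setup the flush runs asynchronously (background worker or timer).

```python
cache = {}      # fast tier (in practice Redis, Memcached, ...)
backend = {}    # stand-in for the database
pending = []    # queued writes waiting to be flushed (write-behind)

def write_through(key, value):
    backend[key] = value            # backend is updated synchronously...
    cache[key] = value              # ...and the cache stays in sync

def write_behind(key, value):
    cache[key] = value              # write is acknowledged from the cache
    pending.append((key, value))    # backend update is deferred

def flush_pending():
    # Anything still queued here is lost if the process crashes first.
    while pending:
        key, value = pending.pop(0)
        backend[key] = value

write_through("a", 1)
write_behind("b", 2)
flush_pending()
print(backend)  # {'a': 1, 'b': 2}
```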
Read-Through
  • The application interacts only with the cache. On a cache miss, the cache fetches data from the backend and stores it before returning it.
  • Centralized logic for loading and caching.
  • Requires a more intelligent cache implementation to handle data fetching and population.
Refresh-Ahead
  • Cache proactively refreshes entries before they expire, based on access patterns.
  • Prevents cache misses for frequently accessed data.
  • Hard to tune refresh timing.

Caching Eviction Policies

LRU - Least Recently Used
  • Removes the least recently accessed item.
  • Tracks order of access.
  • Good for scenarios where recently accessed data is likely to be reused.
  • Adapts well to changing access patterns.
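A minimal LRU cache sketch built on Python's OrderedDict; the capacity and method names are illustrative.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)         # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used item

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")              # "a" becomes the most recently used entry
cache.put("c", 3)           # evicts "b"
print(list(cache.entries))  # ['a', 'c']
```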
LFU - Least Frequently Used
  • Removes the least frequently accessed item.
  • Tracks frequency of access.
  • Can be more memory-intensive, as access counters must be maintained per item.
FIFO - First In First Out
  • Removes the oldest item first.
  • Simple to build. Low overhead in terms of time and space complexity.
MRU - Most Recently Used
  • Removes most recently used item.
  • Opposite of LRU.
  • Useful in some niche cases where frequent reuse is unlikely.
RR - Random Replacement
  • Removes a random item from the cache.
  • Simple and fast but unpredictable.

Load Balancing

  • Load Balancers distribute incoming traffic requests across multiple servers to improve performance, reliability, and scalability.
  • Load balancers can be implemented using hardware appliances or software solutions.
  • Key characteristics are:
    • Efficient traffic distribution - Prevents any single server from being overburdened by distributing requests evenly.
    • Health Monitoring - Continuously checks the health of backend servers; traffic is redirected away from unhealthy or unreachable servers.
    • Scalability - Makes it easy to scale horizontally by adding more servers, supporting application growth.
  • Pros
    • Simplified horizontal scaling.
    • Improved system availability and fault tolerance.
  • Cons
    • Can itself become a single point of failure if not deployed redundantly.
    • Increased complexity in deployment and maintenance.

Types of Load Balancers

Layer 4 (Transport Layer) Load Balancer
  • Operates at the transport layer using IP addresses, TCP/UDP ports, and protocol type.
  • Faster and more efficient, since it does not inspect request content.
  • Examples: iptables, AWS Network Load Balancer, HAProxy (in TCP mode)
Layer 7 (Application Layer) Load Balancer
  • Operates at the application layer of the OSI model.
  • Makes intelligent routing decisions based on request content such as HTTP headers, URLs, cookies, and other application-level data.
  • Examples: nginx, HAProxy (in HTTP mode), AWS Application Load Balancer.

Load Balancing Algorithms

Round Robin
  • Requests are distributed sequentially.
  • Simple and easy to configure; works best when all servers have similar capacity.
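A minimal round-robin selection sketch (server addresses are placeholders).

```python
from itertools import cycle

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
rotation = cycle(servers)

for _ in range(5):
    print(next(rotation))  # 10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.1, 10.0.0.2
```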
Weighted Round Robin
  • Weights are assigned to servers based on their capacity.
  • Higher capacity servers receive more requests.
Source IP hash
  • Hashes the client’s source IP to determine the server that handles the request.
  • Ensures session stickiness, since requests from the same client IP are always routed to the same server.
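A sketch of source IP hashing; a stable hash (MD5 here, purely for illustration) maps each client IP to the same server as long as the server list is unchanged.

```python
import hashlib

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

def pick_server(client_ip: str) -> str:
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]  # same IP -> same server

print(pick_server("203.0.113.7"))  # always maps to the same backend
```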
Least Connections
  • Routes traffic to the server with the fewest active connections.
  • Effective for long-lived or variable-duration connections (e.g., WebSocket, streaming).
Least Response Time Method
  • Selects the server with the lowest latency and fewest active connections.
  • Requires continuous monitoring of backend response times.
Resource Based
  • Monitors server CPU, memory, or other metrics in real time.
  • Routes traffic based on current resource availability.
  • Often integrated with external monitoring systems (e.g., Prometheus, Datadog).

To be added

  • CAP Theorem
  • Consistency patterns
  • Sharding
  • Consistent Hashing

Common Questions

  • Difference between Load Balancer and Reverse Proxy
