System Design Concepts

Rate Limiting

  • A control mechanism used to manage and restrict the number of requests a client can make to a server or service within a specific time frame.
  • It helps in preventing DoS attacks, brute-force attacks, and abuse.
  • Example use case: a public API endpoint limiting each IP to 1,000 requests per hour.

Types of Rate Limiting Algorithms

Token Bucket
  • Imagine a bucket that holds tokens. Tokens are added to the bucket at a fixed rate (e.g. 10 tokens per second). Each incoming request consumes a token. If there are enough tokens, the request is allowed; if not, it is discarded or queued, depending on the implementation.
  • Key parameters
    • Token rate - rate at which tokens are added per second
    • Bucket Capacity - Maximum number of tokens in a bucket
    • Cost per request - Number of tokens each request consumes
  • Pros - Simple to implement, supports bursts
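A minimal Python sketch of the token bucket algorithm; the class and parameter names are illustrative, not taken from any particular library.

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum tokens the bucket can hold
        self.tokens = capacity        # start with a full bucket
        self.last_refill = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost       # consume tokens for this request
            return True
        return False                  # not enough tokens: reject (or queue)

bucket = TokenBucket(rate=10, capacity=20)
print(bucket.allow())  # True while tokens remain
```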
Leaky Bucket
  • Imagine a bucket with a small hole at the bottom. Incoming requests are added to the bucket. The bucket leaks at a fixed rate (i.e., requests are processed at a fixed rate). If the bucket overflows (i.e., too many incoming requests), excess requests are dropped.
  • Key parameters
    • Bucket size - maximum number of requests that can be held
    • Leak rate - rate at which requests are processed
  • Pros - Smooth, constant output rate (unlike the token bucket, which allows bursts). The strict processing rate helps prevent system overload.
  • Cons - Less flexible for applications requiring temporary high throughput
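A rough sketch of the leaky bucket as a counter (the "meter" variant); a queue-based variant would instead enqueue requests and drain them at the leak rate. Names are illustrative.

```python
import time

class LeakyBucket:
    def __init__(self, leak_rate: float, bucket_size: int):
        self.leak_rate = leak_rate      # requests drained per second
        self.bucket_size = bucket_size  # maximum requests the bucket can hold
        self.level = 0.0                # current fill level
        self.last_leak = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain the bucket at a constant rate since the last check.
        self.level = max(0.0, self.level - (now - self.last_leak) * self.leak_rate)
        self.last_leak = now
        if self.level + 1 <= self.bucket_size:
            self.level += 1             # request fits in the bucket
            return True
        return False                    # bucket would overflow: drop the request

limiter = LeakyBucket(leak_rate=5, bucket_size=10)
print(limiter.allow())
```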
Fixed Window Counter
  • A simple and commonly used rate limiting algorithm that tracks how many requests have occurred in a fixed time interval (e.g. 1 minute). If the count exceeds the configured limit within that window, further requests are rejected until the next window starts.
  • Each window starts with a counter at 0. For every request, the counter is incremented and the request is processed; once the counter exceeds the limit, further requests are denied until the start of the next window.
  • Key Parameters
    • Window size - duration of each window
    • Limit per window - maximum number of requests allowed per window.
  • Pros - Simple to implement. Low overhead (only one counter per user per window).
  • Cons - Can allow traffic spikes around window boundaries: a burst at the end of one window and another right after it resets can together exceed the intended rate.
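A minimal fixed window counter sketch, assuming an in-process counter keyed by client and window; a production version would expire old windows and typically keep the counters in a shared store such as Redis.

```python
import time
from collections import defaultdict

class FixedWindowCounter:
    def __init__(self, limit: int, window_size: float = 60.0):
        self.limit = limit                 # max requests per window
        self.window_size = window_size     # window duration in seconds
        self.counters = defaultdict(int)   # (client_id, window index) -> count

    def allow(self, client_id: str) -> bool:
        window_index = int(time.time() // self.window_size)
        key = (client_id, window_index)
        if self.counters[key] < self.limit:
            self.counters[key] += 1
            return True
        return False  # limit reached; denied until the next window starts

limiter = FixedWindowCounter(limit=100, window_size=60)
print(limiter.allow("10.0.0.1"))
```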
Sliding Window Log
  • This algorithm keeps a log of individual request timestamps and ensures that the number of requests within a sliding time window (e.g. the last 1 minute) does not exceed the configured limit.
  • When a request arrives, remove all logged timestamps older than the sliding window, then count how many remain. If the count is below the configured limit, allow the request and add its timestamp to the log; otherwise reject it.
  • Key Parameters
    • Sliding Window Time - time duration of the window.
    • Limit per window - maximum number of requests allowed per window.
  • Pros - Very accurate.
  • Cons - Memory intensive, since a timestamp must be stored for every request from every client. Scaling is challenging, especially in a distributed environment.
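A sketch of the sliding window log, keeping an in-memory deque of request timestamps per client (illustrative only; a real deployment would keep these in a shared store).

```python
import time
from collections import deque

class SlidingWindowLog:
    def __init__(self, limit: int, window: float = 60.0):
        self.limit = limit      # max requests per sliding window
        self.window = window    # window duration in seconds
        self.logs = {}          # client_id -> deque of request timestamps

    def allow(self, client_id: str) -> bool:
        now = time.monotonic()
        timestamps = self.logs.setdefault(client_id, deque())
        # Drop timestamps that have fallen out of the sliding window.
        while timestamps and timestamps[0] <= now - self.window:
            timestamps.popleft()
        if len(timestamps) < self.limit:
            timestamps.append(now)  # record this request
            return True
        return False

limiter = SlidingWindowLog(limit=100, window=60)
print(limiter.allow("10.0.0.1"))
```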
Sliding Window Counter
  • This is a hybrid approach combining fixed window counter and sliding window log.
  • Track the request count for the current window and the previous window. To estimate the total request count in the sliding window, take the current window's count plus a weighted portion of the previous window's count. If the estimate exceeds the limit, reject the request; otherwise process it and increment the counter.
  • total = current_count + previous_count × (1 - fraction_of_current_window_elapsed). For example, if the window is 60 seconds and 45 seconds have elapsed in the current window, we take 25% of the previous window’s count and add it to the current count.
  • Pros - more accurate than fixed window and more memory efficient than sliding window log.
  • Cons - slight approximation error.
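A sketch of the sliding window counter applying the weighted-sum formula above (single client, in-process, illustrative).

```python
import time

class SlidingWindowCounter:
    def __init__(self, limit: int, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.current_window = -1    # index of the current fixed window
        self.current_count = 0
        self.previous_count = 0

    def allow(self) -> bool:
        now = time.time()
        window_index = int(now // self.window)
        if window_index != self.current_window:
            # Roll forward; if more than one window has passed, the previous count is 0.
            self.previous_count = (self.current_count
                                   if window_index == self.current_window + 1 else 0)
            self.current_count = 0
            self.current_window = window_index
        elapsed_fraction = (now % self.window) / self.window
        # Weighted sum: current count plus the tail of the previous window.
        estimated = self.current_count + self.previous_count * (1 - elapsed_fraction)
        if estimated < self.limit:
            self.current_count += 1
            return True
        return False

limiter = SlidingWindowCounter(limit=100, window=60)
print(limiter.allow())
```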

Caching

  • Caching is a technique of storing frequently accessed data in a faster storage medium so that future requests for that data can be served quickly.
  • It improves performance and reduces time taken to retrieve data.
  • Each cache lookup results in either a hit or a miss, depending on whether the requested data is present.
  • Cache hit ratio is used to measure the effectiveness of the cache. Cache Hit Ratio = Number of Cache Hits / (Cache Hits + Cache Misses).
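  • For example, 900 hits and 100 misses give a hit ratio of 900 / (900 + 100) = 0.9, i.e. 90%.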

Benefits of Caching

  • Improved performance
  • Reduces load on backend systems
  • Improved user experience
  • Cost efficiency

Tradeoffs of Caching

  • Stale data - cached data can become outdated if not refreshed properly. Requires careful cache eviction policy design.
  • Caching introduces complexity in ensuring data consistency, cache invalidation, and handling edge cases like race conditions.

Types of Cache

  • CPU Cache
  • Memory Cache
  • Disk Cache
  • Application Cache
  • Browser Cache
  • Database Cache
  • CDN Cache
  • DNS Cache

Caching Strategies

Cache-Aside
  • The application checks the cache. On a cache miss, it fetches data from the source, stores it in the cache and returns it.
  • Easy to implement.
  • First access is slow.
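A cache-aside sketch, assuming a plain dict as the cache and a stub function standing in for the database.

```python
cache = {}  # in practice this would be Redis, Memcached, an in-process LRU, etc.

def fetch_from_db(key):
    # Stand-in for a real database query.
    return f"value-for-{key}"

def get(key):
    if key in cache:                # cache hit
        return cache[key]
    value = fetch_from_db(key)      # cache miss: fetch from the source
    cache[key] = value              # populate the cache for next time
    return value

print(get("user:42"))  # miss, loaded from the "database"
print(get("user:42"))  # hit, served from the cache
```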
Write-Through
  • Every write to the backend also updates the cache immediately. Reads are served from cache.
  • Cache remains in sync with the backend. Read performance is fast.
  • Writes to the backend are slower due to extra cache operation.
Write-Behind (Write Back)
  • Writes are made to cache first, then backend is updated asynchronously.
  • Writes are faster.
  • Risk of data loss on crash before updating to the backend.
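A rough sketch contrasting the write-through and write-behind strategies above, with dicts standing in for the cache and backend; in a real write-behind setup the flush runs asynchronously (background worker or timer).

```python
cache = {}      # fast tier (in practice Redis, Memcached, ...)
backend = {}    # stand-in for the database
pending = []    # queued writes waiting to be flushed (write-behind)

def write_through(key, value):
    backend[key] = value            # backend is updated synchronously...
    cache[key] = value              # ...and the cache stays in sync

def write_behind(key, value):
    cache[key] = value              # write is acknowledged from the cache
    pending.append((key, value))    # backend update is deferred

def flush_pending():
    # Anything still queued here is lost if the process crashes first.
    while pending:
        key, value = pending.pop(0)
        backend[key] = value

write_through("a", 1)
write_behind("b", 2)
flush_pending()
print(backend)  # {'a': 1, 'b': 2}
```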
Read-Through
  • The application interacts only with the cache. On a cache miss, the cache fetches data from the backend and stores it before returning it.
  • Centralized logic for loading and caching.
  • Requires a more intelligent cache implementation to handle data fetching and population.
Refresh-Ahead
  • Cache proactively refreshes entries before they expire, based on access patterns.
  • Prevents cache misses for frequently accessed data.
  • Hard to tune refresh timing.

Caching Eviction Policies

LRU - Least Recently Used
  • Removes the least recently accessed item.
  • Tracks order of access.
  • Good for scenarios where recently accessed data is likely to be reused.
  • Adapts well to changing access patterns.
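A minimal LRU cache sketch built on Python's OrderedDict; the capacity and method names are illustrative.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)         # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used item

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")              # "a" becomes the most recently used entry
cache.put("c", 3)           # evicts "b"
print(list(cache.entries))  # ['a', 'c']
```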
LFU - Least Frequently Used
  • Removes the least frequently accessed item.
  • Tracks frequency of access.
  • Can be more memory-intensive, as access counters must be maintained per item.
FIFO - First In First Out
  • Removes the oldest item first.
  • Simple to build. Low overhead in terms of time and space complexity.
MRU - Most Recently Used
  • Removes most recently used item.
  • Opposite of LRU.
  • Useful in some niche cases where frequent reuse is unlikely.
RR - Random Replacement
  • Removes a random item from the cache.
  • Simple and fast but unpredictable.

Load Balancing

  • Load Balancers distribute incoming traffic requests across multiple servers to improve performance, reliability, and scalability.
  • Load balancers can be implemented using hardware appliances or software solutions.
  • Key characteristics are:
    • Efficient traffic distribution - Prevents any single server from being overburdened by distributing requests evenly.
    • Health Monitoring - Continuously checks the health of backend servers; traffic is redirected away from unhealthy or unreachable servers.
    • Scalability - Makes it easy to scale horizontally by adding more servers, supporting application growth.
  • Pros
    • Simplified horizontal scaling.
    • Improved system availability and fault tolerance.
  • Cons
    • Can itself become a single point of failure if not deployed redundantly.
    • Increased complexity in deployment and maintenance.

Types of Load Balancers

Layer 4 (Transport Layer) Load Balancer
  • Operates at the transport layer using IP addresses, TCP/UDP ports, and protocol type.
  • Faster and more efficient, since it does not inspect request content.
  • Examples: iptables, AWS Network Load Balancer, HAProxy (in TCP mode)
Layer 7 (Application Layer) Load Balancer
  • Operates at the application layer of the OSI model.
  • Makes intelligent routing decisions based on request content such as HTTP headers, URLs, cookies, and other application-level data.
  • Examples: nginx, HAProxy (in HTTP mode), AWS Application Load Balancer.

Load Balancing Algorithms

Round Robin
  • Requests are distributed sequentially.
  • Simple and easy to configure; works best when all servers have similar capacity.
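A minimal round-robin selection sketch (server addresses are placeholders).

```python
from itertools import cycle

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
rotation = cycle(servers)

for _ in range(5):
    print(next(rotation))  # 10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.1, 10.0.0.2
```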
Weighted Round Robin
  • Weights are assigned to servers based on their capacity.
  • Higher capacity servers receive more requests.
Source IP hash
  • Hashes the client’s source IP to determine the server that handles the request.
  • Ensures session stickiness, since requests from the same client IP are always routed to the same server.
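A sketch of source IP hashing; a stable hash (MD5 here, purely for illustration) maps each client IP to the same server as long as the server list is unchanged.

```python
import hashlib

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

def pick_server(client_ip: str) -> str:
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]  # same IP -> same server

print(pick_server("203.0.113.7"))  # always maps to the same backend
```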
Least Connections
  • Routes traffic to the server with the fewest active connections.
  • Effective for long-lived or variable-duration connections (e.g., WebSocket, streaming).
Least Response Time Method
  • Selects the server with the lowest latency and fewest active connections.
  • Requires continuous monitoring of backend response times.
Resource Based
  • Monitors server CPU, memory, or other metrics in real time.
  • Routes traffic based on current resource availability.
  • Often integrated with external monitoring systems (e.g., Prometheus, Datadog).

To be added

  • CAP Theorem
  • Consistency patterns
  • Sharding
  • Consistent Hashing

Common Questions

  • Difference between Load Balancer and Reverse Proxy
