[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-1057":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":10,"languages":10,"totalLinesOfCode":10,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":35,"readmeContent":36,"aiSummary":37,"trendingCount":15,"starSnapshotCount":15,"syncStatus":14,"lastSyncTime":38,"discoverSource":39},1057,"system-design","karanpratapsingh\u002Fsystem-design","karanpratapsingh","Learn how to design systems at scale and prepare for system design interviews","https:\u002F\u002Fleanpub.com\u002Fsystemdesign",null,43968,5713,352,2,0,22,124,764,119,120,"Other",false,"main",true,[26,27,28,29,30,31,32,5,33,34],"architecture","distributed-systems","engineering","interview","interview-preparation","microservices","scalability","system-design-interview","tech","2026-06-12 04:00:07","# System Design\n\nHey, welcome to the course. I hope this course provides a great learning experience.\n\n_This course is also available on my [website](https:\u002F\u002Fkaranpratapsingh.com\u002Fcourses\u002Fsystem-design) and as an ebook on [leanpub](https:\u002F\u002Fleanpub.com\u002Fsystemdesign). 
Please leave a ⭐ as motivation if this was helpful!_\n\n# Table of contents\n\n- **Getting Started**\n\n  - [What is system design?](#what-is-system-design)\n\n- **Chapter I**\n\n  - [IP](#ip)\n  - [OSI Model](#osi-model)\n  - [TCP and UDP](#tcp-and-udp)\n  - [Domain Name System (DNS)](#domain-name-system-dns)\n  - [Load Balancing](#load-balancing)\n  - [Clustering](#clustering)\n  - [Caching](#caching)\n  - [Content Delivery Network (CDN)](#content-delivery-network-cdn)\n  - [Proxy](#proxy)\n  - [Availability](#availability)\n  - [Scalability](#scalability)\n  - [Storage](#storage)\n\n- **Chapter II**\n\n  - [Databases and DBMS](#databases-and-dbms)\n  - [SQL databases](#sql-databases)\n  - [NoSQL databases](#nosql-databases)\n  - [SQL vs NoSQL databases](#sql-vs-nosql-databases)\n  - [Database Replication](#database-replication)\n  - [Indexes](#indexes)\n  - [Normalization and Denormalization](#normalization-and-denormalization)\n  - [ACID and BASE consistency models](#acid-and-base-consistency-models)\n  - [CAP theorem](#cap-theorem)\n  - [PACELC Theorem](#pacelc-theorem)\n  - [Transactions](#transactions)\n  - [Distributed Transactions](#distributed-transactions)\n  - [Sharding](#sharding)\n  - [Consistent Hashing](#consistent-hashing)\n  - [Database Federation](#database-federation)\n\n- **Chapter III**\n\n  - [N-tier architecture](#n-tier-architecture)\n  - [Message Brokers](#message-brokers)\n  - [Message Queues](#message-queues)\n  - [Publish-Subscribe](#publish-subscribe)\n  - [Enterprise Service Bus (ESB)](#enterprise-service-bus-esb)\n  - [Monoliths and Microservices](#monoliths-and-microservices)\n  - [Event-Driven Architecture (EDA)](#event-driven-architecture-eda)\n  - [Event Sourcing](#event-sourcing)\n  - [Command and Query Responsibility Segregation (CQRS)](#command-and-query-responsibility-segregation-cqrs)\n  - [API Gateway](#api-gateway)\n  - [REST, GraphQL, gRPC](#rest-graphql-grpc)\n  - [Long polling, WebSockets, Server-Sent Events 
(SSE)](#long-polling-websockets-server-sent-events-sse)\n\n- **Chapter IV**\n\n  - [Geohashing and Quadtrees](#geohashing-and-quadtrees)\n  - [Circuit breaker](#circuit-breaker)\n  - [Rate Limiting](#rate-limiting)\n  - [Service Discovery](#service-discovery)\n  - [SLA, SLO, SLI](#sla-slo-sli)\n  - [Disaster recovery](#disaster-recovery)\n  - [Virtual Machines (VMs) and Containers](#virtual-machines-vms-and-containers)\n  - [OAuth 2.0 and OpenID Connect (OIDC)](#oauth-20-and-openid-connect-oidc)\n  - [Single Sign-On (SSO)](#single-sign-on-sso)\n  - [SSL, TLS, mTLS](#ssl-tls-mtls)\n\n- **Chapter V**\n\n  - [System Design Interviews](#system-design-interviews)\n  - [URL Shortener](#url-shortener)\n  - [WhatsApp](#whatsapp)\n  - [Twitter](#twitter)\n  - [Netflix](#netflix)\n  - [Uber](#uber)\n\n- **Appendix**\n\n  - [Next Steps](#next-steps)\n  - [References](#references)\n\n# What is system design?\n\nBefore we start this course, let's talk about what system design actually is.\n\nSystem design is the process of defining the architecture, interfaces, and data\nfor a system that satisfies specific requirements. System design meets the needs\nof your business or organization through coherent and efficient systems. It requires\na systematic approach to building and engineering systems. A good system design requires\nus to think about everything, from infrastructure all the way down to the data and how it's stored.\n\n## Why is System Design so important?\n\nSystem design helps us define a solution that meets the business requirements. It is\none of the earliest decisions we can make when building a system. Often it is essential\nto think from a high level as these decisions are very difficult to correct later. It\nalso makes it easier to reason about and manage architectural changes as the system evolves.\n\n# IP\n\nAn IP address is a unique address that identifies a device on the internet or a local network. 
IP stands for _\"Internet Protocol\"_, which is the set of rules governing the format of data sent via the internet or local network.\n\nIn essence, IP addresses are the identifiers that allow information to be sent between devices on a network. They contain location information and make devices accessible for communication. The internet needs a way to differentiate between different computers, routers, and websites. IP addresses provide a way of doing so and form an essential part of how the internet works.\n\n## Versions\n\nNow, let's learn about the different versions of IP addresses:\n\n### IPv4\n\nThe original Internet Protocol is IPv4, which uses a 32-bit numeric dot-decimal notation that only allows for around 4 billion IP addresses. Initially, this was more than enough, but as internet adoption grew, we needed something better.\n\n_Example: `102.22.192.181`_\n\n### IPv6\n\nIPv6 is a newer protocol that was introduced in 1998. Deployment commenced in the mid-2000s and, since the number of internet users has grown exponentially, it is still ongoing.\n\nThis new protocol uses 128-bit alphanumeric hexadecimal notation. This means that IPv6 can provide about 3.4 × 10^38 IP addresses. That's more than enough to meet the growing demand for years to come.\n\n_Example: `2001:0db8:85a3:0000:0000:8a2e:0370:7334`_\n\n## Types\n\nLet's discuss types of IP addresses:\n\n### Public\n\nA public IP address is an address where one primary address is associated with your whole network. 
With this type of IP address, every connected device appears to the outside world under the same IP address.\n\n_Example: IP address provided to your router by the ISP._\n\n### Private\n\nA private IP address is a unique address assigned to each device that connects to your home network, such as the computers, tablets, and smartphones used in your household.\n\n_Example: IP addresses generated by your home router for your devices._\n\n### Static\n\nA static IP address does not change and is one that was manually configured, as opposed to having been assigned automatically. These addresses are usually more expensive but are more reliable.\n\n_Example: They are usually used for important things like reliable geo-location services, remote access, server hosting, etc._\n\n### Dynamic\n\nA dynamic IP address changes from time to time and is not always the same. It is assigned by a [Dynamic Host Configuration Protocol (DHCP)](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FDynamic_Host_Configuration_Protocol) server. Dynamic IP addresses are the most common type of internet protocol address. They are cheaper to deploy and allow us to reuse IP addresses within a network as needed.\n\n_Example: They are more commonly used for consumer equipment and personal use._\n\n# OSI Model\n\nThe OSI Model is a logical and conceptual model that defines network communication used by systems open to interconnection and communication with other systems. The Open Systems Interconnection (OSI) model also defines a logical network and effectively describes computer packet transfer by using various layers of protocols.\n\nThe OSI Model can be seen as a universal language for computer networking. It's based on the concept of splitting up a communication system into seven abstract layers, each one stacked upon the last.\n\n## Why does the OSI model matter?\n\nThe Open Systems Interconnection (OSI) model has defined the common terminology used in networking discussions and documentation. 
This allows us to take a very complex communications process apart and evaluate its components.\n\nWhile this model is not directly implemented in the TCP\u002FIP networks that are most common today, it can still help us do so much more, such as:\n\n- Make troubleshooting easier and help identify threats across the entire stack.\n- Encourage hardware manufacturers to create networking products that can communicate with each other over the network.\n- Essential for developing a security-first mindset.\n- Separate a complex function into simpler components.\n\n## Layers\n\nThe seven abstraction layers of the OSI model can be defined as follows, from top to bottom:\n\n![osi-model](https:\u002F\u002Fraw.githubusercontent.com\u002Fkaranpratapsingh\u002Fportfolio\u002Fmaster\u002Fpublic\u002Fstatic\u002Fcourses\u002Fsystem-design\u002Fchapter-I\u002Fosi-model\u002Fosi-model.png)\n\n### Application\n\nThis is the only layer that directly interacts with data from the user. Software applications like web browsers and email clients rely on the application layer to initiate communication. But it should be made clear that client software applications are not part of the application layer, rather the application layer is responsible for the protocols and data manipulation that the software relies on to present meaningful data to the user. Application layer protocols include HTTP as well as SMTP.\n\n### Presentation\n\nThe presentation layer is also called the Translation layer. The data from the application layer is extracted here and manipulated as per the required format to transmit over the network. The functions of the presentation layer are translation, encryption\u002Fdecryption, and compression.\n\n### Session\n\nThis is the layer responsible for opening and closing communication between the two devices. The time between when the communication is opened and closed is known as the session. 
The session layer ensures that the session stays open long enough to transfer all the data being exchanged, and then promptly closes the session in order to avoid wasting resources. The session layer also synchronizes data transfer with checkpoints.\n\n### Transport\n\nThe transport layer (also known as layer 4) is responsible for end-to-end communication between the two devices. This includes taking data from the session layer and breaking it up into chunks called segments before sending it to the network layer (layer 3). It is also responsible for reassembling the segments on the receiving device into data the session layer can consume.\n\n### Network\n\nThe network layer is responsible for facilitating data transfer between two different networks. The network layer breaks up segments from the transport layer into smaller units, called packets, on the sender's device, and reassembles these packets on the receiving device. The network layer also finds the best physical path for the data to reach its destination; this is known as routing. If the two devices communicating are on the same network, then the network layer is unnecessary.\n\n### Data Link\n\nThe data link layer is very similar to the network layer, except the data link layer facilitates data transfer between two devices on the same network. The data link layer takes packets from the network layer and breaks them into smaller pieces called frames.\n\n### Physical\n\nThis layer includes the physical equipment involved in the data transfer, such as the cables and switches. This is also the layer where the data gets converted into a bit stream, which is a string of 1s and 0s. The physical layer of both devices must also agree on a signal convention so that the 1s can be distinguished from the 0s on both devices.\n\n# TCP and UDP\n\n## TCP\n\nTransmission Control Protocol (TCP) is connection-oriented, meaning once a connection has been established, data can be transmitted in both directions. 
TCP has built-in systems to check for errors and to guarantee data will be delivered in the order it was sent, making it the perfect protocol for transferring information like still images, data files, and web pages.\n\n![tcp](https:\u002F\u002Fraw.githubusercontent.com\u002Fkaranpratapsingh\u002Fportfolio\u002Fmaster\u002Fpublic\u002Fstatic\u002Fcourses\u002Fsystem-design\u002Fchapter-I\u002Ftcp-and-udp\u002Ftcp.png)\n\nBut while TCP is inherently reliable, its feedback mechanisms also result in a larger overhead, translating to greater use of the available bandwidth on the network.\n\n## UDP\n\nUser Datagram Protocol (UDP) is a simpler, connectionless internet protocol in which error-checking and recovery services are not required. With UDP, there is no overhead for opening a connection, maintaining a connection, or terminating a connection. Data is continuously sent to the recipient, whether or not they receive it.\n\n![udp](https:\u002F\u002Fraw.githubusercontent.com\u002Fkaranpratapsingh\u002Fportfolio\u002Fmaster\u002Fpublic\u002Fstatic\u002Fcourses\u002Fsystem-design\u002Fchapter-I\u002Ftcp-and-udp\u002Fudp.png)\n\nIt is largely preferred for real-time communications like broadcast or multicast network transmission. We should use UDP over TCP when we need the lowest latency and late data is worse than the loss of data.\n\n## TCP vs UDP\n\nTCP is a connection-oriented protocol, whereas UDP is a connectionless protocol. A key difference between TCP and UDP is speed, as TCP is comparatively slower than UDP. 
Overall, UDP is a much faster, simpler, and more efficient protocol; however, retransmission of lost data packets is only possible with TCP.\n\nTCP provides ordered delivery of data from user to server (and vice versa), whereas UDP is not dedicated to end-to-end communications, nor does it check the readiness of the receiver.\n\n| Feature             | TCP                                         | UDP                                |\n| ------------------- | ------------------------------------------- | ---------------------------------- |\n| Connection          | Requires an established connection          | Connectionless protocol            |\n| Guaranteed delivery | Can guarantee delivery of data              | Cannot guarantee delivery of data  |\n| Re-transmission     | Re-transmission of lost packets is possible | No re-transmission of lost packets |\n| Speed               | Slower than UDP                             | Faster than TCP                    |\n| Broadcasting        | Does not support broadcasting               | Supports broadcasting              |\n| Use cases           | HTTPS, HTTP, SMTP, POP, FTP, etc            | Video streaming, DNS, VoIP, etc    |\n\n# Domain Name System (DNS)\n\nEarlier we learned about IP addresses that enable every machine to connect with other machines. But as we know, humans are more comfortable with names than numbers. It's easier to remember a name like `google.com` than something like `122.250.192.232`.\n\nThis brings us to the Domain Name System (DNS), which is a hierarchical and decentralized naming system used for translating human-readable domain names to IP addresses.\n\n## How DNS works\n\n![how-dns-works](https:\u002F\u002Fraw.githubusercontent.com\u002Fkaranpratapsingh\u002Fportfolio\u002Fmaster\u002Fpublic\u002Fstatic\u002Fcourses\u002Fsystem-design\u002Fchapter-I\u002Fdomain-name-system\u002Fhow-dns-works.png)\n\nDNS lookup involves the following eight steps:\n\n1. 
A client types [example.com](http:\u002F\u002Fexample.com) into a web browser; the query travels over the internet and is received by a DNS resolver.\n2. The resolver then queries a DNS root nameserver.\n3. The root server responds to the resolver with the address of a Top-Level Domain (TLD) nameserver.\n4. The resolver then makes a request to the `.com` TLD nameserver.\n5. The TLD server then responds with the IP address of the domain's nameserver, [example.com](http:\u002F\u002Fexample.com).\n6. Lastly, the recursive resolver sends a query to the domain's nameserver.\n7. The IP address for [example.com](http:\u002F\u002Fexample.com) is then returned to the resolver from the nameserver.\n8. The DNS resolver then responds to the web browser with the IP address of the domain requested initially.\n\nOnce the IP address has been resolved, the client should be able to request content from the resolved IP address. For example, the resolved IP may return a webpage to be rendered in the browser.\n\n## Server types\n\nNow, let's look at the four key groups of servers that make up the DNS infrastructure.\n\n### DNS Resolver\n\nA DNS resolver (also known as a DNS recursive resolver) is the first stop in a DNS query. The recursive resolver acts as a middleman between a client and a DNS nameserver. After receiving a DNS query from a web client, a recursive resolver will either respond with cached data, or send a request to a root nameserver, followed by another request to a TLD nameserver, and then one last request to an authoritative nameserver. After receiving a response from the authoritative nameserver containing the requested IP address, the recursive resolver then sends a response to the client.\n\n### DNS root server\n\nA root server accepts a recursive resolver's query which includes a domain name, and the root nameserver responds by directing the recursive resolver to a TLD nameserver, based on the extension of that domain (`.com`, `.net`, `.org`, etc.). 
The root nameservers are overseen by a nonprofit called the [Internet Corporation for Assigned Names and Numbers (ICANN)](https:\u002F\u002Fwww.icann.org).\n\nThere are 13 DNS root nameservers known to every recursive resolver. Note that while there are 13 root nameservers, that doesn't mean that there are only 13 machines in the root nameserver system. There are 13 types of root nameservers, but there are multiple copies of each one all over the world, which use [Anycast routing](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FAnycast) to provide speedy responses.\n\n### TLD nameserver\n\nA TLD nameserver maintains information for all the domain names that share a common domain extension, such as `.com`, `.net`, or whatever comes after the last dot in a URL.\n\nManagement of TLD nameservers is handled by the [Internet Assigned Numbers Authority (IANA)](https:\u002F\u002Fwww.iana.org), which is a branch of [ICANN](https:\u002F\u002Fwww.icann.org). The IANA breaks up the TLD servers into two main groups:\n\n- **Generic top-level domains**: These are domains like `.com`, `.org`, `.net`, `.edu`, and `.gov`.\n- **Country code top-level domains**: These include any domains that are specific to a country or state. Examples include `.uk`, `.us`, `.ru`, and `.jp`.\n\n### Authoritative DNS server\n\nThe authoritative nameserver is usually the resolver's last step in the journey for an IP address. The authoritative nameserver contains information specific to the domain name it serves (e.g. [google.com](http:\u002F\u002Fgoogle.com)) and it can provide a recursive resolver with the IP address of that server found in the DNS A record, or if the domain has a CNAME record (alias) it will provide the recursive resolver with an alias domain, at which point the recursive resolver will have to perform a whole new DNS lookup to procure a record from an authoritative nameserver (often an A record containing an IP address). 
If it cannot find the domain, it returns an NXDOMAIN response.\n\n## Query Types\n\nThere are three types of queries in a DNS system:\n\n### Recursive\n\nIn a recursive query, a DNS client requires that a DNS server (typically a DNS recursive resolver) respond to the client with either the requested resource record or an error message if the resolver can't find the record.\n\n### Iterative\n\nIn an iterative query, a DNS client provides a hostname, and the DNS Resolver returns the best answer it can. If the DNS resolver has the relevant DNS records in its cache, it returns them. If not, it refers the DNS client to the Root Server or another Authoritative Name Server that is nearest to the required DNS zone. The DNS client must then repeat the query directly against the DNS server it was referred to.\n\n### Non-recursive\n\nA non-recursive query is a query in which the DNS Resolver already knows the answer. It either immediately returns a DNS record because it already stores it in a local cache, or queries a DNS Name Server which is authoritative for the record, meaning it definitely holds the correct IP for that hostname. In both cases, there is no need for additional rounds of queries (like in recursive or iterative queries). Rather, a response is immediately returned to the client.\n\n## Record Types\n\nDNS records (aka zone files) are instructions that live in authoritative DNS servers and provide information about a domain including what IP address is associated with that domain and how to handle requests for that domain.\n\nThese records consist of a series of text files written in what is known as _DNS syntax_. DNS syntax is just a string of characters used as commands that tell the DNS server what to do. 
All DNS records also have a _\"TTL\"_, which stands for time-to-live, and indicates how long a DNS server may cache that record before refreshing it.\n\nThere are more record types but for now, let's look at some of the most commonly used ones:\n\n- **A (Address record)**: This is the record that holds the IPv4 address of a domain.\n- **AAAA (IP Version 6 Address record)**: The record that contains the IPv6 address for a domain (as opposed to A records, which store the IPv4 address).\n- **CNAME (Canonical Name record)**: Forwards one domain or subdomain to another domain; it does NOT provide an IP address.\n- **MX (Mail exchanger record)**: Directs mail to an email server.\n- **TXT (Text Record)**: This record lets an admin store text notes in the record. These records are often used for email security.\n- **NS (Name Server records)**: Stores the name server for a DNS entry.\n- **SOA (Start of Authority)**: Stores admin information about a domain.\n- **SRV (Service Location record)**: Specifies a host and port for specific services.\n- **PTR (Reverse-lookup Pointer record)**: Provides a domain name in reverse lookups.\n- **CERT (Certificate record)**: Stores public key certificates.\n\n## Subdomains\n\nA subdomain is an additional part of our main domain name. It is commonly used to logically separate a website into sections. We can create multiple subdomains or child domains on the main domain.\n\nFor example, in `blog.example.com`, `blog` is the subdomain, `example` is the primary domain, and `.com` is the top-level domain (TLD). Similar examples can be `support.example.com` or `careers.example.com`.\n\n## DNS Zones\n\nA DNS zone is a distinct part of the domain namespace which is delegated to a legal entity like a person, organization, or company, who is responsible for maintaining the DNS zone. 
A DNS zone is also an administrative function, allowing for granular control of DNS components, such as authoritative name servers.\n\n## DNS Caching\n\nA DNS cache (sometimes called a DNS resolver cache) is a temporary database, maintained by a computer's operating system, that contains records of all the recent visits and attempted visits to websites and other internet domains. In other words, a DNS cache is just a memory of recent DNS lookups that our computer can quickly refer to when it's trying to figure out how to load a website.\n\nThe Domain Name System implements a time-to-live (TTL) on every DNS record. TTL specifies the number of seconds the record can be cached by a DNS client or server. When the record is stored in a cache, whatever TTL value came with it gets stored as well. The server continues to update the TTL of the record stored in the cache, counting down every second. When it hits zero, the record is deleted or purged from the cache. At that point, if a query for that record is received, the DNS server has to start the resolution process.\n\n## Reverse DNS\n\nA reverse DNS lookup is a DNS query for the domain name associated with a given IP address. This accomplishes the opposite of the more commonly used forward DNS lookup, in which the DNS system is queried to return an IP address. The process of reverse resolving an IP address uses PTR records. If the server does not have a PTR record, it cannot resolve a reverse lookup.\n\nReverse lookups are commonly used by email servers. Email servers check and see if an email message came from a valid server before bringing it onto their network. 
Many email servers will reject messages from any server that does not support reverse lookups or from a server that is highly unlikely to be legitimate.\n\n_Note: Reverse DNS lookups are not universally adopted as they are not critical to the normal function of the internet._\n\n## Examples\n\nThese are some widely used managed DNS solutions:\n\n- [Route53](https:\u002F\u002Faws.amazon.com\u002Froute53)\n- [Cloudflare DNS](https:\u002F\u002Fwww.cloudflare.com\u002Fdns)\n- [Google Cloud DNS](https:\u002F\u002Fcloud.google.com\u002Fdns)\n- [Azure DNS](https:\u002F\u002Fazure.microsoft.com\u002Fen-in\u002Fservices\u002Fdns)\n- [NS1](https:\u002F\u002Fns1.com\u002Fproducts\u002Fmanaged-dns)\n\n# Load Balancing\n\nLoad balancing lets us distribute incoming network traffic across multiple resources ensuring high availability and reliability by sending requests only to resources that are online. This provides the flexibility to add or subtract resources as demand dictates.\n\n![load-balancing](https:\u002F\u002Fraw.githubusercontent.com\u002Fkaranpratapsingh\u002Fportfolio\u002Fmaster\u002Fpublic\u002Fstatic\u002Fcourses\u002Fsystem-design\u002Fchapter-I\u002Fload-balancing\u002Fload-balancer.png)\n\nFor additional scalability and redundancy, we can try to load balance at each layer of our system:\n\n![load-balancing-layers](https:\u002F\u002Fraw.githubusercontent.com\u002Fkaranpratapsingh\u002Fportfolio\u002Fmaster\u002Fpublic\u002Fstatic\u002Fcourses\u002Fsystem-design\u002Fchapter-I\u002Fload-balancing\u002Fload-balancer-layers.png)\n\n## But why?\n\nModern high-traffic websites must serve hundreds of thousands, if not millions, of concurrent requests from users or clients. 
To cost-effectively scale to meet these high volumes, modern computing best practice generally requires adding more servers.\n\nA load balancer can sit in front of the servers and route client requests across all servers capable of fulfilling those requests in a manner that maximizes speed and capacity utilization. This ensures that no single server is overworked, which could degrade performance. If a single server goes down, the load balancer redirects traffic to the remaining online servers. When a new server is added to the server group, the load balancer automatically starts sending requests to it.\n\n## Workload distribution\n\nThis is the core functionality provided by a load balancer and has several common variations:\n\n- **Host-based**: Distributes requests based on the requested hostname.\n- **Path-based**: Using the entire URL to distribute requests as opposed to just the hostname.\n- **Content-based**: Inspects the message content of a request. This allows distribution based on content such as the value of a parameter.\n\n## Layers\n\nGenerally speaking, load balancers operate at one of the two levels:\n\n### Network layer\n\nThis is the load balancer that works at the network's transport layer, also known as layer 4. This performs routing based on networking information such as IP addresses and is not able to perform content-based routing. These are often dedicated hardware devices that can operate at high speed.\n\n### Application layer\n\nThis is the load balancer that operates at the application layer, also known as layer 7. Load balancers can read requests in their entirety and perform content-based routing. This allows the management of load based on a full understanding of traffic.\n\n## Types\n\nLet's look at different types of load balancers:\n\n### Software\n\nSoftware load balancers usually are easier to deploy than hardware versions. 
They also tend to be more cost-effective and flexible, and they are used in conjunction with software development environments. The software approach gives us the flexibility of configuring the load balancer to our environment's specific needs. The boost in flexibility may come at the cost of having to do more work to set up the load balancer. Compared to hardware versions, which offer more of a closed-box approach, software balancers give us more freedom to make changes and upgrades.\n\nSoftware load balancers are widely used and are available either as installable solutions that require configuration and management or as a managed cloud service.\n\n### Hardware\n\nAs the name implies, a hardware load balancer relies on physical, on-premises hardware to distribute application and network traffic. These devices can handle a large volume of traffic but often carry a hefty price tag and are fairly limited in terms of flexibility.\n\nHardware load balancers include proprietary firmware that requires maintenance and updates as new versions and security patches are released.\n\n### DNS\n\nDNS load balancing is the practice of configuring a domain in the Domain Name System (DNS) such that client requests to the domain are distributed across a group of server machines.\n\nUnfortunately, DNS load balancing has inherent problems limiting its reliability and efficiency. Most significantly, DNS does not check for server and network outages or errors. 
It always returns the same set of IP addresses for a domain even if servers are down or inaccessible.\n\n## Routing Algorithms\n\nNow, let's discuss commonly used routing algorithms:\n\n- **Round-robin**: Requests are distributed to application servers in rotation.\n- **Weighted Round-robin**: Builds on the simple Round-robin technique to account for differing server characteristics such as compute and traffic handling capacity using weights that can be assigned via DNS records by the administrator.\n- **Least Connections**: A new request is sent to the server with the fewest current connections to clients. The relative computing capacity of each server is factored into determining which one has the least connections.\n- **Least Response Time**: Sends requests to the server selected by a formula that combines the fastest response time and fewest active connections.\n- **Least Bandwidth**: This method measures traffic in megabits per second (Mbps), sending client requests to the server with the least Mbps of traffic.\n- **Hashing**: Distributes requests based on a key we define, such as the client IP address or the request URL.\n\n## Advantages\n\nLoad balancing also plays a key role in preventing downtime. Other advantages of load balancing include the following:\n\n- Scalability\n- Redundancy\n- Flexibility\n- Efficiency\n\n## Redundant load balancers\n\nAs you may have already guessed, the load balancer itself can be a single point of failure. 
To overcome this, a second or `N` number of load balancers can be used in a cluster mode.\n\nIf failure detection is in place and the _active_ load balancer fails, a _passive_ load balancer can take over, which makes our system more fault-tolerant.\n\n![redundant-load-balancing](https:\u002F\u002Fraw.githubusercontent.com\u002Fkaranpratapsingh\u002Fportfolio\u002Fmaster\u002Fpublic\u002Fstatic\u002Fcourses\u002Fsystem-design\u002Fchapter-I\u002Fload-balancing\u002Fredundant-load-balancer.png)\n\n## Features\n\nHere are some commonly desired features of load balancers:\n\n- **Autoscaling**: Starting up and shutting down resources in response to demand conditions.\n- **Sticky sessions**: The ability to assign the same user or device to the same resource in order to maintain the session state on the resource.\n- **Healthchecks**: The ability to determine if a resource is down or performing poorly in order to remove the resource from the load balancing pool.\n- **Persistent connections**: Allowing a server to open a persistent connection with a client such as a WebSocket.\n- **Encryption**: Handling encrypted connections such as TLS and SSL.\n- **Certificates**: Presenting certificates to a client and authentication of client certificates.\n- **Compression**: Compression of responses.\n- **Caching**: An application-layer load balancer may offer the ability to cache responses.\n- **Logging**: Logging of request and response metadata can serve as an important audit trail or source for analytics data.\n- **Request tracing**: Assigning each request a unique id for the purposes of logging, monitoring, and troubleshooting.\n- **Redirects**: The ability to redirect an incoming request based on factors such as the requested path.\n- **Fixed response**: Returning a static response for a request such as an error message.\n\n## Examples\n\nFollowing are some of the load balancing solutions commonly used in the industry:\n\n- [Amazon Elastic Load 
Balancing](https:\u002F\u002Faws.amazon.com\u002Felasticloadbalancing)\n- [Azure Load Balancing](https:\u002F\u002Fazure.microsoft.com\u002Fen-in\u002Fservices\u002Fload-balancer)\n- [GCP Load Balancing](https:\u002F\u002Fcloud.google.com\u002Fload-balancing)\n- [DigitalOcean Load Balancer](https:\u002F\u002Fwww.digitalocean.com\u002Fproducts\u002Fload-balancer)\n- [Nginx](https:\u002F\u002Fwww.nginx.com)\n- [HAProxy](http:\u002F\u002Fwww.haproxy.org)\n\n# Clustering\n\nAt a high level, a computer cluster is a group of two or more computers, or nodes, that run in parallel to achieve a common goal. This allows workloads consisting of a high number of individual, parallelizable tasks to be distributed among the nodes in the cluster. As a result, these tasks can leverage the combined memory and processing power of each computer to increase overall performance.\n\nTo build a computer cluster, the individual nodes should be connected to a network to enable internode communication. The software can then be used to join the nodes together and form a cluster. It may have a shared storage device and\u002For local storage on each node.\n\n![cluster](https:\u002F\u002Fraw.githubusercontent.com\u002Fkaranpratapsingh\u002Fportfolio\u002Fmaster\u002Fpublic\u002Fstatic\u002Fcourses\u002Fsystem-design\u002Fchapter-I\u002Fclustering\u002Fcluster.png)\n\nTypically, at least one node is designated as the leader node and acts as the entry point to the cluster. The leader node may be responsible for delegating incoming work to the other nodes and, if necessary, aggregating the results and returning a response to the user.\n\nIdeally, a cluster functions as if it were a single system. A user accessing the cluster should not need to know whether the system is a cluster or an individual machine. 
Furthermore, a cluster should be designed to minimize latency and prevent bottlenecks in node-to-node communication.\n\n## Types\n\nComputer clusters can generally be categorized into three types:\n\n- Highly available or fail-over\n- Load balancing\n- High-performance computing\n\n## Configurations\n\nThe two most commonly used high availability (HA) clustering configurations are active-active and active-passive.\n\n### Active-Active\n\n![active-active](https:\u002F\u002Fraw.githubusercontent.com\u002Fkaranpratapsingh\u002Fportfolio\u002Fmaster\u002Fpublic\u002Fstatic\u002Fcourses\u002Fsystem-design\u002Fchapter-I\u002Fclustering\u002Factive-active.png)\n\nAn active-active cluster is typically made up of at least two nodes, both actively running the same kind of service simultaneously. The main purpose of an active-active cluster is to achieve load balancing. A load balancer distributes workloads across all nodes to prevent any single node from getting overloaded. Because there are more nodes available to serve, there will also be an improvement in throughput and response times.\n\n### Active-Passive\n\n![active-passive](https:\u002F\u002Fraw.githubusercontent.com\u002Fkaranpratapsingh\u002Fportfolio\u002Fmaster\u002Fpublic\u002Fstatic\u002Fcourses\u002Fsystem-design\u002Fchapter-I\u002Fclustering\u002Factive-passive.png)\n\nLike the active-active cluster configuration, an active-passive cluster also consists of at least two nodes. However, as the name _active-passive_ implies, not all nodes are going to be active. For example, in the case of two nodes, if the first node is already active, then the second node must be passive or on standby.\n\n## Advantages\n\nFour key advantages of cluster computing are as follows:\n\n- High availability\n- Scalability\n- Performance\n- Cost-effective\n\n## Load balancing vs Clustering\n\nLoad balancing shares some common traits with clustering, but they are different processes. 
Clustering provides redundancy and boosts capacity and availability. Servers in a cluster are aware of each other and work together toward a common purpose. But with load balancing, servers are not aware of each other. Instead, they react to the requests they receive from the load balancer.\n\nWe can employ load balancing in conjunction with clustering, but it also is applicable in cases involving independent servers that share a common purpose such as to run a website, business application, web service, or some other IT resource.\n\n## Challenges\n\nThe most obvious challenge clustering presents is the increased complexity of installation and maintenance. An operating system, the application, and its dependencies must each be installed and updated on every node.\n\nThis becomes even more complicated if the nodes in the cluster are not homogeneous. Resource utilization for each node must also be closely monitored, and logs should be aggregated to ensure that the software is behaving correctly.\n\nAdditionally, storage becomes more difficult to manage, a shared storage device must prevent nodes from overwriting one another and distributed data stores have to be kept in sync.\n\n## Examples\n\nClustering is commonly used in the industry, and often many technologies offer some sort of clustering mode. For example:\n\n- Containers (e.g. [Kubernetes](https:\u002F\u002Fkubernetes.io), [Amazon ECS](https:\u002F\u002Faws.amazon.com\u002Fecs))\n- Databases (e.g. [Cassandra](https:\u002F\u002Fcassandra.apache.org\u002F_\u002Findex.html), [MongoDB](https:\u002F\u002Fwww.mongodb.com))\n- Cache (e.g. 
[Redis](https:\u002F\u002Fredis.io\u002Fdocs\u002Fmanual\u002Fscaling))\n\n# Caching\n\n_\"There are only two hard things in Computer Science: cache invalidation and naming things.\" - Phil Karlton_\n\n![caching](https:\u002F\u002Fraw.githubusercontent.com\u002Fkaranpratapsingh\u002Fportfolio\u002Fmaster\u002Fpublic\u002Fstatic\u002Fcourses\u002Fsystem-design\u002Fchapter-I\u002Fcaching\u002Fcaching.png)\n\nA cache's primary purpose is to increase data retrieval performance by reducing the need to access the underlying slower storage layer. Trading off capacity for speed, a cache typically stores a subset of data transiently, in contrast to databases whose data is usually complete and durable.\n\nCaches take advantage of the locality of reference principle _\"recently requested data is likely to be requested again\"._\n\n## Caching and Memory\n\nLike a computer's memory, a cache is a compact, fast-performing memory that stores data in a hierarchy of levels, starting at level one, and progressing from there sequentially. They are labeled as L1, L2, L3, and so on. A cache also gets written if requested, such as when there has been an update and new content needs to be saved to the cache, replacing the older content that was saved.\n\nNo matter whether the cache is read or written, it's done one block at a time. Each block also has a tag that includes the location where the data was stored in the cache. When data is requested from the cache, a search occurs through the tags to find the specific content that's needed in level one (L1) of the memory. If the correct data isn't found, more searches are conducted in L2.\n\nIf the data isn't found there, searches are continued in L3, then L4, and so on until it has been found, then, it's read and loaded. 
If the data isn't found in the cache at all, then it's written into it for quick retrieval the next time.\n\n## Cache hit and Cache miss\n\n### Cache hit\n\nA cache hit describes the situation where content is successfully served from the cache. The tags are searched in the memory rapidly, and when the data is found and read, it's considered a cache hit.\n\n**Cold, Warm, and Hot Caches**\n\nA cache hit can also be described as cold, warm, or hot. In each of these, the speed at which the data is read is described.\n\nA hot cache is an instance where data was read from the memory at the _fastest_ possible rate. This happens when the data is retrieved from L1.\n\nA cold cache is the _slowest_ possible rate for data to be read, though, it's still successful so it's still considered a cache hit. The data is just found lower in the memory hierarchy such as in L3, or lower.\n\nA warm cache is used to describe data that's found in L2 or L3. It's not as fast as a hot cache, but it's still faster than a cold cache. Generally, calling a cache warm is used to express that it's slower and closer to a cold cache than a hot one.\n\n### Cache miss\n\nA cache miss refers to the instance when the memory is searched, and the data isn't found. When this happens, the content is transferred and written into the cache.\n\n## Cache Invalidation\n\nCache invalidation is a process where the computer system declares the cache entries as invalid and removes or replaces them. If the data is modified, it should be invalidated in the cache, if not, this can cause inconsistent application behavior. 
There are three kinds of caching systems:\n\n### Write-through cache\n\n![write-through-cache](https:\u002F\u002Fraw.githubusercontent.com\u002Fkaranpratapsingh\u002Fportfolio\u002Fmaster\u002Fpublic\u002Fstatic\u002Fcourses\u002Fsystem-design\u002Fchapter-I\u002Fcaching\u002Fwrite-through-cache.png)\n\nData is written into the cache and the corresponding database simultaneously.\n\n**Pro**: Fast retrieval, complete data consistency between cache and storage.\n\n**Con**: Higher latency for write operations.\n\n### Write-around cache\n\n![write-around-cache](https:\u002F\u002Fraw.githubusercontent.com\u002Fkaranpratapsingh\u002Fportfolio\u002Fmaster\u002Fpublic\u002Fstatic\u002Fcourses\u002Fsystem-design\u002Fchapter-I\u002Fcaching\u002Fwrite-around-cache.png)\n\nThe write goes directly to the database or permanent storage, bypassing the cache.\n\n**Pro**: This may reduce latency.\n\n**Con**: It increases cache misses, because recently written data is not in the cache and has to be read from the database. As a result, this can lead to higher read latency for applications that write and re-read the information quickly, since those reads are served from the slower back-end storage.\n\n### Write-back cache\n\n![write-back-cache](https:\u002F\u002Fraw.githubusercontent.com\u002Fkaranpratapsingh\u002Fportfolio\u002Fmaster\u002Fpublic\u002Fstatic\u002Fcourses\u002Fsystem-design\u002Fchapter-I\u002Fcaching\u002Fwrite-back-cache.png)\n\nThe write is only done to the caching layer and is confirmed as soon as the write to the cache completes. The cache then asynchronously syncs this write to the database.\n\n**Pro**: This would lead to reduced latency and high throughput for write-intensive applications.\n\n**Con**: There is a risk of data loss in case the caching layer crashes. 
We can improve this by having more than one replica acknowledging the write in the cache.\n\n## Eviction policies\n\nFollowing are some of the most common cache eviction policies:\n\n- **First In First Out (FIFO)**: The cache evicts the first block accessed first without any regard to how often or how many times it was accessed before.\n- **Last In First Out (LIFO)**: The cache evicts the block accessed most recently first without any regard to how often or how many times it was accessed before.\n- **Least Recently Used (LRU)**: Discards the least recently used items first.\n- **Most Recently Used (MRU)**: Discards, in contrast to LRU, the most recently used items first.\n- **Least Frequently Used (LFU)**: Counts how often an item is needed. Those that are used least often are discarded first.\n- **Random Replacement (RR)**: Randomly selects a candidate item and discards it to make space when necessary.\n\n## Distributed Cache\n\n![distributed-cache](https:\u002F\u002Fraw.githubusercontent.com\u002Fkaranpratapsingh\u002Fportfolio\u002Fmaster\u002Fpublic\u002Fstatic\u002Fcourses\u002Fsystem-design\u002Fchapter-I\u002Fcaching\u002Fdistributed-cache.png)\n\nA distributed cache is a system that pools together the random-access memory (RAM) of multiple networked computers into a single in-memory data store used as a data cache to provide fast access to data. While most caches are traditionally in one physical server or hardware component, a distributed cache can grow beyond the memory limits of a single computer by linking together multiple computers.\n\n## Global Cache\n\n![global-cache](https:\u002F\u002Fraw.githubusercontent.com\u002Fkaranpratapsingh\u002Fportfolio\u002Fmaster\u002Fpublic\u002Fstatic\u002Fcourses\u002Fsystem-design\u002Fchapter-I\u002Fcaching\u002Fglobal-cache.png)\n\nAs the name suggests, we will have a single shared cache that all the application nodes will use. 
When the requested data is not found in the global cache, it's the responsibility of the cache to find out the missing piece of data from the underlying data store.\n\n## Use cases\n\nCaching can have many real-world use cases such as:\n\n- Database Caching\n- Content Delivery Network (CDN)\n- Domain Name System (DNS) Caching\n- API Caching\n\n**When not to use caching?**\n\nLet's also look at some scenarios where we should not use cache:\n\n- Caching isn't helpful when it takes just as long to access the cache as it does to access the primary data store.\n- Caching doesn't work as well when requests have low repetition (higher randomness), because caching performance comes from repeated memory access patterns.\n- Caching isn't helpful when the data changes frequently, as the cached version gets out of sync, and the primary data store must be accessed every time.\n\n_It's important to note that a cache should not be used as permanent data storage. They are almost always implemented in volatile memory because it is faster, and thus should be considered transient._\n\n## Advantages\n\nBelow are some advantages of caching:\n\n- Improves performance\n- Reduce latency\n- Reduce load on the database\n- Reduce network cost\n- Increase Read Throughput\n\n## Examples\n\nHere are some commonly used technologies for caching:\n\n- [Redis](https:\u002F\u002Fredis.io)\n- [Memcached](https:\u002F\u002Fmemcached.org)\n- [Amazon Elasticache](https:\u002F\u002Faws.amazon.com\u002Felasticache)\n- [Aerospike](https:\u002F\u002Faerospike.com)\n\n# Content Delivery Network (CDN)\n\nA content delivery network (CDN) is a geographically distributed group of servers that work together to provide fast delivery of internet content. 
Generally, static files such as HTML\u002FCSS\u002FJS, photos, and videos are served from CDN.\n\n![cdn-map](https:\u002F\u002Fraw.githubusercontent.com\u002Fkaranpratapsingh\u002Fportfolio\u002Fmaster\u002Fpublic\u002Fstatic\u002Fcourses\u002Fsystem-design\u002Fchapter-I\u002Fcontent-delivery-network\u002Fcdn-map.png)\n\n## Why use a CDN?\n\nContent Delivery Network (CDN) increases content availability and redundancy while reducing bandwidth costs and improving security. Serving content from CDNs can significantly improve performance as users receive content from data centers close to them and our servers do not have to serve requests that the CDN fulfills.\n\n## How does a CDN work?\n\n![cdn](https:\u002F\u002Fraw.githubusercontent.com\u002Fkaranpratapsingh\u002Fportfolio\u002Fmaster\u002Fpublic\u002Fstatic\u002Fcourses\u002Fsystem-design\u002Fchapter-I\u002Fcontent-delivery-network\u002Fcdn.png)\n\nIn a CDN, the origin server contains the original versions of the content while the edge servers are numerous and distributed across various locations around the world.\n\nTo minimize the distance between the visitors and the website's server, a CDN stores a cached version of its content in multiple geographical locations known as edge locations. Each edge location contains several caching servers responsible for content delivery to visitors within its proximity.\n\nOnce the static assets are cached on all the CDN servers for a particular location, all subsequent website visitor requests for static assets will be delivered from these edge servers instead of the origin, thus reducing the origin load and improving scalability.\n\nFor example, when someone in the UK requests our website which might be hosted in the USA, they will be served from the closest edge location such as the London edge location. 
This is much quicker than having the visitor make a complete request to the origin server which will increase the latency.\n\n## Types\n\nCDNs are generally divided into two types:\n\n### Push CDNs\n\nPush CDNs receive new content whenever changes occur on the server. We take full responsibility for providing content, uploading directly to the CDN, and rewriting URLs to point to the CDN. We can configure when content expires and when it is updated. Content is uploaded only when it is new or changed, minimizing traffic, but maximizing storage.\n\nSites with a small amount of traffic or sites with content that isn't often updated work well with push CDNs. Content is placed on the CDNs once, instead of being re-pulled at regular intervals.\n\n### Pull CDNs\n\nIn a Pull CDN situation, the cache is updated based on request. When the client sends a request that requires static assets to be fetched from the CDN if the CDN doesn't have it, then it will fetch the newly updated assets from the origin server and populate its cache with this new asset, and then send this new cached asset to the user.\n\nContrary to the Push CDN, this requires less maintenance because cache updates on CDN nodes are performed based on requests from the client to the origin server. 
Sites with heavy traffic work well with pull CDNs, as traffic is spread out more evenly with only recently-requested content remaining on the CDN.\n\n## Disadvantages\n\nAs we all know good things come with extra costs, so let's discuss some disadvantages of CDNs:\n\n- **Extra charges**: It can be expensive to use a CDN, especially for high-traffic services.\n- **Restrictions**: Some organizations and countries have blocked the domains or IP addresses of popular CDNs.\n- **Location**: If most of our audience is located in a country where the CDN has no servers, the data on our website may have to travel further than without using any CDN.\n\n## Examples\n\nHere are some widely used CDNs:\n\n- [Amazon CloudFront](https:\u002F\u002Faws.amazon.com\u002Fcloudfront)\n- [Google Cloud CDN](https:\u002F\u002Fcloud.google.com\u002Fcdn)\n- [Cloudflare CDN](https:\u002F\u002Fwww.cloudflare.com\u002Fcdn)\n- [Fastly](https:\u002F\u002Fwww.fastly.com\u002Fproducts\u002Fcdn)\n\n# Proxy\n\nA proxy server is an intermediary piece of hardware\u002Fsoftware sitting between the client and the backend server. It receives requests from clients and relays them to the origin servers. Typically, proxies are used to filter requests, log requests, or sometimes transform requests (by adding\u002Fremoving headers, encrypting\u002Fdecrypting, or compression).\n\n## Types\n\nThere are two types of proxies:\n\n### Forward Proxy\n\nA forward proxy, often called a proxy, proxy server, or web proxy is a server that sits in front of a group of client machines. 
When those computers make requests to sites and services on the internet, the proxy server intercepts those requests and then communicates with web servers on behalf of those clients, like a middleman.\n\n![forward-proxy](https:\u002F\u002Fraw.githubusercontent.com\u002Fkaranpratapsingh\u002Fportfolio\u002Fmaster\u002Fpublic\u002Fstatic\u002Fcourses\u002Fsystem-design\u002Fchapter-I\u002Fproxy\u002Fforward-proxy.png)\n\n**Advantages**\n\nHere are some advantages of a forward proxy:\n\n- Block access to certain content\n- Allows access to [geo-restricted](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FGeo-blocking) content\n- Provides anonymity\n- Avoid other browsing restrictions\n\nAlthough proxies provide the benefits of anonymity, they can still track our personal information. Setup and maintenance of a proxy server can be costly and requires configurations.\n\n### Reverse Proxy\n\nA reverse proxy is a server that sits in front of one or more web servers, intercepting requests from clients. When clients send requests to the origin server of a website, those requests are intercepted by the reverse proxy server.\n\nThe difference between a forward and reverse proxy is subtle but important. A simplified way to sum it up would be to say that a forward proxy sits in front of a client and ensures that no origin server ever communicates directly with that specific client. On the other hand, a reverse proxy sits in front of an origin server and ensures that no client ever communicates directly with that origin server.\n\n![reverse-proxy](https:\u002F\u002Fraw.githubusercontent.com\u002Fkaranpratapsingh\u002Fportfolio\u002Fmaster\u002Fpublic\u002Fstatic\u002Fcourses\u002Fsystem-design\u002Fchapter-I\u002Fproxy\u002Freverse-proxy.png)\n\nIntroducing reverse proxy results in increased complexity. A single reverse proxy is a single point of failure, configuring multiple reverse proxies (i.e. 
a failover) further increases complexity.\n\n**Advantages**\n\nHere are some advantages of using a reverse proxy:\n\n- Improved security\n- Caching\n- SSL encryption\n- Load balancing\n- Scalability and flexibility\n\n## Load balancer vs Reverse Proxy\n\nWait, isn't reverse proxy similar to a load balancer? Well, no as a load balancer is useful when we have multiple servers. Often, load balancers route traffic to a set of servers serving the same function, while reverse proxies can be useful even with just one web server or application server. A reverse proxy can also act as a load balancer but not the other way around.\n\n## Examples\n\nBelow are some commonly used proxy technologies:\n\n- [Nginx](https:\u002F\u002Fwww.nginx.com)\n- [HAProxy](http:\u002F\u002Fwww.haproxy.org)\n- [Traefik](https:\u002F\u002Fdoc.traefik.io\u002Ftraefik)\n- [Envoy](https:\u002F\u002Fwww.envoyproxy.io)\n\n# Availability\n\nAvailability is the time a system remains operational to perform its required function in a specific period. It is a simple measure of the percentage of time that a system, service, or machine remains operational under normal conditions.\n\n## The Nine's of availability\n\nAvailability is often quantified by uptime (or downtime) as a percentage of time the service is available. 
It is generally measured in the number of 9s.\n\n$$\nAvailability = \\frac{Uptime}{(Uptime + Downtime)}\n$$\n\nIf a system is 99.00% available, it is said to have \"2 nines\" of availability, and if it is 99.9%, it is called \"3 nines\", and so on.\n\n| Availability (Percent)   | Downtime (Year)    | Downtime (Month)  | Downtime (Week)    |\n| ------------------------ | ------------------ | ----------------- | ------------------ |\n| 90% (one nine)           | 36.53 days         | 72 hours          | 16.8 hours         |\n| 99% (two nines)          | 3.65 days          | 7.20 hours        | 1.68 hours         |\n| 99.9% (three nines)      | 8.77 hours         | 43.8 minutes      | 10.1 minutes       |\n| 99.99% (four nines)      | 52.6 minutes       | 4.32 minutes      | 1.01 minutes       |\n| 99.999% (five nines)     | 5.25 minutes       | 25.9 seconds      | 6.05 seconds       |\n| 99.9999% (six nines)     | 31.56 seconds      | 2.59 seconds      | 604.8 milliseconds |\n| 99.99999% (seven nines)  | 3.15 seconds       | 263 milliseconds  | 60.5 milliseconds  |\n| 99.999999% (eight nines) | 315.6 milliseconds | 26.3 milliseconds | 6 milliseconds     |\n| 99.9999999% (nine nines) | 31.6 milliseconds  | 2.6 milliseconds  | 0.6 milliseconds   |\n\n## Availability in Sequence vs Parallel\n\nIf a service consists of multiple components prone to failure, the service's overall availability depends on whether the components are in sequence or in parallel.\n\n### Sequence\n\nOverall availability decreases when two components are in sequence.\n\n$$\nAvailability \\space (Total) = Availability \\space (Foo) * Availability \\space (Bar)\n$$\n\nFor example, if both `Foo` and `Bar` each had 99.9% availability, their total availability in sequence would be about 99.8% (0.999 * 0.999 = 0.998001).\n\n### Parallel\n\nOverall availability increases when two components are in parallel.\n\n$$\nAvailability \\space (Total) = 1 - (1 - Availability \\space (Foo)) * (1 - Availability \\space (Bar))\n$$\n\nFor 
example, if both `Foo` and `Bar` each had 99.9% availability, their total availability in parallel would be 99.9999%.\n\n## Availability vs Reliability\n\nIf a system is reliable, it is available. However, if it is available, it is not necessarily reliable. In other words, high reliability contributes to high availability, but it is possible to achieve high availability even with an unreliable system.\n\n## High availability vs Fault Tolerance\n\nBoth high availability and fault tolerance apply to methods for providing high uptime levels. However, they accomplish the objective differently.\n\nA fault-tolerant system has no service interruption but a significantly higher cost, while a highly available system has minimal service interruption. Fault-tolerance requires full hardware redundancy as if the main system fails, with no loss in uptime, another system should take over.\n\n# Scalability\n\nScalability is the measure of how well a system responds to changes by adding or removing resources to meet demands.\n\n![scalability](https:\u002F\u002Fraw.githubusercontent.com\u002Fkaranpratapsingh\u002Fportfolio\u002Fmaster\u002Fpublic\u002Fstatic\u002Fcourses\u002Fsystem-design\u002Fchapter-I\u002Fscalability\u002Fscalability.png)\n\nLet's discuss different types of scaling:\n\n## Vertical scaling\n\nVertical scaling (also known as scaling up) expands a system's scalability by adding more power to an existing machine. In other words, vertical scaling refers to improving an application's capability via increasing hardware capacity.\n\n### Advantages\n\n- Simple to implement\n- Easier to manage\n- Data consistent\n\n### Disadvantages\n\n- Risk of high downtime\n- Harder to upgrade\n- Can be a single point of failure\n\n## Horizontal scaling\n\nHorizontal scaling (also known as scaling out) expands a system's scale by adding more machines. 
It improves the performance of the server by adding more instances to the existing pool of servers, allowing the load to be distributed more evenly.\n\n### Advantages\n\n- Increased redundancy\n- Better fault tolerance\n- Flexible and efficient\n- Easier to upgrade\n\n### Disadvantages\n\n- Increased complexity\n- Data inconsistency\n- Increased load on downstream services\n\n# Storage\n\nStorage is a mechanism that enables a system to retain data, either temporarily or permanently. This topic is often skipped over in the context of system design; however, it is important to have a basic understanding of some common types of storage techniques that can help us fine-tune our storage components. Let's discuss some important storage concepts:\n\n## RAID\n\nRAID (Redundant Array of Independent Disks) is a way of storing the same data on multiple hard disks or solid-state drives (SSDs) to protect data in the case of a drive failure.\n\nThere are different RAID levels, however, and not all have the goal of providing redundancy. Let's discuss some commonly used RAID levels:\n\n- **RAID 0**: Also known as striping, data is split evenly across all the drives in the array.\n- **RAID 1**: Also known as mirroring, at least two drives contain an exact copy of a set of data. If a drive fails, others will still work.\n- **RAID 5**: Striping with parity. Requires the use of at least 3 drives, striping the data across multiple drives like RAID 0, but also has parity distributed across the drives.\n- **RAID 6**: Striping with double parity. RAID 6 is like RAID 5, but the parity data are written to two drives.\n- **RAID 10**: Combines striping plus mirroring from RAID 0 and RAID 1. 
It provides security by mirroring all data on secondary drives while using striping across each set of drives to speed up data transfers.\n\n### Comparison\n\nLet's compare all the features of different RAID levels:\n\n| Features             | RAID 0   | RAID 1               | RAID 5               | RAID 6                      | RAID 10                                  |\n| -------------------- | -------- | -------------------- | -------------------- | --------------------------- | ---------------------------------------- |\n| Description          | Striping | Mirroring            | Striping with Parity | Striping with double parity | Striping and Mirroring                   |\n| Minimum Disks        | 2        | 2                    | 3                    | 4                           | 4                                        |\n| Read Performance     | High     | High                 | High                 | High                        | High                                     |\n| Write Performance    | High     | Medium               | High                 | High                        | Medium                                   |\n| Cost                 | Low      | High                 | Low                  | Low                         | High                                     |\n| Fault Tolerance      | None     | Single-drive failure | Single-drive failure | Two-drive failure           | Up to one disk failure in each sub-array |\n| Capacity Utilization | 100%     | 50%                  | 67%-94%              | 50%-80%                     | 50%                                      |\n\n## Volumes\n\nVolume is a fixed amount of storage on a disk or tape. 
The term volume is often used as a synonym for the storage itself, but it is possible for a single disk to contain more than one volume or a volume to span more than one disk.\n\n## File storage\n\nFile storage is a solution to store data as files and present it to its final users as a hierarchical directory structure. The main advantage is to provide a user-friendly solution to store and retrieve files. To locate a file in file storage, the complete path of the file is required. It is economical and easily structured and is usually found on hard drives, which means that files appear exactly the same to the user as they do on the hard drive.\n\nExample: [Amazon EFS](https:\u002F\u002Faws.amazon.com\u002Fefs), [Azure files](https:\u002F\u002Fazure.microsoft.com\u002Fen-in\u002Fservices\u002Fstorage\u002Ffiles), [Google Cloud Filestore](https:\u002F\u002Fcloud.google.com\u002Ffilestore), etc.\n\n## Block storage\n\nBlock storage divides data into blocks (chunks) and stores them as separate pieces. Each block of data is given a unique identifier, which allows a storage system to place the smaller pieces of data wherever it is most convenient.\n\nBlock storage also decouples data from user environments, allowing that data to be spread across multiple environments. This creates multiple paths to the data and allows the user to retrieve it quickly. When a user or application requests data from a block storage system, the underlying storage system reassembles the data blocks and presents the data to the user or application.\n\nExample: [Amazon EBS](https:\u002F\u002Faws.amazon.com\u002Febs).\n\n## Object Storage\n\nObject storage, which is also known as object-based storage, breaks data files up into pieces called objects. 
It then stores those objects in a single repository, which can be spread out across multiple networked systems.\n\nExample: [Amazon S3](https:\u002F\u002Faws.amazon.com\u002Fs3), [Azure Blob Storage](https:\u002F\u002Fazure.microsoft.com\u002Fen-in\u002Fservices\u002Fstorage\u002Fblobs), [Google Cloud Storage](https:\u002F\u002Fcloud.google.com\u002Fstorage), etc.\n\n## NAS\n\nA NAS (Network Attached Storage) is a storage device connected to a network that allows storage and retrieval of data from a central location for authorized network users. NAS devices are flexible, meaning that as we need additional storage, we can add to what we have. It's faster, less expensive, and provides all the benefits of a public cloud on-site, giving us complete control.\n\n## HDFS\n\nThe Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets. It has many similarities with existing distributed file systems.\n\nHDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks, all blocks in a file except the last block are the same size. The blocks of a file are replicated for fault tolerance.\n\n# Databases and DBMS\n\n## What is a Database?\n\nA database is an organized collection of structured information, or data, typically stored electronically in a computer system. A database is usually controlled by a Database Management System (DBMS). Together, the data and the DBMS, along with the applications that are associated with them, are referred to as a database system, often shortened to just database.\n\n## What is DBMS?\n\nA database typically requires a comprehensive database software program known as a Database Management System (DBMS). 
A DBMS serves as an interface between the database and its end-users or programs, allowing users to retrieve, update, and manage how the information is organized and optimized. A DBMS also facilitates oversight and control of databases, enabling a variety of administrative operations such as performance monitoring, tuning, and backup and recovery.\n\n## Components\n\nHere are some common components found across different databases:\n\n### Schema\n\nThe role of a schema is to define the shape of a data structure and specify what kinds of data can go where. Schemas can be strictly enforced across the entire database, loosely enforced on part of the database, or they might not exist at all.\n\n### Table\n\nEach table contains various columns, just like a spreadsheet. A table can have as few as two columns or upwards of a hundred, depending upon the kind of information being put in the table.\n\n### Column\n\nA column contains a set of data values of a particular type, one value for each row of the table. A column may contain text values, numbers, enums, timestamps, etc.\n\n### Row\n\nData in a table is recorded in rows. 
A table can contain thousands or millions of rows, each holding one particular record.\n\n## Types\n\n![database-types](https:\u002F\u002Fraw.githubusercontent.com\u002Fkaranpratapsingh\u002Fportfolio\u002Fmaster\u002Fpublic\u002Fstatic\u002Fcourses\u002Fsystem-design\u002Fchapter-II\u002Fdatabases-and-dbms\u002Fdatabase-types.png)\n\nBelow are different types of databases:\n\n- **[SQL](https:\u002F\u002Fkaranpratapsingh.com\u002Fcourses\u002Fsystem-design\u002Fsql-databases)**\n- **[NoSQL](https:\u002F\u002Fkaranpratapsingh.com\u002Fcourses\u002Fsystem-design\u002Fnosql-databases)**\n  - Document\n  - Key-value\n  - Graph\n  - Timeseries\n  - Wide column\n  - Multi-model\n\nSQL and NoSQL databases are broad topics and will be discussed separately in [SQL databases](https:\u002F\u002Fkaranpratapsingh.com\u002Fcourses\u002Fsystem-design\u002Fsql-databases) and [NoSQL databases](https:\u002F\u002Fkaranpratapsingh.com\u002Fcourses\u002Fsystem-design\u002Fnosql-databases). Learn how they compare to each other in [SQL vs NoSQL databases](https:\u002F\u002Fkaranpratapsingh.com\u002Fcourses\u002Fsystem-design\u002Fsql-vs-nosql-databases).\n\n## Challenges\n\nSome common challenges faced while running databases at scale:\n\n- **Absorbing significant increases in data volume**: Data keeps pouring in from sensors, connected machines, and dozens of other sources.\n- **Ensuring data security**: Data breaches are happening everywhere these days, so it's more important than ever to ensure that data is secure yet still easily accessible to users.\n- **Keeping up with demand**: Companies need real-time access to their data to support timely decision-making and to take advantage of new opportunities.\n- **Managing and maintaining the database and infrastructure**: As databases become more complex and data volumes grow, companies are faced with the 
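e","This project aims to teach large-scale system design and help prepare for system design interviews. It covers core concepts and technologies across several areas, from networking fundamentals and database technology to microservice architecture, including key topics such as the IP protocol, load balancing, caching strategies, and distributed transaction processing. It is especially suited to engineers who want a deeper understanding of the principles behind modern software architecture, and to technical professionals preparing for interviews for related roles. By combining theory with practical cases, learners can master how to build scalable and efficient distributed systems.","2026-06-11 02:41:22","top_all"]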