codingstuff.io
Heartbeat & Health Checks

Updated 2026-05-03

In a distributed system with hundreds of servers, machines fail constantly. Hard drives crash, network cables get disconnected, and processes run out of memory. The system must quickly detect these failures and route traffic away from dead nodes.

1. Heartbeat Mechanism

A heartbeat is a small message that a node sends at regular intervals to signal that it is still alive. It proves only that the process can still send messages, not that it is doing useful work correctly.

How it works:

  • Each server in the cluster periodically sends a small "I'm alive" message (heartbeat) to a central monitoring service or to its peers.
  • If the monitor hears nothing from a node within a configured timeout (e.g., a 10-second heartbeat interval with 3 consecutive misses allowed, i.e., 30 seconds of silence), the node is declared dead and removed from the active pool.
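The mechanism above, in its simplest form where every node reports to one central monitor, can be sketched in a few lines of Python. This is a minimal single-process illustration, not a production monitor: the names `HeartbeatMonitor`, `record_heartbeat` and `sweep` are hypothetical, and the injectable `clock` parameter exists only so the example can run on simulated time. It uses the numbers from the example timeout: a 10-second interval and 3 allowed misses.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Hypothetical central monitor that receives heartbeats pushed by nodes.

    A node is declared dead once it has been silent for longer than
    interval * max_missed seconds.
    """

    def __init__(self, interval: float = 10.0, max_missed: int = 3,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = interval * max_missed  # 10 s * 3 misses = 30 s of silence
        self.clock = clock
        self.last_seen: Dict[str, float] = {}  # node id -> time of last heartbeat

    def record_heartbeat(self, node_id: str) -> None:
        """Called whenever an "I'm alive" message arrives from node_id."""
        self.last_seen[node_id] = self.clock()

    def sweep(self) -> List[str]:
        """Remove and return every node whose last heartbeat is too old."""
        now = self.clock()
        dead = sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)
        for node_id in dead:
            del self.last_seen[node_id]
        return dead

    def active_nodes(self) -> List[str]:
        return sorted(self.last_seen)


# Usage with a simulated clock instead of real time.
fake_now = [0.0]
monitor = HeartbeatMonitor(interval=10.0, max_missed=3, clock=lambda: fake_now[0])
monitor.record_heartbeat("node-a")
monitor.record_heartbeat("node-b")
fake_now[0] = 25.0
monitor.record_heartbeat("node-a")  # node-b stays silent
fake_now[0] = 31.0
print(monitor.sweep())         # ['node-b']
print(monitor.active_nodes())  # ['node-a']
```

In a real deployment `sweep()` would run on a timer; in this sketch a swept node rejoins simply by sending a new heartbeat.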

Types:

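  • Push-based: Each node proactively sends heartbeats to the monitor. Simple for the nodes, but the monitor must keep a last-seen timestamp for every node and regularly scan for stale ones.
  • Pull-based: The monitor periodically polls each node. Nodes only have to answer, but the monitor issues one request per node per interval, so its load grows with the size of the cluster.
  • Gossip-based: Each node periodically contacts a few randomly chosen peers and exchanges health information about the whole cluster. No central monitor is needed, and because each node talks to only a few peers per round, the approach scales to very large clusters. Used by Cassandra and Consul.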
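To make the gossip-based approach concrete, here is a toy Python simulation of a heartbeat-counter gossip scheme: every round each live node increments its own counter and pushes a copy of its membership table to a few random peers, and a receiver refreshes an entry only when it sees a higher counter. This is a simplified sketch; Cassandra and Consul use more elaborate protocols, and the names `GossipNode`, `gossip_round`, `fanout` and `t_fail` are illustrative assumptions. Time is measured in gossip rounds.

```python
import random
from typing import Dict, List, Set, Tuple


class GossipNode:
    """Toy gossip member: tracks, per node, the highest heartbeat counter seen
    and the local round in which that counter last increased."""

    def __init__(self, node_id: str, all_ids: List[str]) -> None:
        self.node_id = node_id
        self.peers = [p for p in all_ids if p != node_id]
        self.table: Dict[str, Tuple[int, int]] = {node_id: (0, 0)}

    def beat(self, now: int) -> None:
        counter, _ = self.table[self.node_id]
        self.table[self.node_id] = (counter + 1, now)

    def merge(self, remote: Dict[str, Tuple[int, int]], now: int) -> None:
        for node, (counter, _) in remote.items():
            local = self.table.get(node)
            if local is None or counter > local[0]:
                # A higher counter is fresh evidence that `node` was recently alive.
                self.table[node] = (counter, now)

    def suspected(self, now: int, t_fail: int) -> List[str]:
        """Nodes whose counter has not increased for more than t_fail rounds."""
        return sorted(n for n, (_, seen) in self.table.items()
                      if n != self.node_id and now - seen > t_fail)


def gossip_round(nodes: Dict[str, GossipNode], alive: Set[str], now: int,
                 rng: random.Random, fanout: int = 2) -> None:
    """Every live node beats, then pushes a copy of its table to `fanout` random peers."""
    for node_id in sorted(alive):
        nodes[node_id].beat(now)
    for node_id in sorted(alive):
        sender = nodes[node_id]
        for peer in rng.sample(sender.peers, fanout):
            if peer in alive:  # messages sent to a crashed node are simply lost
                nodes[peer].merge(dict(sender.table), now)


# Usage: five nodes; n4 crashes at round 10 and stops gossiping.
ids = [f"n{i}" for i in range(5)]
nodes = {i: GossipNode(i, ids) for i in ids}
alive = set(ids)
rng = random.Random(7)
for now in range(1, 31):
    if now == 10:
        alive.discard("n4")
    gossip_round(nodes, alive, now, rng)
print(nodes["n0"].suspected(now=30, t_fail=5))
```

With these settings n0 should end up suspecting n4. Because peers are picked at random, a live node can occasionally look stale if `t_fail` is set too low relative to the fanout, which is the same aggressiveness trade-off that applies to any timeout-based detector.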

2. Health Checks

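A health check is a more detailed probe, typically issued by load balancers and container orchestrators such as Kubernetes, that asks whether a service can actually do its job rather than merely whether it is still sending messages.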
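On the load balancer side, health-check results are turned into a routing decision. The sketch below is a hypothetical Python round-robin pool that skips backends marked unhealthy. It borrows the rise/fall idea exposed by load balancers such as HAProxy (mark a backend down after `fall` consecutive failed checks, and back up after `rise` consecutive passes) so that a single flaky check does not flip its state. The class name and backend addresses are made up for illustration.

```python
from typing import Dict, List, Optional


class HealthCheckedPool:
    """Hypothetical round-robin pool that routes only to backends passing health checks.

    A backend is marked down after `fall` consecutive failed checks and back up
    after `rise` consecutive successful ones.
    """

    def __init__(self, backends: List[str], rise: int = 2, fall: int = 3) -> None:
        self.backends = list(backends)
        self.rise, self.fall = rise, fall
        self.healthy: Dict[str, bool] = {b: True for b in self.backends}
        # Consecutive check results that disagree with the backend's current state.
        self.streak: Dict[str, int] = {b: 0 for b in self.backends}
        self._next = 0

    def report(self, backend: str, ok: bool) -> None:
        """Record the outcome of one health check against `backend`."""
        if ok == self.healthy[backend]:
            self.streak[backend] = 0  # result confirms the current state
            return
        self.streak[backend] += 1
        if self.streak[backend] >= (self.rise if ok else self.fall):
            self.healthy[backend] = ok  # flip state after enough evidence
            self.streak[backend] = 0

    def pick(self) -> Optional[str]:
        """Next healthy backend in round-robin order, or None if all are down."""
        for _ in range(len(self.backends)):
            backend = self.backends[self._next % len(self.backends)]
            self._next += 1
            if self.healthy[backend]:
                return backend
        return None


pool = HealthCheckedPool(["10.0.0.1", "10.0.0.2"], rise=2, fall=3)
for _ in range(3):
    pool.report("10.0.0.2", ok=False)  # three failed checks in a row
print([pool.pick() for _ in range(3)])  # ['10.0.0.1', '10.0.0.1', '10.0.0.1']
```

Requiring several consecutive results before changing state trades a slightly slower reaction for fewer unnecessary flaps.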

Types of Health Checks:

Liveness Check

Answers the question: "Is the process running?"

  • A simple TCP connection check or an HTTP GET to /healthz that returns 200 OK.
  • If the liveness check fails a configured number of times in a row (failureThreshold, 3 by default), Kubernetes restarts the container.
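A liveness endpoint can be very small, since its only job is to show that the process can still answer. Below is a minimal sketch using only Python's standard library (`http.server`); the port and the `start_health_server` and `http_probe` helpers are illustrative assumptions. `http_probe` mimics the success rule of a Kubernetes HTTP probe (any status from 200 to 399 counts as success), but it is not the kubelet's actual code.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Serves GET /healthz: if this handler runs at all, the process is alive."""

    def do_GET(self) -> None:
        status, body = (200, b"ok") if self.path == "/healthz" else (404, b"not found")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep frequent probe requests out of the logs


def start_health_server(port: int = 8080) -> ThreadingHTTPServer:
    """Run the health server on a background thread (port 0 = pick a free port)."""
    # Bind 0.0.0.0 so a probe coming from outside the container can reach it.
    server = ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def http_probe(url: str, timeout: float = 1.0) -> bool:
    """Probe-style check: a status in [200, 400) is success; errors count as failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:  # covers HTTPError (4xx/5xx), connection refused and timeouts
        return False


server = start_health_server(port=0)
base = f"http://127.0.0.1:{server.server_address[1]}"
print(http_probe(base + "/healthz"))  # True
print(http_probe(base + "/missing"))  # False
server.shutdown()
server.server_close()
```

Keeping the liveness handler free of dependency checks is deliberate: if it also failed whenever, say, the database was down, every replica could be restarted at once for a problem that restarts cannot fix.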

Readiness Check

Answers the question: "Is the process ready to accept traffic?"

  • A service might be running but still initializing (loading a large ML model, warming up caches, establishing database connections).
  • If the readiness check fails, the load balancer stops sending traffic to that instance but does not restart it.
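Readiness, unlike liveness, depends on the application's own state. A minimal Python sketch, assuming the service tracks a set of named startup tasks (the class name and task names are hypothetical): a `/readyz` handler would return 503 until every task has finished, while the liveness endpoint keeps returning 200 the whole time, so the instance is held out of rotation without being restarted.

```python
import threading
from typing import Tuple


class ReadinessState:
    """Hypothetical readiness tracker: ready only once all startup tasks are done."""

    def __init__(self, *tasks: str) -> None:
        self._lock = threading.Lock()  # tasks may finish on background threads
        self._pending = set(tasks)

    def mark_done(self, task: str) -> None:
        with self._lock:
            self._pending.discard(task)

    def readyz(self) -> Tuple[int, str]:
        """(HTTP status, body) that a GET /readyz handler would return."""
        with self._lock:
            if self._pending:
                return 503, "waiting for: " + ", ".join(sorted(self._pending))
            return 200, "ready"


state = ReadinessState("load_model", "warm_cache", "connect_db")
state.mark_done("connect_db")
print(state.readyz())  # (503, 'waiting for: load_model, warm_cache')
state.mark_done("load_model")
state.mark_done("warm_cache")
print(state.readyz())  # (200, 'ready')
```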

Startup Check

Answers the question: "Has the process finished its initial startup?"

  • For slow-starting applications, Kubernetes holds off the liveness and readiness checks until the startup check succeeds, so the liveness check cannot kill the container during a legitimately long startup sequence.
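The interaction between the startup and liveness checks can be simulated. The Python sketch below is a simplified model of the gating behaviour, not the kubelet's implementation; `ProbeSimulator` and its field names are hypothetical, each `tick()` stands for one probe period, and the thresholds count consecutive failures. In Kubernetes terms, the longest startup allowed is roughly the startup probe's failureThreshold × periodSeconds, e.g. 30 × 10 s = 300 s.

```python
from dataclasses import dataclass


@dataclass
class ProbeSimulator:
    """Hypothetical model of a startup probe gating a liveness probe."""
    startup_failure_threshold: int
    liveness_failure_threshold: int
    started: bool = False
    startup_failures: int = 0
    liveness_failures: int = 0

    def tick(self, startup_ok: bool, liveness_ok: bool) -> str:
        if not self.started:
            # Liveness is not evaluated at all until the startup probe succeeds.
            if startup_ok:
                self.started = True
                return "started"
            self.startup_failures += 1
            if self.startup_failures >= self.startup_failure_threshold:
                return "restart"  # gave up: startup took too long
            return "starting"
        if liveness_ok:
            self.liveness_failures = 0  # only consecutive failures count
            return "alive"
        self.liveness_failures += 1
        if self.liveness_failures >= self.liveness_failure_threshold:
            return "restart"
        return "suspect"


# An app that needs 10 probe periods to start; both probes fail until then.
with_startup = ProbeSimulator(startup_failure_threshold=30, liveness_failure_threshold=3)
print([with_startup.tick(i >= 10, i >= 10) for i in range(12)][-3:])
# ['starting', 'started', 'alive']

# Same app with no startup probe: liveness is checked from the first period.
no_startup = ProbeSimulator(30, 3, started=True)
print([no_startup.tick(i >= 10, i >= 10) for i in range(3)])
# ['suspect', 'suspect', 'restart']
```

The second run shows the failure mode the startup check exists to prevent: a healthy but slow-starting container is killed before it ever becomes ready.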

3. Failure Detection Challenges

  • Network Partition vs Node Failure: If a monitoring node cannot reach Server B, is Server B dead, or is the network between them broken? Acting too aggressively (declaring nodes dead on a single missed heartbeat) can lead to unnecessary failovers. Acting too slowly means traffic continues to be sent to dead nodes.
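  • The Phi Accrual Failure Detector: Instead of a binary "alive/dead" decision, this algorithm (used by Akka and Cassandra) outputs a continuous suspicion level, φ (phi), based on how long it has been since the last heartbeat compared with the recent history of heartbeat inter-arrival times. Each application picks the φ threshold at which it treats a node as failed, and because the arrival history is updated continuously, the detector adapts to changing network conditions instead of relying on one fixed timeout.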
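The phi accrual idea can be sketched directly from its definition: φ = -log10(P_later(t)), where t is the time since the last heartbeat and P_later(t) is the estimated probability that a heartbeat could still arrive even later than that, given the recent inter-arrival times. So φ = 1 means suspecting the node now would be wrong about 10% of the time, φ = 2 about 1%, φ = 3 about 0.1%. This Python sketch assumes normally distributed intervals, as in the original phi accrual paper; production implementations such as Akka's and Cassandra's differ in the distribution they fit and in their defaults. The window size and the `min_std` floor are illustrative assumptions.

```python
import math
from collections import deque
from typing import Deque, Optional


class PhiAccrualDetector:
    """Phi accrual failure detector sketch (normal model of inter-arrival times)."""

    def __init__(self, window: int = 100, min_std: float = 0.1) -> None:
        self.intervals: Deque[float] = deque(maxlen=window)  # recent gaps between heartbeats
        self.min_std = min_std  # floor so a perfectly regular sender does not give std = 0
        self.last_arrival: Optional[float] = None

    def heartbeat(self, now: float) -> None:
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def phi(self, now: float) -> float:
        if self.last_arrival is None or not self.intervals:
            return 0.0  # no history yet: no basis for suspicion
        n = len(self.intervals)
        mean = sum(self.intervals) / n
        std = max(math.sqrt(sum((x - mean) ** 2 for x in self.intervals) / n), self.min_std)
        elapsed = now - self.last_arrival
        # P(next gap > elapsed) under N(mean, std^2); erfc stays accurate far into the tail.
        p_later = 0.5 * math.erfc((elapsed - mean) / (std * math.sqrt(2)))
        return math.inf if p_later == 0.0 else -math.log10(p_later)


det = PhiAccrualDetector()
for t in range(10):  # heartbeats every 1.0 s at t = 0..9
    det.heartbeat(float(t))
for now in (9.5, 10.0, 11.0):
    print(now, round(det.phi(now), 2))
```

Here φ stays near 0 while the next heartbeat is not yet due, equals log10(2) ≈ 0.30 once exactly one mean interval has passed, and exceeds 20 one interval later because these heartbeats were perfectly regular. With a noisier arrival history the same delay produces a much smaller φ, which is what lets the detector adapt instead of using one fixed timeout.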

