How Datadog Built Monocle, a Custom Database for Billions of Metrics


⚙️ How Datadog Built a Custom Database for Billions of Metrics Per Second

Datadog's monitoring platform operates at massive scale: billions of data points flow in every second from millions of servers. To keep up, Datadog built Monocle, a custom time-series database written in Rust, optimized for raw performance, reliability, and cost efficiency. Monocle runs on a thread-per-core model and uses an LSM-tree storage design for extreme write throughput (minimal sketches of both ideas follow below).

At the architectural level, Datadog's Metrics Platform splits data into two specialized systems:

1. A Long-Term Store for historical analytics
2. A Real-Time Store for live dashboards and alerts, serving 99% of queries

Each incoming data point is first sent to Kafka, which powers data distribution across nodes, write-ahead logging for crash recovery, and automatic replication across availability zones for durability (sketched below).

Performance is maintained under heavy load through two key systems (also sketched below):

1. Admission Control, which protects the cluster from overload
2. Cost-Based Scheduling, which prioritizes queries dynamically to keep latency low

🔗 To see how these systems work together under real Datadog-scale load, read the full breakdown: https://lnkd.in/esMfUBPA

Supported by our partners helping teams build and scale reliably:

Datadog - Powering observability at scale. Download the On-Call Best Practices guide: https://bit.ly/3Xxk0wZ

SonarSource - Bridging the gap between AI-generated code and human-grade quality. Verify every line for security, maintainability, and trust: https://bit.ly/47VpU00

[diagram]
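To make the thread-per-core model concrete, here is a minimal Rust sketch (not Monocle's actual code): one worker per core exclusively owns a shard of series and receives writes over a channel, so the hot write path needs no locks. A real engine would additionally pin each thread to its core via an OS affinity API.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

// Hypothetical write request: a metric series key plus one data point.
struct Write {
    series: String,
    timestamp: u64,
    value: f64,
}

fn main() {
    // One worker per core; each worker exclusively owns its shard,
    // so the hot write path is lock-free by construction.
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(4);
    let mut senders = Vec::with_capacity(cores);

    for core_id in 0..cores {
        let (tx, rx) = mpsc::channel::<Write>();
        senders.push(tx);
        thread::spawn(move || {
            let _ = core_id; // a real engine would pin this thread to core_id
            // Shard-local state: series key -> data points, owned by one thread only.
            let mut shard: HashMap<String, Vec<(u64, f64)>> = HashMap::new();
            for w in rx {
                shard.entry(w.series).or_default().push((w.timestamp, w.value));
            }
        });
    }

    // Route each write to a fixed shard by hashing the series key,
    // so a given series is always owned by the same core.
    let route = |key: &str| {
        use std::hash::{Hash, Hasher};
        let mut h = std::collections::hash_map::DefaultHasher::new();
        key.hash(&mut h);
        (h.finish() as usize) % cores
    };

    let w = Write { series: "cpu.user{host:a}".into(), timestamp: 1_700_000_000, value: 0.42 };
    senders[route(&w.series)].send(w).unwrap();
}
```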
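The LSM-tree write path can be sketched the same way: writes land in a sorted in-memory memtable, which is frozen into an immutable sorted run when full, so ingestion never rewrites existing data in place. The names and thresholds below are made up for illustration.

```rust
use std::collections::BTreeMap;

// A minimal LSM-tree sketch: writes go to a sorted in-memory memtable;
// when it fills, it is frozen into an immutable sorted segment ("SSTable").
struct LsmStore {
    memtable: BTreeMap<String, f64>,
    segments: Vec<BTreeMap<String, f64>>, // newest last; real SSTables live on disk
    max_memtable_entries: usize,
}

impl LsmStore {
    fn new(max_memtable_entries: usize) -> Self {
        Self { memtable: BTreeMap::new(), segments: Vec::new(), max_memtable_entries }
    }

    // Writes are cheap in-memory inserts and never rewrite on-disk data.
    fn put(&mut self, key: String, value: f64) {
        self.memtable.insert(key, value);
        if self.memtable.len() >= self.max_memtable_entries {
            // Flush: freeze the memtable as an immutable sorted run.
            let frozen = std::mem::take(&mut self.memtable);
            self.segments.push(frozen);
        }
    }

    // Reads consult the memtable first, then segments newest-first,
    // since later writes shadow earlier ones.
    fn get(&self, key: &str) -> Option<f64> {
        if let Some(v) = self.memtable.get(key) {
            return Some(*v);
        }
        self.segments.iter().rev().find_map(|seg| seg.get(key).copied())
    }
}

fn main() {
    let mut store = LsmStore::new(2);
    store.put("cpu.user{host:a}".into(), 0.40);
    store.put("cpu.user{host:b}".into(), 0.55); // triggers a flush
    store.put("cpu.user{host:a}".into(), 0.62); // newer value shadows the flushed one
    assert_eq!(store.get("cpu.user{host:a}"), Some(0.62));
}
```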
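On the ingestion side, here is a hedged sketch of handing points to Kafka using the rdkafka and tokio crates; the topic name, key, and payload format are assumptions, not Datadog's. Keying by series routes every point of a series to the same partition (and so the same storage node), while acks=all makes the broker wait for replicas before acknowledging, which is what buys cross-AZ durability.

```rust
use rdkafka::config::ClientConfig;
use rdkafka::producer::{FutureProducer, FutureRecord};
use std::time::Duration;

#[tokio::main]
async fn main() {
    let producer: FutureProducer = ClientConfig::new()
        .set("bootstrap.servers", "localhost:9092")
        .set("acks", "all") // wait for replica acknowledgement: durability across AZs
        .create()
        .expect("producer creation failed");

    // Keying by series name sends all points of one series to the same
    // partition, and therefore to the same downstream storage node.
    let delivery = producer
        .send(
            FutureRecord::to("metrics-intake") // hypothetical topic name
                .key("cpu.user{host:a}")
                .payload("1700000000:0.42"), // hypothetical "timestamp:value" encoding
            Duration::from_secs(5),
        )
        .await;
    println!("{:?}", delivery);
}
```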
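Finally, admission control and cost-based scheduling can be illustrated with a toy cost budget: queries too expensive for the budget are shed outright, and queued queries run cheapest-first while total in-flight cost stays under the cap, so the many small dashboard and alert queries keep their low latency. The cost model and numbers are invented for the sketch.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

struct Query {
    id: u32,
    estimated_cost: u64, // e.g. predicted number of series scanned
}

struct Scheduler {
    in_flight_cost: u64,
    max_in_flight_cost: u64, // admission-control budget
    queue: BinaryHeap<Reverse<(u64, u32)>>, // min-heap: cheapest query first
}

impl Scheduler {
    fn new(max_in_flight_cost: u64) -> Self {
        Self { in_flight_cost: 0, max_in_flight_cost, queue: BinaryHeap::new() }
    }

    // Admission control: shed load outright if the query could never
    // fit in the budget, instead of letting it overwhelm the cluster.
    fn submit(&mut self, q: Query) -> Result<(), &'static str> {
        if q.estimated_cost > self.max_in_flight_cost {
            return Err("rejected: query too expensive for current budget");
        }
        self.queue.push(Reverse((q.estimated_cost, q.id)));
        Ok(())
    }

    // Cost-based scheduling: run the cheapest queued query next, but only
    // while total in-flight cost stays under the budget.
    fn next_runnable(&mut self) -> Option<u32> {
        let &Reverse((cost, id)) = self.queue.peek()?;
        if self.in_flight_cost + cost > self.max_in_flight_cost {
            return None; // back-pressure: wait for running queries to finish
        }
        self.queue.pop();
        self.in_flight_cost += cost;
        Some(id)
    }

    fn finish(&mut self, cost: u64) {
        self.in_flight_cost -= cost;
    }
}

fn main() {
    let mut s = Scheduler::new(100);
    s.submit(Query { id: 1, estimated_cost: 80 }).unwrap();
    s.submit(Query { id: 2, estimated_cost: 10 }).unwrap();
    assert_eq!(s.next_runnable(), Some(2)); // the cheap query jumps ahead
    assert_eq!(s.next_runnable(), Some(1));
    s.finish(10);
}
```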

Great example of solving high-ingestion and low-latency challenges at massive scale.


Great Insights 👍


Great visualisation!


