Datadog Expands Its Watchdog AI Engine


The new AI/ML capabilities enable IT teams to detect, investigate and resolve application performance issues more quickly and reduce alert fatigue

Datadog announced two new capabilities for Watchdog, its AI engine: Log Anomaly Detection and Root Cause Analysis. Embedded across Datadog’s observability platform, Watchdog analyses billions of events and learns what “normal” behaviour looks like, so that it can proactively alert users to anomalies they did not anticipate. The two new capabilities take this one step further.

Log Anomaly Detection automatically understands and baselines normal patterns in logs, and proactively discovers abnormalities such as new text patterns, meaningful changes in data volumes of existing patterns and error outliers. With this new capability, Datadog Log Management users are able to quickly see and address hidden issues before they turn into critical incidents.
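
The announcement does not describe how Watchdog builds its baselines, so the following is only a minimal sketch of the general idea, not Datadog's implementation. It assumes log lines are reduced to patterns by masking numbers and hex values, and that a volume change is flagged with a simple z-score; the function names `to_pattern` and `find_anomalies` and the threshold of 3.0 are hypothetical choices made for illustration.

```python
import re
from collections import Counter
from statistics import mean, stdev


def to_pattern(message: str) -> str:
    """Mask variable tokens so that similar log lines share one pattern."""
    message = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", message)
    return re.sub(r"\d+", "<NUM>", message)


def find_anomalies(history, current, threshold=3.0):
    """Compare the current window against past windows.

    history: list of Counter(pattern -> count), one per past time window.
    current: Counter(pattern -> count) for the window being checked.
    Returns a list of (pattern, reason) tuples.
    """
    anomalies = []
    seen = set().union(*history) if history else set()
    for pattern, count in current.items():
        if pattern not in seen:
            # A text pattern that never appeared in the baseline.
            anomalies.append((pattern, "new pattern"))
            continue
        counts = [window.get(pattern, 0) for window in history]
        mu = mean(counts)
        sigma = stdev(counts) if len(counts) > 1 else 0.0
        # Flag a known pattern whose volume is far from its baseline
        # (illustrative z-score test, assumed for this sketch).
        if sigma > 0 and abs(count - mu) / sigma > threshold:
            anomalies.append((pattern, "volume change"))
    return anomalies
```

In this sketch, a pattern that is absent from every past window is reported as new, while a known pattern is reported only when its count deviates strongly from its historical mean.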

“The constant challenge with AI is balancing alert volume. If the alert volume is too high, it may overload your monitoring systems and lead to alert fatigue; if it’s too low, you might miss something that could critically impact your business. Watchdog helps our teams focus on the signals that matter by surfacing events that typically aren’t caught by traditional monitors. Looking at Watchdog every morning helps me gain a better understanding of everything happening across our entire technology stack. With the help of Root Cause Analysis, we have all the vital information we need so that our teams are able to investigate and address business-critical issues quickly and efficiently,” said Brent Montague, Site Reliability Architect, Cvent.

Root Cause Analysis works with Datadog’s APM products to automatically identify causal relationships between symptoms of an issue across an organisation’s services. By doing so, it pinpoints the precise service where an issue originated. Additionally, this capability identifies the business impact of an issue when Datadog’s Real User Monitoring (RUM) is deployed in the environment. Establishing causality and real user impact often takes hours or days of manual troubleshooting; this capability can often do so in minutes. Both Root Cause Analysis and Log Anomaly Detection require no additional configuration and are available to Datadog APM and Log Management users out of the box.
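
The announcement does not explain how Watchdog infers causal relationships, so the following is only a simplified heuristic that illustrates the idea of tracing symptoms back to an originating service; it is not Datadog's algorithm. It assumes a known service dependency map and a set of services currently showing symptoms, and treats a service as a likely origin when it is unhealthy but none of the services it depends on are. The function name `root_cause_candidates` and the example service names are hypothetical.

```python
def root_cause_candidates(dependencies, unhealthy):
    """Return unhealthy services whose own dependencies all look healthy.

    dependencies: dict mapping a service to the services it calls.
    unhealthy: set of services currently showing symptoms.
    """
    return sorted(
        service
        for service in unhealthy
        # If a dependency is also unhealthy, the symptom here is
        # assumed to be inherited from it rather than originating here.
        if not any(dep in unhealthy for dep in dependencies.get(service, []))
    )


# Hypothetical topology: web -> checkout -> payments -> db
dependencies = {
    "web": ["checkout"],
    "checkout": ["payments", "inventory"],
    "payments": ["db"],
    "inventory": [],
    "db": [],
}
print(root_cause_candidates(dependencies, {"web", "checkout", "payments", "db"}))
```

With this hypothetical topology, the symptoms on `web`, `checkout` and `payments` are all attributed to `db`, the only unhealthy service with no unhealthy dependency of its own.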