Amazon OpenSearch adds anomaly detection for historical data

Amazon OpenSearch recently introduced anomaly detection support for historical data. Machine learning-based functionality helps identify trends, patterns, and seasonality in OpenSearch data.

Tyler Ohlsen, software engineer at AWS, explains how customers can gain insights and take action to improve applications:

You can take advantage of anomaly detection to analyze large amounts of logs in different ways. Some analytical approaches require real-time detection, such as application monitoring, event detection, and fraud detection. Others involve analyzing past data to identify trends and patterns, isolate the root cause, and prevent them from happening again in the future.

Anomaly Detection automatically detects anomalies in near real-time using the Random Cut Forest (RCF) algorithm v2.0. The new feature introduces a unified flow into OpenSearch Dashboards to use anomaly detection for both real-time and historical analysis. The RCF algorithm models a sketch of the incoming data stream to calculate an anomaly score and a confidence score for each incoming data point; these values are then used to differentiate an anomaly from normal variation.
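As an added illustration, not code from OpenSearch or the RCF library, the block below sketches the idea behind the anomaly score: each tree is built on a random sample (sketch) of shingled data points, and a new point is scored by how many sketch points the random cut that would isolate it displaces (collusive displacement). The latency series, the sample size and the tree count are constructed assumptions, and the confidence score and streaming updates of the real implementation are omitted.

```python
import random

def build_tree(points, rng):
    """One random cut tree: pick a dimension proportional to its extent, cut uniformly inside the box."""
    if len(points) <= 1 or all(p == points[0] for p in points):
        return {"size": len(points)}
    dims = range(len(points[0]))
    lo = [min(p[d] for p in points) for d in dims]
    hi = [max(p[d] for p in points) for d in dims]
    d = rng.choices(list(dims), weights=[hi[i] - lo[i] for i in dims])[0]
    cut = rng.uniform(lo[d], hi[d])
    if cut >= hi[d]:  # a boundary cut would leave the right child empty
        cut = lo[d]
    return {"size": len(points), "lo": lo, "hi": hi, "dim": d, "cut": cut,
            "left": build_tree([p for p in points if p[d] <= cut], rng),
            "right": build_tree([p for p in points if p[d] > cut], rng)}

def codisp(node, x, rng):
    """Simulate inserting x; return its collusive displacement (points displaced per point in x's branch)."""
    score = 0.0
    while "dim" in node:
        lo = [min(a, b) for a, b in zip(node["lo"], x)]
        hi = [max(a, b) for a, b in zip(node["hi"], x)]
        d = rng.choices(range(len(x)), weights=[h - l for l, h in zip(lo, hi)])[0]
        cut = rng.uniform(lo[d], hi[d])
        if cut < node["lo"][d] or cut > node["hi"][d]:
            return max(score, node["size"])  # the cut isolates x from this whole subtree at once
        go_left = x[node["dim"]] <= node["cut"]
        child, sib = (node["left"], node["right"]) if go_left else (node["right"], node["left"])
        score = max(score, sib["size"] / (child["size"] + 1))
        node = child
    return max(score, node["size"])  # x lands beside a leaf

def rcf_scores(train, queries, n_trees=50, sample=64, seed=7):
    """Score queries against a forest of trees built on random sketches of train (higher = more anomalous)."""
    rng = random.Random(seed)
    trees = [build_tree(rng.sample(train, min(sample, len(train))), rng) for _ in range(n_trees)]
    return [sum(codisp(t, q, rng) for t in trees) / n_trees for q in queries]

def shingle(series, k=2):
    """Overlapping k-tuples of a 1-D series, so each tree point carries local context."""
    return [tuple(series[i:i + k]) for i in range(len(series) - k + 1)]

# Constructed sample data: a flat request-latency series (ms) with small jitter.
_gen = random.Random(1)
LATENCY = [100 + _gen.uniform(-3, 3) for _ in range(300)]
NORMAL_POINT = (100.0, 101.0)  # looks like every baseline shingle
SPIKE_POINT = (100.0, 400.0)   # a sudden 4x latency jump

def demo():  # returns (normal_score, spike_score); the spike scores far higher
    return tuple(rcf_scores(shingle(LATENCY), [NORMAL_POINT, SPIKE_POINT]))
```

A detector would compare such scores against a threshold; the real RCF additionally tracks a confidence value that grows as the sketch fills.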

The RCF algorithm was originally created at Amazon for streaming data; new implementations are now being developed for density estimation, imputation, and prediction. Matthew Wilson, Vice President and Distinguished Engineer at AWS, adds:

This Random Cut Forest algorithm is open source software separate from OpenSearch. Learn about implementations in Java and Rust. You can also track the progress of improvements made to the 3.0 implementation.

Explaining how to serve analytics in OpenSearch, Sudipto Guha, Principal Scientist at AWS, and Joshua Tokle, Senior Software Engineer at Amazon, write:

Anomaly detection is a research problem par excellence. A typical use case is a high-cardinality data set, where some attribute divides the data into a large number of individual and potentially incomparable time series and one monitors each of these time series simultaneously. Anomalies are often only explainable in the context of past data specific to each time series.

The new feature is one of the enhancements in OpenSearch version 1.1, the Apache 2.0-licensed distribution of Elasticsearch that AWS created a year ago. The new release introduces bucket-level alerting, policies that evaluate aggregations, and cross-cluster replication (CCR) for deploying OpenSearch clusters across different servers, data centers, and regions.

The OpenSearch Anomaly Detection plugin and the OpenSearch Dashboards Anomaly Detection plugin are available on GitHub.