The ANALYZE Statement

The analyze statement provides a suite of analytical operations over a bucket’s numeric fields. It does not modify any data; it computes and outputs results based on the current committed state of the bucket.

General syntax:

analyze <bucket> <field> <operation>;

All analyze operations are read-only and require no active transaction.

Stream Analysis

Stream analysis computes real-time metrics over a sliding window of the <window_size> most recent records, ordered by insertion sequence. It is designed for monitoring and observability workloads where only the latest observations are relevant.

Syntax:

analyze <bucket> <field> stream(<window_size>);

Examples:

analyze Sales revenue stream(10);
analyze Metrics value stream(24);

The output includes rolling statistics such as the mean, minimum, maximum, and standard deviation computed over the specified window of records.
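
To make the window semantics concrete, the following Python sketch computes the same kind of rolling statistics over the last window_size values of a list. It is an illustration only, not the engine's implementation: the function name stream_stats and the sample data are hypothetical, and the choice of population standard deviation is an assumption, since the text above does not say which variant is reported.

```python
from collections import deque
from statistics import mean, pstdev


def stream_stats(values, window_size):
    """Rolling statistics over the last `window_size` values (hypothetical helper)."""
    # A bounded deque keeps only the most recent `window_size` items,
    # mirroring a window ordered by insertion sequence.
    window = deque(values, maxlen=window_size)
    if not window:
        raise ValueError("no records to analyze")
    return {
        "count": len(window),
        "mean": mean(window),
        "min": min(window),
        "max": max(window),
        # Assumption: population standard deviation; a sample standard
        # deviation would be an equally plausible choice.
        "stddev": pstdev(window),
    }


if __name__ == "__main__":
    revenue = [120.0, 95.5, 130.0, 110.0, 150.0, 99.0]  # made-up sample data
    print(stream_stats(revenue, 3))
```

If the bucket holds fewer records than the requested window, this sketch simply uses all of them; how the statement itself behaves in that case is not specified here.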

Time Series Analysis

Time series analysis examines a numeric field across the full dataset, treating record order as a proxy for time. It produces trend detection, forecasting, and anomaly detection outputs.

Syntax:

analyze <bucket> <field> timeseries;

Examples:

analyze Sales revenue timeseries;
analyze Metrics value timeseries;

The output typically includes a trend direction, projected future values based on the observed trajectory, and the identification of records whose values deviate significantly from the expected trend.
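
The three outputs can be illustrated with a small Python sketch. The algorithms actually used by the statement are not documented here, so everything below is an assumption chosen for simplicity: an ordinary least-squares line over the record index for the trend, extrapolation of that line for the forecast, and a residual threshold for anomalies. The names timeseries_sketch, horizon, and threshold are hypothetical.

```python
from statistics import mean, pstdev


def timeseries_sketch(values, horizon=3, threshold=2.0):
    """Trend, forecast, and anomalies from a least-squares line (illustrative only)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two records")
    xs = range(n)  # record order stands in for time
    x_bar, y_bar = mean(xs), mean(values)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / sxx
    intercept = y_bar - slope * x_bar

    # Residual = observed value minus the value predicted by the trend line.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, values)]
    spread = pstdev(residuals)
    # Assumption: a record is anomalous when its residual exceeds
    # `threshold` standard deviations of all residuals.
    anomalies = [
        i for i, r in enumerate(residuals)
        if spread > 0 and abs(r) > threshold * spread
    ]
    forecast = [intercept + slope * (n + k) for k in range(horizon)]
    trend = "up" if slope > 0 else "down" if slope < 0 else "flat"
    return {"trend": trend, "slope": slope, "forecast": forecast, "anomalies": anomalies}


if __name__ == "__main__":
    series = [10, 12, 14, 16, 40, 20, 22, 24]  # made-up data with one spike
    print(timeseries_sketch(series))
```

In the sample series, the spike at index 4 is the only record flagged. A single large outlier also pulls the fitted line toward itself, which is one reason real implementations often prefer more robust methods.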

Descriptive Statistics

The statistics operation computes a comprehensive summary of a numeric field across all records in the bucket.

Syntax:

analyze <bucket> <field> statistics;

Examples:

analyze Sales price statistics;
analyze Sales quantity statistics;
analyze Metrics value statistics;

Reported statistics include, but are not limited to, the count, mean, median, standard deviation, variance, minimum, maximum, and range of the specified field.
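
For readers who want to cross-check a result by hand, the listed statistics can be reproduced with Python's standard library. This is a sketch under stated assumptions: the function name describe and the sample data are hypothetical, and the use of the sample (n - 1) variance and standard deviation is a guess, since the text above does not say whether sample or population formulas are reported.

```python
import statistics


def describe(values):
    """Summary statistics for a list of numbers (hypothetical helper)."""
    if len(values) < 2:
        raise ValueError("need at least two values for variance")
    lowest, highest = min(values), max(values)
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Assumption: sample statistics (n - 1 denominator).
        "stddev": statistics.stdev(values),
        "variance": statistics.variance(values),
        "min": lowest,
        "max": highest,
        "range": highest - lowest,
    }


if __name__ == "__main__":
    prices = [4.0, 8.0, 6.0, 2.0]  # made-up sample data
    for name, value in describe(prices).items():
        print(f"{name}: {value}")
```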

Correlation Analysis

Correlation analysis measures the linear relationship between two numeric fields across the records in a bucket. The result is a Pearson correlation coefficient in the range [-1, 1], where 1 indicates a perfect positive linear relationship, -1 a perfect negative linear relationship, and 0 no linear relationship.

Syntax:

analyze <bucket> <field_a> correlation(<field_b>);

Examples:

analyze Sales quantity correlation(revenue);
analyze Marketing ad_spend correlation(revenue);
analyze Marketing ad_spend correlation(impressions);
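
The Pearson coefficient described above is the covariance of the two fields divided by the product of their standard deviations. A minimal Python sketch of that definition follows; the function name pearson and the sample data are hypothetical, and raising an error for a constant field is a choice made for this sketch, not documented behavior of the statement.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of products of deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var_x = sum((x - x_bar) ** 2 for x in xs)
    var_y = sum((y - y_bar) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # The coefficient is undefined when either field is constant.
        raise ValueError("correlation is undefined for a constant field")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    ad_spend = [100, 200, 300, 400]  # made-up sample data
    revenue = [1100, 1900, 3200, 3900]
    print(round(pearson(ad_spend, revenue), 4))
```

A coefficient near 0 only rules out a linear relationship; two fields can still be strongly related in a nonlinear way.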

Percentile Analysis

Percentile analysis determines the value below which a given percentage of observations fall. Any integer percentile from 1 to 99 may be requested.

Syntax:

analyze <bucket> <field> percentile(<n>);

Examples:

analyze Sales revenue percentile(90);
analyze Sales price percentile(50);

The 50th percentile is equivalent to the median.
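
Several percentile definitions are in common use, and the text above does not say which one the statement applies. The Python sketch below assumes linear interpolation between the two nearest ranks, chosen because it makes the 50th percentile coincide with the median for both odd and even record counts. The function name percentile and the sample data are hypothetical.

```python
from statistics import median


def percentile(values, p):
    """p-th percentile (1..99) using linear interpolation between ranks."""
    if not 1 <= p <= 99:
        raise ValueError("percentile must be between 1 and 99")
    if not values:
        raise ValueError("no values to analyze")
    ordered = sorted(values)
    # Fractional position of the percentile within the sorted list.
    position = (len(ordered) - 1) * p / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


if __name__ == "__main__":
    prices = [10, 20, 30, 40]  # made-up sample data
    print(percentile(prices, 90))
    # Under this interpolation rule the 50th percentile equals the median.
    print(percentile(prices, 50) == median(prices))
```

Under a nearest-rank definition, by contrast, the 50th percentile of an even number of records would be the lower of the two middle values rather than their average.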

Window Functions

Window functions compute a rank or positional value for each record in the bucket relative to all other records, ordered by the specified field. Unlike aggregate functions, window functions return a value per row rather than a single summary value.

Syntax:

analyze <bucket> <field> window(<function>, <partition_size>);

The partition_size argument specifies the number of records to include in the window frame.

Supported window functions:

Function        Description
row_number      Assigns a unique sequential integer to each record, starting at 1, regardless of field value.
rank            Assigns a rank based on field value, with ties sharing the same rank and leaving gaps in the sequence (e.g., 1, 2, 2, 4).
dense_rank      Assigns a rank based on field value, with ties sharing the same rank but without gaps in the sequence (e.g., 1, 2, 2, 3).
percent_rank    Assigns a relative rank in the range [0, 1], computed as (rank - 1) / (total_rows - 1).

Examples:

analyze Sales revenue window(row_number,   15);
analyze Sales revenue window(rank,         15);
analyze Sales revenue window(dense_rank,   15);
analyze Sales revenue window(percent_rank, 15);
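
The difference between the four functions is easiest to see on data containing a tie. The following Python sketch computes all four values per record. It is an illustration under assumptions: it orders values ascending, ignores the partition_size argument entirely, and returns 0 for percent_rank when there is only one record; none of these choices is confirmed by the text above. The function name window_ranks and the sample data are hypothetical.

```python
def window_ranks(values):
    """Per-record row_number, rank, dense_rank, and percent_rank (ascending order)."""
    n = len(values)
    # Indices sorted by value; Python's sort is stable, so ties keep input order.
    order = sorted(range(n), key=lambda i: values[i])
    rows = []
    rank = dense = 0
    previous = None
    for position, i in enumerate(order, start=1):
        value = values[i]
        if position == 1 or value != previous:
            rank = position  # rank jumps to the row position, leaving gaps after ties
            dense += 1       # dense_rank advances by exactly one per distinct value
        previous = value
        # Assumption: a single-record bucket gets percent_rank 0.
        pct = (rank - 1) / (n - 1) if n > 1 else 0.0
        rows.append({
            "value": value,
            "row_number": position,
            "rank": rank,
            "dense_rank": dense,
            "percent_rank": pct,
        })
    return rows


if __name__ == "__main__":
    revenue = [300, 100, 200, 200]  # made-up sample data with one tie
    for row in window_ranks(revenue):
        print(row)
    # rank column: 1, 2, 2, 4; dense_rank column: 1, 2, 2, 3
```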