Aggregations

Run Elasticsearch metric and bucket aggregations with Sigmie — sum, avg, stats, terms, histogram, date histogram, and pipeline aggregations.

Aggregations summarize and analyze your indexed data. Use them to power analytics dashboards, statistical summaries, and the underlying data for filter UIs.

Sigmie has two paths into aggregations:

  1. Facets — high-level, integrated with properties. The right choice for filter sidebars.
  2. Raw aggregations — direct access to all Elasticsearch aggregation types. The right choice for analytics.

This page covers the raw aggregations API.

Basic usage

use Sigmie\Query\Aggs;

$response = $sigmie->newQuery('orders')
    ->matchAll()
    ->aggregate(function (Aggs $agg) {
        $agg->sum(name: 'turnover', field: 'price');
    })
    ->get();

$response->aggregation('turnover.value'); // 54.403

Metric aggregations

Metric aggregations compute a number (or, for stats, a small set of numbers) over the matched documents.

Sum

$agg->sum(name: 'stock_sum', field: 'stock');
$response->aggregation('stock_sum.value');

SQL equivalent: SELECT SUM(stock).

Max / Min / Avg

$agg->max(name: 'max_price', field: 'price');
$agg->min(name: 'min_price', field: 'price');
$agg->avg(name: 'avg_rating', field: 'rating');

Access with $response->aggregation('max_price.value').

Value count

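Counts how many values the field holds across the matched documents. Duplicates are counted, so this is not a distinct count: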

$agg->valueCount(name: 'categories_count', field: 'category');

Cardinality

Approximate count of distinct values. Elasticsearch estimates the result instead of tracking every value, which keeps it much cheaper than an exact distinct count on large fields:

$agg->cardinality(name: 'unique_users', field: 'user_id');
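
To make the difference between the two counts concrete, here is a plain-PHP sketch that calls neither Sigmie nor Elasticsearch. The $userIds list is hypothetical sample data, and the distinct count is computed exactly here, whereas the cardinality aggregation returns an estimate on large data sets.

```php
<?php

// Hypothetical user_id values taken from five matched documents.
$userIds = ['u1', 'u2', 'u1', 'u3', 'u2'];

// value_count semantics: how many values were seen, duplicates included.
$valueCount = count($userIds);

// cardinality semantics: how many distinct values were seen
// (exact here; Elasticsearch approximates this on large data sets).
$distinctCount = count(array_unique($userIds));

printf("value_count=%d cardinality=%d\n", $valueCount, $distinctCount);
```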

Stats

A quick statistical summary:

$agg->stats(name: 'sales_stats', field: 'amount');
$response->aggregation('sales_stats');
// [
// 'count' => 133,
// 'min' => 5.33,
// 'max' => 128.58,
// 'avg' => 73.53,
// 'sum' => 9779.49,
// ]
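
As an illustration of what the five figures mean, the sketch below computes them in plain PHP over a hypothetical list of amount values. It shows the arithmetic only, not how Sigmie or Elasticsearch computes them internally.

```php
<?php

// Hypothetical 'amount' values from the matched documents.
$amounts = [5.0, 10.0, 15.0, 30.0];

// The same five figures a stats aggregation reports.
$stats = [
    'count' => count($amounts),
    'min'   => min($amounts),
    'max'   => max($amounts),
    'avg'   => array_sum($amounts) / count($amounts),
    'sum'   => array_sum($amounts),
];

print_r($stats);
```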

Bucket aggregations

Bucket aggregations group documents by criteria — each bucket holds the documents that match.

Terms

Group by the unique values of a field. Use a keyword field (or text field with a .keyword sub-field):

$agg->terms(name: 'category_terms', field: 'category')->missing('N/A');
 
$response->aggregation('category_terms.buckets');
// [
// ['key' => 'Fantasy', 'doc_count' => 20],
// ['key' => 'Musical', 'doc_count' => 18],
// ['key' => 'Adventure', 'doc_count' => 13],
// ['key' => 'N/A', 'doc_count' => 7],
// ]

missing('N/A') puts documents without the field into a bucket of that key.
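
The effect of the missing key can be sketched in plain PHP. The $docs array is hypothetical sample data; the grouping loop only mimics the bucketing and is not Sigmie's implementation.

```php
<?php

// Hypothetical documents; the last two have no usable 'category' value.
$docs = [
    ['category' => 'Fantasy'],
    ['category' => 'Musical'],
    ['category' => 'Fantasy'],
    ['title' => 'Untitled'],
    ['category' => null],
];

$counts = [];
foreach ($docs as $doc) {
    // Fall back to the placeholder key when the field is absent or null.
    $key = $doc['category'] ?? 'N/A';
    $counts[$key] = ($counts[$key] ?? 0) + 1;
}

print_r($counts);
```

Without the fallback key, the last two documents would not appear in any bucket.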

Range

Group by explicit numeric ranges:

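// PHP does not allow a positional argument after named ones,
// so all three arguments are passed positionally here.
$agg->range('price_ranges', 'price', [
    ['key' => '0-100', 'to' => 100],
    ['key' => '100-200', 'from' => 100, 'to' => 200],
    ['key' => '200+', 'from' => 200],
]);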
 
$response->aggregation('price_ranges.buckets');
// [
// '0-100' => ['to' => 100, 'doc_count' => 803],
// '100-200' => ['from' => 100, 'to' => 200, 'doc_count' => 422],
// '200+' => ['from' => 200, 'doc_count' => 343],
// ]
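
In an Elasticsearch range aggregation, from is inclusive and to is exclusive, so a price of exactly 100 lands in the 100-200 bucket. The plain-PHP sketch below illustrates that boundary rule; rangeKeyFor is a hypothetical helper written for this example, not part of Sigmie.

```php
<?php

// The same ranges as in the example: 'from' is inclusive, 'to' is exclusive.
$ranges = [
    ['key' => '0-100',   'to' => 100],
    ['key' => '100-200', 'from' => 100, 'to' => 200],
    ['key' => '200+',    'from' => 200],
];

// Return the key of the first range containing $price, or null if none does.
function rangeKeyFor(float $price, array $ranges): ?string
{
    foreach ($ranges as $range) {
        $aboveFrom = !isset($range['from']) || $price >= $range['from'];
        $belowTo   = !isset($range['to'])   || $price <  $range['to'];
        if ($aboveFrom && $belowTo) {
            return $range['key'];
        }
    }
    return null;
}

echo rangeKeyFor(99.99, $ranges), "\n";  // 0-100
echo rangeKeyFor(100.0, $ranges), "\n";  // 100-200
echo rangeKeyFor(200.0, $ranges), "\n";  // 200+
```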

Histogram

Fixed-width buckets across a numeric field:

$agg->histogram(name: 'price_histogram', field: 'price', interval: 50);
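
Each value is assigned to the bucket whose key is the value rounded down to a multiple of the interval. A minimal plain-PHP sketch of that rule, assuming the default offset of 0 (histogramKey is a hypothetical helper for illustration):

```php
<?php

// Bucket key for a value, assuming the default offset of 0:
// floor(value / interval) * interval.
function histogramKey(float $value, float $interval): float
{
    return floor($value / $interval) * $interval;
}

echo histogramKey(49.99, 50), "\n";  // 0
echo histogramKey(50.0, 50), "\n";   // 50
echo histogramKey(120.0, 50), "\n";  // 100
```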

Date histogram

Time-bucket documents:

$agg->dateHistogram(name: 'sales_over_time', field: 'created_at', interval: 'month');
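
With a monthly interval, every document falls into the bucket that starts on the first day of its calendar month. The sketch below mimics that in plain PHP, assuming UTC timestamps; monthBucket and the sample timestamps are hypothetical and only illustrate the grouping.

```php
<?php

// Map a timestamp to the start of its calendar month, which is the
// bucket a monthly date histogram would place it in.
function monthBucket(string $timestamp): string
{
    $date = new DateTimeImmutable($timestamp, new DateTimeZone('UTC'));

    return $date->modify('first day of this month')->format('Y-m-d');
}

// Hypothetical created_at values.
$createdAt = [
    '2024-01-03T09:00:00Z',
    '2024-01-28T17:45:00Z',
    '2024-02-01T00:00:00Z',
];

// Count documents per monthly bucket.
$counts = array_count_values(array_map('monthBucket', $createdAt));

print_r($counts);
```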

Auto date histogram

Let Elasticsearch pick the bucket interval:

$agg->autoDateHistogram(name: 'timeline', field: 'created_at', buckets: 12);

Sub-aggregations

Nest aggregations to compute metrics per bucket:

$agg->terms(name: 'category_terms', field: 'category')
    ->subAggregation(function (Aggs $sub) {
        $sub->avg(name: 'avg_price', field: 'price');
        $sub->max(name: 'max_price', field: 'price');
    });

Each category bucket now carries avg_price and max_price alongside doc_count.
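
A sketch of reading those per-bucket metrics, assuming the buckets arrive in the standard Elasticsearch shape (for example from $response->aggregation('category_terms.buckets')). The $buckets array below is hypothetical sample data so the snippet runs on its own.

```php
<?php

// Hypothetical buckets in the shape Elasticsearch returns for a terms
// aggregation with avg and max sub-aggregations.
$buckets = [
    ['key' => 'Fantasy', 'doc_count' => 20, 'avg_price' => ['value' => 14.5], 'max_price' => ['value' => 30.0]],
    ['key' => 'Musical', 'doc_count' => 18, 'avg_price' => ['value' => 22.0], 'max_price' => ['value' => 45.0]],
];

// Flatten each bucket into one row per category.
$rows = array_map(
    fn (array $bucket): array => [
        'category'  => $bucket['key'],
        'documents' => $bucket['doc_count'],
        'avg_price' => $bucket['avg_price']['value'],
        'max_price' => $bucket['max_price']['value'],
    ],
    $buckets
);

foreach ($rows as $row) {
    printf(
        "%s: %d docs, avg %.2f, max %.2f\n",
        $row['category'],
        $row['documents'],
        $row['avg_price'],
        $row['max_price']
    );
}
```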

Pipeline aggregations

Operate on the output of other aggregations:

$agg->terms(name: 'monthly_sales', field: 'month')
    ->subAggregation(function (Aggs $sub) {
        $sub->sum(name: 'total_sales', field: 'amount');
    })
    ->pipelineAggregation(function (Aggs $pipe) {
        $pipe->avgBucket(name: 'avg_monthly_sales', bucketsPath: 'monthly_sales>total_sales');
    });
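
What avg_bucket computes can be sketched in plain PHP: it averages the per-bucket metric named by the buckets path, not the underlying documents. The $buckets data is hypothetical and the snippet does not call Sigmie.

```php
<?php

// Hypothetical monthly_sales buckets, each with its total_sales sum.
$buckets = [
    ['key' => '2024-01', 'total_sales' => ['value' => 100.0]],
    ['key' => '2024-02', 'total_sales' => ['value' => 200.0]],
    ['key' => '2024-03', 'total_sales' => ['value' => 300.0]],
];

// avg_bucket semantics: average the per-bucket metric selected by the
// buckets path (monthly_sales>total_sales).
$totals = array_map(fn (array $b): float => $b['total_sales']['value'], $buckets);
$avgMonthlySales = array_sum($totals) / count($totals);

echo $avgMonthlySales, "\n"; // 200
```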

Filtered aggregations

Run an aggregation over a filtered subset of the query results:

$agg->filter(name: 'expensive_products', filter: ['range' => ['price' => ['gte' => 100]]])
    ->subAggregation(function (Aggs $sub) {
        $sub->terms(name: 'expensive_categories', field: 'category');
    });
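
For orientation, this is the Elasticsearch aggregation fragment that a filter aggregation with a nested terms aggregation corresponds to, written as a PHP array. That Sigmie emits exactly this body is an assumption; the structure itself is standard Elasticsearch query DSL.

```php
<?php

// Standard Elasticsearch DSL for a filter aggregation with a nested
// terms aggregation. Whether Sigmie produces exactly this body is assumed.
$body = [
    'aggs' => [
        'expensive_products' => [
            'filter' => ['range' => ['price' => ['gte' => 100]]],
            'aggs' => [
                'expensive_categories' => ['terms' => ['field' => 'category']],
            ],
        ],
    ],
];

echo json_encode($body, JSON_PRETTY_PRINT), "\n";
```

In the response, the filter bucket reports its own doc_count, and the nested terms buckets cover only the documents that passed the filter.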

Combined with the query builder

$response = $sigmie->newQuery('products')
    ->properties($props)
    ->matchAll()
    ->facets('category price:50')
    ->scriptScore(
        source: "Math.log(2 + doc['popularity'].value)",
        boostMode: 'replace',
    )
    ->get();

$hits = $response->json('hits.hits');
$facets = $response->json('facets');
$rawAggs = $response->json('aggregations');

Analytics-only requests

For pure analytics (no documents needed), set size(0):

$response = $sigmie->newQuery('sales')
    ->matchAll()
    ->aggregate(function (Aggs $agg) {
        $agg->dateHistogram('sales_over_time', 'date', 'month')
            ->subAggregation(function (Aggs $sub) {
                $sub->sum('monthly_revenue', 'amount');
            });

        $agg->terms('top_products', 'product_id')
            ->size(10)
            ->subAggregation(function (Aggs $sub) {
                $sub->sum('product_revenue', 'amount');
            });
    })
    ->size(0)
    ->get();

Performance

  • Use keyword fields for terms aggregations. A text field needs a .keyword sub-field.
  • Limit the number of buckets: terms(...)->size(10) returns only the top 10.
  • Put non-scoring conditions in a filter() boolean clause so Elasticsearch can cache them and skip relevance scoring.
  • Cardinality aggregations on high-cardinality fields use significant memory.

$sigmie->newQuery('products')
    ->properties($props)
    ->bool(function ($bool) {
        $bool->filter()->term('status', 'active'); // cached
        $bool->must()->match('title', $searchTerm);
    })
    ->facets('category:10 brand:10') // top 10 per facet
    ->size(20)
    ->get();
