Re: Introduce Index Aggregate - new GROUP BY strategy

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: Sergey Soloviev <sergey(dot)soloviev(at)tantorlabs(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Introduce Index Aggregate - new GROUP BY strategy
Date: 2025-12-26 15:20:38
Message-ID: 2c5f9bcf-ac86-40bd-b695-6db21a5c9fc1@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/12/25 17:23, Sergey Soloviev wrote:
> Also, cost calculation logic is adjusted a bit - it takes into account
> top-down index
> traversal and final external merge cost is added only if spill expected.

Hi,
Here is my 'aerial' review:
The patch proposes a new aggregation strategy that builds an in-memory
B+tree index for grouping. This combines incremental group formation
(like AGG_HASHED) with sorted output (like AGG_SORTED), which is
beneficial when the query requires both grouping and ordering on
(almost) the same columns.
The key advantage is avoiding a separate sort step when the sorted
output is needed, at the cost of additional CPU overhead.

My doubts:
1. Can you benchmark the scenario where the optimiser mispredicts
numGroups. If the planner underestimates group cardinality, the btree
overhead could be much higher than expected. Does the approach degrade
gracefully?
2. Consider splitting the hash_* → spill_* field renaming into a
separate preparatory commit to reduce the complexity of reviewing the
core logic changes.
3. I notice AGG_INDEX requires both sortable AND hashable types. While I
understand this is for the hash-based spill partitioning, is this
limitation necessary? Could you use sort-based spilling (similar to
tuplesort's external merge) instead? This would allow AGG_INDEX to work
with sortable-only types (I can imagine a geometric type with B-tree
operators but no hash functions).

The main question for me is: can you invent a robust cost model to set
smooth boundaries between all three types of grouping? Does it really
promise frequent benefits and avoid degradations? - Remember,
increasing search space we increase planning time, which may be palpable
in cases with many groupings/grouping attributes - for example, an
APPEND over a partitioned table with pushed-down aggregate looks like a
trivial case.

--
regards, Andrei Lepikhov,
pgEdge

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message zengman 2025-12-26 15:23:53 Re: 17f446784d54da827f74c2acc0fa772a41b92354 breaks orafce build
Previous Message Peter Eisentraut 2025-12-26 15:20:10 Re: 17f446784d54da827f74c2acc0fa772a41b92354 breaks orafce build