Re: Introduce Index Aggregate - new GROUP BY strategy

From: Sergey Soloviev <sergey(dot)soloviev(at)tantorlabs(dot)ru>
To: Andrei Lepikhov <lepihov(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Introduce Index Aggregate - new GROUP BY strategy
Date: 2026-02-04 17:33:44
Message-ID: 2b06b055-7f0d-42a7-ac0b-983ee92e239f@tantorlabs.ru
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

> 2. Consider splitting the hash_* → spill_* field renaming into a separate preparatory commit
> to reduce the complexity of reviewing the core logic changes.

Here patches rebased to master. I've managed to move renaming part into different patch.
Also, I improved the accuracy of the planner in determining the required memory - it counts
total required index nodes and calculates amount of memory for them (internal and leaf
separately).

Patches in attachments.

> 3. I notice AGG_INDEX requires both sortable AND hashable types. While I understand this
> is for the hash-based spill partitioning, is this limitation necessary? Could you use sort-based
> spilling (similar to tuplesort's external merge) instead? This would allow AGG_INDEX to work
> with sortable-only types (I can imagine a geometric type with B-tree operators but no hash functions).

I thought about this idea and came to the conclusion, that this should be additional behaviour -
when the type is not hashable. Because if we allow only sortable types, then we have to choose
what to do to support memory limits:

1. Dump all tuples not present in index to disk
2. On overflow compute partial aggregates and at the end perform final merge/combine

Also, at the 1 case I am not considering sorting tuples, because otherwise what we get is plain
Sort/Group pair. By using hash-partitioning we improve performance, because all same tuples
will belong to the same bucket.

In case 2 we imply restriction on aggregation function itself, because not every aggregate has
combine function.

In the end, I haven't come to a decision on which option is better, so I will leave it as it is for now.

---
Sergey Soloviev

TantorLabs: https://tantorlabs.com

Attachment Content-Type Size
v4-0001-add-in-memory-btree-tuple-index.patch text/x-patch 23.6 KB
v4-0002-change-field-prefix-from-hash_-to-spill_-in-AggSt.patch text/x-patch 24.8 KB
v4-0003-introduce-AGG_INDEX-grouping-strategy-node.patch text/x-patch 76.3 KB
v4-0004-make-use-of-IndexAggregate-in-planner-and-explain.patch text/x-patch 27.3 KB
v4-0005-add-support-for-Partial-IndexAggregate.patch text/x-patch 14.3 KB
v4-0006-fix-tests-for-IndexAggregate.patch text/x-patch 84.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Previous Message David G. Johnston 2026-02-04 17:02:17 Re: Docs: Use non-default throughout the documentation