| From: | Sergey Soloviev <sergey(dot)soloviev(at)tantorlabs(dot)ru> |
|---|---|
| To: | Andrei Lepikhov <lepihov(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Introduce Index Aggregate - new GROUP BY strategy |
| Date: | 2026-02-04 17:33:44 |
| Message-ID: | 2b06b055-7f0d-42a7-ac0b-983ee92e239f@tantorlabs.ru |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi!
> 2. Consider splitting the hash_* → spill_* field renaming into a separate preparatory commit
> to reduce the complexity of reviewing the core logic changes.
Here patches rebased to master. I've managed to move renaming part into different patch.
Also, I improved the accuracy of the planner in determining the required memory - it counts
total required index nodes and calculates amount of memory for them (internal and leaf
separately).
Patches in attachments.
> 3. I notice AGG_INDEX requires both sortable AND hashable types. While I understand this
> is for the hash-based spill partitioning, is this limitation necessary? Could you use sort-based
> spilling (similar to tuplesort's external merge) instead? This would allow AGG_INDEX to work
> with sortable-only types (I can imagine a geometric type with B-tree operators but no hash functions).
I thought about this idea and came to the conclusion, that this should be additional behaviour -
when the type is not hashable. Because if we allow only sortable types, then we have to choose
what to do to support memory limits:
1. Dump all tuples not present in index to disk
2. On overflow compute partial aggregates and at the end perform final merge/combine
Also, at the 1 case I am not considering sorting tuples, because otherwise what we get is plain
Sort/Group pair. By using hash-partitioning we improve performance, because all same tuples
will belong to the same bucket.
In case 2 we imply restriction on aggregation function itself, because not every aggregate has
combine function.
In the end, I haven't come to a decision on which option is better, so I will leave it as it is for now.
---
Sergey Soloviev
TantorLabs: https://tantorlabs.com
| Attachment | Content-Type | Size |
|---|---|---|
| v4-0001-add-in-memory-btree-tuple-index.patch | text/x-patch | 23.6 KB |
| v4-0002-change-field-prefix-from-hash_-to-spill_-in-AggSt.patch | text/x-patch | 24.8 KB |
| v4-0003-introduce-AGG_INDEX-grouping-strategy-node.patch | text/x-patch | 76.3 KB |
| v4-0004-make-use-of-IndexAggregate-in-planner-and-explain.patch | text/x-patch | 27.3 KB |
| v4-0005-add-support-for-Partial-IndexAggregate.patch | text/x-patch | 14.3 KB |
| v4-0006-fix-tests-for-IndexAggregate.patch | text/x-patch | 84.2 KB |
| From | Date | Subject | |
|---|---|---|---|
| Previous Message | David G. Johnston | 2026-02-04 17:02:17 | Re: Docs: Use non-default throughout the documentation |