| From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
|---|---|
| To: | Sergey Soloviev <sergey(dot)soloviev(at)tantorlabs(dot)ru> |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Introduce Index Aggregate - new GROUP BY strategy |
| Date: | 2025-12-08 23:11:46 |
| Message-ID: | CAApHDvpxbk8DqSmpsCBZrmCin69rpupKEKHFiM3SqFDdaHeAjg@mail.gmail.com |
| Lists: | pgsql-hackers |
On Tue, 9 Dec 2025 at 04:37, Sergey Soloviev
<sergey(dot)soloviev(at)tantorlabs(dot)ru> wrote:
> I would like to introduce a new GROUP BY strategy, called Index Aggregate.
> In a nutshell, we build a B+tree index where the GROUP BY attributes are the
> index keys, and if the memory limit is reached we build an index for each
> batch and spill it to disk as a sorted run, performing a final external merge.
> Mean IndexAgg time is about 1.8 ms versus 2 ms for hash + sort, so the win is
> about 10%.
>
> Also, I have run TPC-H tests; 2 queries (4 and 5) used the Index Agg node,
> which gave a gain of nearly 5% in time.
Interesting.
Are you able to provide benchmarks with increasing numbers of groups,
say 100 to 100 million, increasing in multiples of 10, with, say, 1GB of
work_mem and, to be fair, hash_mem_multiplier=1, for all 3 strategies?
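Roughly this sort of setup is what I have in mind; the table and column
names here are just placeholders, not anything from your patch:

CREATE TABLE agg_bench AS
  SELECT (random() * 1000000)::int AS g,  -- adjust the multiplier for 100 .. 100 million groups
         md5(x::text) AS payload
  FROM generate_series(1, 100000000) x;
VACUUM ANALYZE agg_bench;

SET work_mem = '1GB';
SET hash_mem_multiplier = 1;

EXPLAIN (ANALYZE, TIMING OFF)
SELECT g, count(*) FROM agg_bench GROUP BY g;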
A binary search's performance characteristics will differ vastly from
those of simplehash's hash lookup and linear-probe style search. Binary
searches become much less optimal when the array becomes large, as
there are many more opportunities for cache misses than with a linear
probing hash table. I think you're going to have to demonstrate that
the window where this is useful is big enough to warrant the extra
code.
Ideally, you could show a graph with Hash Aggregate as the baseline,
always shown as 1, then run the same benchmark forcing a Sort -> Group
Agg, and then also your Index Agg. Ideally, you could also provide
scripts for this so people can easily run it themselves, which would
let us see how other hardware compares to yours. Doing this may also
help you move forward with your costing code for the planner, but the
main thing to show is that there is a large enough range of data sizes
where this strategy is useful.
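Forcing each plan shape would be something along these lines. I'm
assuming here that the patch adds a GUC such as enable_indexagg to turn
the new node on and off; adjust to whatever the patch actually exposes:

-- baseline: Hash Aggregate
SET enable_hashagg = on;
SET enable_indexagg = off;  -- assumed GUC name, see above

-- Sort -> Group Aggregate
SET enable_hashagg = off;
SET enable_indexagg = off;

-- Index Aggregate
SET enable_hashagg = off;
SET enable_indexagg = on;

The per-strategy times divided by the Hash Aggregate time for the same
group count would then give the normalised numbers for the graph.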
You might want to repeat the test a few times with different data
types: perhaps int or bigint, then also something varlena, and maybe
something byref such as UUID. Also, you might want to avoid presorted
data, as I suspect it'll be hard to beat Sort -> Group Agg there. Not
causing performance regressions for presorted input might be quite a
tricky aspect of this patch.
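For example, variations along these lines, again only as a sketch and
reusing the placeholder names from above:

-- other key types: bigint, something varlena (text), something byref (uuid)
CREATE TABLE agg_bench_uuid AS
  SELECT md5((x % 1000000)::text)::uuid AS g,  -- adjust the modulus for the group count
         x AS payload
  FROM generate_series(1, 100000000) x;

-- a presorted variant, where Sort -> Group Agg should already be cheap
CREATE TABLE agg_bench_sorted AS
  SELECT * FROM agg_bench ORDER BY g;
CREATE INDEX ON agg_bench_sorted (g);  -- gives the planner a presorted path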
David