Re: POC: GROUP BY optimization

From: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: POC: GROUP BY optimization
Date: 2024-01-09 11:07:19
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 9/1/2024 16:45, vignesh C wrote:
> On Tue, 9 Jan 2024 at 14:31, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>> Here is a new version of GROUP-BY optimization without sort model.
>> On 21/12/2023 17:53, Alexander Korotkov wrote:
>>> I'd like to make some notes.
>>> 1) As already mentioned, there is clearly a repetitive pattern for the
>>> code following after get_useful_group_keys_orderings() calls. I think
>>> it would be good to extract it into a separate function. Please, do
>>> this as a separate patch coming before the group-by patch. That would
>>> simplify the review.
>> Done. See patch 0001-*. Unfortunately, extraction of whole cycle isn't
>> practical, because it blows out the interface of the routine.
>>> 2) I wonder what planning overhead this patch could introduce? Could
>>> you try to measure the worst case? What if we have a table with a lot
>>> of indexes and a long list of group-by clauses partially patching
>>> every index. This should give us an understanding on whether we need
>>> a separate GUC to control this feature.
>> In current implementation I don't anticipate any significant overhead.
>> GUC is needed here to allow users adhere their own ordering and to
>> disable feature in the case of problems.
>>> 4) I think we can do some optimizations when enable_incremental_sort
>>> == off. Then in get_useful_group_keys_orderings() we should only deal
>>> with input_path fully matching the group-by clause, and try only full
>>> match of group-by output to the required order.
>> Hm, is it really make sense in current implementation?
> CFBot shows the following errors at [1] with:
> [08:33:28.813] ../src/backend/utils/adt/selfuncs.c: In function
> ‘estimate_num_groups’:
> [08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3389:9: warning:
> implicit declaration of function ‘estimate_num_groups_incremental’
> [-Wimplicit-function-declaration]
> [08:33:28.813] 3389 | return estimate_num_groups_incremental(root, groupExprs,
> [08:33:28.813] | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> [08:33:28.813] ../src/backend/utils/adt/selfuncs.c: At top level:
> [08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3400:1: warning: no
> previous prototype for ‘estimate_num_groups_incremental’
> [-Wmissing-prototypes]
> [08:33:28.813] 3400 | estimate_num_groups_incremental(PlannerInfo
> *root, List *groupExprs,
> [08:33:28.813] | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> [08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3400:1: error:
> conflicting types for ‘estimate_num_groups_incremental’
> [08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3389:9: note:
> previous implicit declaration of ‘estimate_num_groups_incremental’ was
> here
> [08:33:28.813] 3389 | return estimate_num_groups_incremental(root, groupExprs,
Hmm, I don't see this old code in these patches. Resend 0002-* because
of trailing spaces.

Andrei Lepikhov
Postgres Professional

Attachment Content-Type Size
0002-Explore-alternative-orderings-of-group-by-pathkeys-d-20240109.patch text/plain 31.5 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2024-01-09 11:08:45 Re: INFORMATION_SCHEMA note
Previous Message Alexander Lakhin 2024-01-09 11:00:01 Re: Random pg_upgrade test failure on drongo