Re: POC: GROUP BY optimization

From: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Richard Guo <guofenglinux(at)gmail(dot)com>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>
Subject: Re: POC: GROUP BY optimization
Date: 2024-04-18 05:01:13
Message-ID: 50c1c1dc-057c-421b-a1f4-62622c7afe61@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 4/12/24 06:44, Tom Lane wrote:
> * It seems like root->processed_groupClause no longer has much to do
> with the grouping behavior, which is scary given how much code still
> believes that it does. I suspect there are bugs lurking there, and
1. Analysing the code, processed_groupClause is used in:
- adjust_foreign_grouping_path_cost - to decide, do we need to add cost
of sort to the foreign grouping.
- just for replacing relids during SJE process.
- create_groupingsets_plan(), preprocess_grouping_sets,
remove_useless_groupby_columns - we don't apply this feature in the case
of grouping sets.
- get_number_of_groups - estimation shouldn't depend on the order of
clauses.
- create_grouping_paths - to analyse hashability of grouping clause
- make_group_input_target, make_partial_grouping_target - doesn't depend
on the order of clauses
planner.c: add_paths_to_grouping_rel, create_partial_grouping_paths - in
the case of hash grouping.

So, the only case we can impact current behavior lies in the
postgres_fdw. But here we don't really know which plan will be chosen
during planning on foreign node and stay the same behavior is legal for me.
> am not comforted by the fact that the patch changed exactly nothing
> in the pathnodes.h documentation of that field. This comment looks
> pretty silly now too:
>
> /* Preprocess regular GROUP BY clause, if any */
> root->processed_groupClause = list_copy(parse->groupClause);
>
> What "preprocessing" is going on there? This comment was adequate
> before, when the code line invoked preprocess_groupclause which had
> a bunch of commentary; but now it has to stand on its own and it's
> not doing a great job of that.
Thanks for picking this place. I fixed it.
More interesting here is that we move decisions on the order of group-by
clauses to the stage, where we already have underlying subtree and
incoming path keys. In principle, we could return the preprocessing
routine and arrange GROUP-BY with the ORDER-BY clause as it was before.
But looking into the code, I found only one reason to do this:
adjust_group_pathkeys_for_groupagg. Curious about how much profit we get
here, I think we can discover it later with no hurry. A good outcome
here will be a test case that can show the importance of arranging
GROUP-BY and ORDER-BY at an early stage.

--
regards,
Andrei Lepikhov
Postgres Professional

Attachment Content-Type Size
minor_comment.patch text/x-patch 656 bytes

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Imseih (AWS), Sami 2024-04-18 05:05:03 Re: allow changing autovacuum_max_workers without restarting
Previous Message Nathan Bossart 2024-04-18 04:17:12 improve performance of pg_dump --binary-upgrade