From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> |
Cc: | PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>, Белялов Дамир Наилевич <d(dot)belyalov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: POC: GROUP BY optimization |
Date: | 2023-12-21 10:53:27 |
Message-ID: | CAPpHfdtzaVa7S4onKy3YvttF2rrH5hQNHx9HtcSTLbpjx+MJ+w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi!
On Sun, Oct 1, 2023 at 11:45 AM Andrei Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>
> New version of the patch. Fixed minor inconsistencies and rebased onto
> current master.
Thank you (and other authors) for working on this subject. Indeed to
GROUP BY clauses are order-agnostic. Reordering them in the most
suitable order could give up significant query planning benefits. I
went through the thread: I see significant work has been already made
on this patch, the code is quite polished.
I'd like to make some notes.
1) As already mentioned, there is clearly a repetitive pattern for the
code following after get_useful_group_keys_orderings() calls. I think
it would be good to extract it into a separate function. Please, do
this as a separate patch coming before the group-by patch. That would
simplify the review.
2) I wonder what planning overhead this patch could introduce? Could
you try to measure the worst case? What if we have a table with a lot
of indexes and a long list of group-by clauses partially patching
every index. This should give us an understanding on whether we need
a separate GUC to control this feature.
3) I see that get_useful_group_keys_orderings() makes 3 calls to
get_cheapest_group_keys_order() function. Each time
get_cheapest_group_keys_order() performs the cost estimate and
reorders the free keys. However, cost estimation implies the system
catalog lookups (that is quite expensive). I wonder if we could
change the algorithm. Could we just sort the group-by keys by cost
once, save this ordering and then just re-use it. So, every time we
need to reorder a group by, we can just pull the required keys to the
top and use saved ordering for the rest. I also wonder if we could do
this once for add_paths_to_grouping_rel() and
create_partial_grouping_paths() calls. So, it probably should be
somewhere in create_ordinary_grouping_paths().
4) I think we can do some optimizations when enable_incremental_sort
== off. Then in get_useful_group_keys_orderings() we should only deal
with input_path fully matching the group-by clause, and try only full
match of group-by output to the required order.
------
Regards,
Alexander Korotkov
From | Date | Subject | |
---|---|---|---|
Next Message | Christoph Berg | 2023-12-21 10:56:22 | int4->bool test coverage |
Previous Message | Pavel Stehule | 2023-12-21 10:46:29 | Re: [DOC] Introducing Quick Start Guide to PL/pgSQL and PL/Python Documentation |