Re: POC: GROUP BY optimization

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>, Белялов Дамир Наилевич <d(dot)belyalov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: POC: GROUP BY optimization
Date: 2023-12-27 02:48:08
Message-ID: CAPpHfdsDE-TGV4Ra1NTAQJ0BHDCuJZLCWO697rhK_cjZ1QX5fA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 26, 2023 at 1:37 PM Andrei Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> On 21/12/2023 17:53, Alexander Korotkov wrote:
> > On Sun, Oct 1, 2023 at 11:45 AM Andrei Lepikhov
> > <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> >> New version of the patch. Fixed minor inconsistencies and rebased onto
> >> current master.
> > Thank you (and other authors) for working on this subject. Indeed to
> > GROUP BY clauses are order-agnostic. Reordering them in the most
> > suitable order could give up significant query planning benefits. I
> > went through the thread: I see significant work has been already made
> > on this patch, the code is quite polished.
> Maybe, but issues, mentioned in [1], still not resolved. It is the only
> reason, why this thread hasn't been active.

Yes, this makes sense. I have a couple of points from me on this subject.
1) The patch reorders GROUP BY items not only to make comparison
cheaper but also to match the ordering of input paths and to match the
ORDER BY clause. Thus, even if we leave aside for now sorting GROUP
BY items by their cost, the patch will remain valuable.
2) An accurate estimate of the sorting cost is quite a difficult task.
What if we make a simple rule of thumb that sorting integers and
floats is cheaper than sorting numerics and strings with collation C,
in turn, that is cheaper than sorting collation-aware strings
(probably more groups)? Within the group, we could keep the original
order of items.

------
Regards,
Alexander Korotkov

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2023-12-27 03:07:48 Re: [PoC] Improve dead tuple storage for lazy vacuum
Previous Message Alexander Korotkov 2023-12-27 02:00:27 Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)'