Re: POC: GROUP BY optimization

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>, Белялов Дамир Наилевич <d(dot)belyalov(at)postgrespro(dot)ru>
Subject: Re: POC: GROUP BY optimization
Date: 2023-12-27 04:15:22
Message-ID: CAPpHfdvMKbV=5FyVv3zrpvqWUwWKATd1XpGsoDUaqTV2HtGGiw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 27, 2023 at 5:23 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> > 2) An accurate estimate of the sorting cost is quite a difficult task.
>
> Indeed.
>
> > What if we make a simple rule of thumb that sorting integers and
> > floats is cheaper than sorting numerics and strings with collation C,
> > in turn, that is cheaper than sorting collation-aware strings
> > (probably more groups)? Within the group, we could keep the original
> > order of items.
>
> I think it's a fool's errand to even try to separate different sort
> column orderings by cost. We simply do not have sufficiently accurate
> cost information. The previous patch in this thread got reverted because
> of that (well, also some implementation issues, but mostly that), and
> nothing has happened to make me think that another try will fare any
> better.

If there is a choice of what to compare first: 8-bytes integers or
collation-aware texts possibly toasted, then the answer is pretty
evident for me. For sure, there are cases then this choice is wrong.
But even if all the integers appear to be the same, the penalty isn't
that much.

Besides sorting column orderings by cost, this patch also tries to
match GROUP BY pathkeys to input pathkeys and ORDER BY pathkeys. Do
you think there is a chance for the second part if we leave the cost
part aside?

------
Regards,
Alexander Korotkov

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-12-27 04:27:39 Re: POC: GROUP BY optimization
Previous Message Tom Lane 2023-12-27 03:23:24 Re: POC: GROUP BY optimization