Re: POC: GROUP BY optimization

From:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>, Белялов Дамир Наилевич <d(dot)belyalov(at)postgrespro(dot)ru>
Subject:	Re: POC: GROUP BY optimization
Date:	2023-12-27 04:35:41
Message-ID:	ed20bce1-333f-4ebc-96fc-dac5f931e4b2@postgrespro.ru
Lists:	pgsql-hackers

On 27/12/2023 11:15, Alexander Korotkov wrote:
> On Wed, Dec 27, 2023 at 5:23 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
>>> 2) An accurate estimate of the sorting cost is quite a difficult task.
>>
>> Indeed.
>>
>>> What if we make a simple rule of thumb that sorting integers and
>>> floats is cheaper than sorting numerics and strings with collation C,
>>> in turn, that is cheaper than sorting collation-aware strings
>>> (probably more groups)? Within the group, we could keep the original
>>> order of items.
>>
>> I think it's a fool's errand to even try to separate different sort
>> column orderings by cost. We simply do not have sufficiently accurate
>> cost information. The previous patch in this thread got reverted because
>> of that (well, also some implementation issues, but mostly that), and
>> nothing has happened to make me think that another try will fare any
>> better.

To be clear: in [1], I mentioned that we can run micro-benchmarks and
use the results to structure the costs of operators. At least for
operators on fixed-length types, that is relatively easy. So the main
obstacle here is accurately predicting ndistinct for different
combinations of columns. Does it make sense to keep designing the
feature so that choosing between different sort column orderings is
enabled only when we have extended statistics on the columns?
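
As a rough illustration of why ndistinct of the leading columns matters for the choice of ordering, here is a minimal Python sketch. It is not PostgreSQL code and not the cost model from the patch: the function names, the synthetic two-column data, and the idea of simply counting per-column comparisons are all assumptions made for the example. It sorts the same rows under two column orderings and counts how often each key position is actually compared.

```python
import random
from functools import cmp_to_key


def count_column_comparisons(rows, column_order):
    """Sort rows by column_order; return how often each key position was compared."""
    counts = [0] * len(column_order)

    def compare(left, right):
        # Compare column by column, stopping at the first column that differs,
        # the way a multi-column sort comparator does.
        for position, column in enumerate(column_order):
            counts[position] += 1
            if left[column] != right[column]:
                return -1 if left[column] < right[column] else 1
        return 0

    sorted(rows, key=cmp_to_key(compare))
    return counts


def make_rows(n_rows=2000, seed=0):
    # Hypothetical data: column 0 has 2 distinct values, column 1 has up to 1000.
    rng = random.Random(seed)
    return [(rng.randrange(2), rng.randrange(1000)) for _ in range(n_rows)]


rows = make_rows()
low_first = count_column_comparisons(rows, (0, 1))   # low-ndistinct column leads
high_first = count_column_comparisons(rows, (1, 0))  # high-ndistinct column leads

print("low-ndistinct column first: ", low_first)
print("high-ndistinct column first:", high_first)
```

With the high-ndistinct column first, most comparisons are decided on the first column, so the second column is consulted far less often. Estimating that effect in the planner needs ndistinct for each column prefix, which is what extended statistics could supply.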

[1]
https://www.postgresql.org/message-id/e3602ccb-e643-2e79-ed2c-1175a80533a1@postgrespro.ru

--
regards,
Andrei Lepikhov
Postgres Professional
