Re: POC: GROUP BY optimization

From: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>
Subject: Re: POC: GROUP BY optimization
Date: 2024-02-22 07:04:55
Message-ID: d026580a-15a0-4734-9e69-cc6ebb70da1d@postgrespro.ru
Lists: pgsql-hackers

On 22/2/2024 13:35, Richard Guo wrote:
> The avg() function on an integer argument is commonly used in
> aggregates.sql.  I don't think this is an issue.  See the first test
> query in aggregates.sql.
Makes sense.
> > it should be parallel to the test cases for utilizing the ordering
> > of index scan and subquery scan.
> >
> > Also, I'm unsure about removing the disabling of the
> > max_parallel_workers_per_gather parameter. Have you discovered that
> > the current plan dominates the partial one? Do the cost fluctuations
> > across platforms not trigger a parallel plan?
>
>
> The table used for testing contains only 100 tuples, which is the size
> of only one page.  I don't believe it would trigger any parallel plans,
> unless we manually change min_parallel_table_scan_size.
I don't intend to argue the point, but just for information: I frequently
reduce it to zero, allowing PostgreSQL to make the decision based on
costs. That sometimes works much better, because one small table in a
multi-join query can rule out an effective parallel plan.
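
For reference, a minimal sketch of the kind of setup I mean (btg is the
table from the regression test; the query itself is only illustrative):

  -- Let the planner decide on cost alone: with the default threshold
  -- of 8MB, a one-page table never even gets a parallel scan considered.
  SET min_parallel_table_scan_size = 0;
  EXPLAIN (COSTS OFF) SELECT count(*) FROM btg GROUP BY z, y, w, x;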
>
> > What's more, I suggest addressing here the complaint from [1]. As I
> > see it, the cost difference between the Sort and IncrementalSort
> > strategies in that case is around 0.5. To make the test more stable,
> > I propose to change it a bit and add a limit:
> > SELECT count(*) FROM btg GROUP BY z, y, w, x LIMIT 10;
> > It makes the efficacy of IncrementalSort more obvious, with a
> > difference of around 10 cost points.
>
>
> I don't think that's necessary.  With Incremental Sort the final cost
> is:
>
>     GroupAggregate  (cost=1.66..19.00 rows=100 width=25)
>
> while with full Sort it is:
>
>     GroupAggregate  (cost=16.96..19.46 rows=100 width=25)
>
> With the STD_FUZZ_FACTOR (1.01), there is no doubt that the first path
> is cheaper on total cost.  Not to mention that even if we somehow
> decided the two paths were fuzzily the same on total cost, the first
> path would still dominate because its startup cost is much cheaper.
As before, I won't protest here; it needs some computation of how much
cost can be added by bulk extension of the relation blocks. If Maxim
answers that this is enough to resolve his issue, why not?
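
For anyone who wants to reproduce the comparison, a sketch (the GUC
enable_incremental_sort is the standard one; the query is the one from
the test):

  -- Plan with the IncrementalSort strategy:
  EXPLAIN SELECT count(*) FROM btg GROUP BY z, y, w, x;
  -- Force the full-Sort plan to see the alternative cost:
  SET enable_incremental_sort = off;
  EXPLAIN SELECT count(*) FROM btg GROUP BY z, y, w, x;
  RESET enable_incremental_sort;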

--
regards,
Andrei Lepikhov
Postgres Professional
