Re: POC: GROUP BY optimization

From: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>
Subject: Re: POC: GROUP BY optimization
Date: 2024-02-22 07:04:55
Message-ID: d026580a-15a0-4734-9e69-cc6ebb70da1d@postgrespro.ru
Lists: pgsql-hackers

On 22/2/2024 13:35, Richard Guo wrote:
> The avg() function on an integer argument is commonly used in
> aggregates.sql.  I don't think this is an issue.  See the first test
> query in aggregates.sql.
Makes sense.
> > it should be parallel to the test cases for utilizing the ordering
> > of index scan and subquery scan.
> >
> > Also, I'm unsure about removing the disabling of the
> > max_parallel_workers_per_gather parameter. Have you discovered that
> > the current plan dominates the partial one? Do the cost fluctuations
> > across platforms not trigger a parallel plan?
>
>
> The table used for testing contains only 100 tuples, which is the size
> of only one page.  I don't believe it would trigger any parallel plans,
> unless we manually change min_parallel_table_scan_size.
I don't intend to argue the point, but just for information: I frequently
reduce it to zero, allowing PostgreSQL to make the decision based on
costs. That sometimes works much better, because one small table in a
multi-join query can rule out an effective parallel plan.
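
For reference, a minimal sketch of the kind of setup I mean (btg is the
table from the regression test; the query itself is only illustrative):

  -- Let the planner decide on cost alone: with the default threshold
  -- of 8MB, a one-page table never even gets a parallel scan considered.
  SET min_parallel_table_scan_size = 0;
  EXPLAIN (COSTS OFF) SELECT count(*) FROM btg GROUP BY z, y, w, x;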
>
> > What's more, I suggest addressing here the complaint from [1]. As I
> > see it, the cost difference between the Sort and IncrementalSort
> > strategies in that case is around 0.5. To make the test more stable,
> > I propose to change it a bit and add a limit:
> > SELECT count(*) FROM btg GROUP BY z, y, w, x LIMIT 10;
> > It makes the efficacy of IncrementalSort more obvious, with a
> > difference of around 10 cost points.
>
>
> I don't think that's necessary.  With Incremental Sort the final cost
> is:
>
>     GroupAggregate  (cost=1.66..19.00 rows=100 width=25)
>
> while with full Sort it is:
>
>     GroupAggregate  (cost=16.96..19.46 rows=100 width=25)
>
> With the STD_FUZZ_FACTOR (1.01), there is no doubt that the first path
> is cheaper on total cost.  Not to mention that even if we somehow
> decided the two paths were fuzzily the same on total cost, the first
> path would still dominate because its startup cost is much cheaper.
As before, I won't protest here; it needs some computation of how much
cost can be added by bulk extension of the relation blocks. If Maxim
answers that this is enough to resolve his issue, why not?
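
For anyone who wants to reproduce the comparison, a sketch (the GUC
enable_incremental_sort is the standard one; the query is the one from
the test):

  -- Plan with the IncrementalSort strategy:
  EXPLAIN SELECT count(*) FROM btg GROUP BY z, y, w, x;
  -- Force the full-Sort plan to see the alternative cost:
  SET enable_incremental_sort = off;
  EXPLAIN SELECT count(*) FROM btg GROUP BY z, y, w, x;
  RESET enable_incremental_sort;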

--
regards,
Andrei Lepikhov
Postgres Professional
