Re: Add proper planner support for ORDER BY / DISTINCT aggregates

From: Pavel Luzanov <p(dot)luzanov(at)postgrespro(dot)ru>
To: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, David Rowley <dgrowleyml(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Richard Guo <guofenglinux(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Subject: Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Date: 2022-11-07 16:58:50
Message-ID: 43b1c6fc-e2f4-c252-5ce5-3db98c6b71d4@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 07.11.2022 17:53, Ronan Dunklau wrote:
> In your exact use case, the combo incremental-sort + Index scan is evaluated
> to cost more than doing a full sort + seqscan.

> I think we can trace that back to incremental sort being pessimistic about
> it's performance. If you try the same query, but with set enable_seqscan = off,
> you will get a full sort over an index scan:
>
> QUERY PLAN
> -----------------------------------------------------------------------------------------
> GroupAggregate (cost=154944.94..162446.19 rows=100 width=34)
> Group Key: a
> -> Sort (cost=154944.94..157444.94 rows=1000000 width=4)
> Sort Key: a, c
> -> Index Scan using t_a_b_idx on t (cost=0.42..41612.60
> rows=1000000 width=4)
> (5 rows)

You are right. By disabling seq scan, we can get this plan. But compare
it with the plan in v15:

postgres(at)db(15.0)=# explain
select a, array_agg(c order by c) from t group by a;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 GroupAggregate  (cost=0.42..46667.56 rows=100 width=34)
   Group Key: a
   ->  Index Scan using t_a_b_idx on t  (cost=0.42..41666.31
rows=1000000 width=4)
(3 rows)

The total plan cost in v16 is ~4 times higher, while the index scan cost
remains the same.

> I can't see why an incrementalsort could be more expensive than a full sort,
> using the same presorted path.

The only reason I can see is the number of buffers to read. In the plan
with incremental sort we read the whole index, ~190000 buffers.
And the plan with seq scan only required ~5000 (I think due to buffer
ring optimization).

Perhaps this behavior is preferable. Especially when many concurrent
queries are running. The less buffer cache is busy, the better. But in
single-user mode this query is slower.

--
Pavel Luzanov
Postgres Professional: https://postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-11-07 17:13:55 allow segment size to be set to < 1GiB
Previous Message Kirk Wolak 2022-11-07 16:57:37 Re: PL/pgSQL cursors should get generated portal names by default