Re: Add proper planner support for ORDER BY / DISTINCT aggregates

From: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
To: David Rowley <dgrowleyml(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Pavel Luzanov <p(dot)luzanov(at)postgrespro(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Richard Guo <guofenglinux(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Subject: Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Date: 2022-11-07 14:53:42
Message-ID: 2202180.iZASKD2KPV@aivenlaptop
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Le samedi 5 novembre 2022, 09:51:23 CET Pavel Luzanov a écrit :
> While playing with the patch I found a situation where the performance
> may be degraded compared to previous versions.
>
> The test case below.
> If you create a proper index for the query (a,c), version 16 wins. On my
> notebook, the query runs ~50% faster.
> But if there is no index (a,c), but only (a,b), in previous versions the
> planner uses it, but with this patch a full table scan is selected.

Hello,

In your exact use case, the combo incremental-sort + Index scan is evaluated
to cost more than doing a full sort + seqscan.

If you try for example to create an index on (b, a) and group by b, you will
get the expected behaviour:

ro=# create index on t (b, a);
CREATE INDEX
ro=# explain select b, array_agg(c order by c) from t group by b;
QUERY PLAN
-----------------------------------------------------------------------------------------
GroupAggregate (cost=10.64..120926.80 rows=9970 width=36)
Group Key: b
-> Incremental Sort (cost=10.64..115802.17 rows=1000000 width=6)
Sort Key: b, c
Presorted Key: b
-> Index Scan using t_b_a_idx on t (cost=0.42..47604.12
rows=1000000 width=6)
(6 rows)

I think we can trace that back to incremental sort being pessimistic about
it's performance. If you try the same query, but with set enable_seqscan = off,
you will get a full sort over an index scan:

QUERY PLAN
-----------------------------------------------------------------------------------------
GroupAggregate (cost=154944.94..162446.19 rows=100 width=34)
Group Key: a
-> Sort (cost=154944.94..157444.94 rows=1000000 width=4)
Sort Key: a, c
-> Index Scan using t_a_b_idx on t (cost=0.42..41612.60
rows=1000000 width=4)
(5 rows)

This probably comes from the overly pessimistic behaviour that the number of
tuples per group will be 1.5 times as much as we should estimate:

/*
* Estimate average cost of sorting of one group where presorted
keys are
* equal. Incremental sort is sensitive to distribution of tuples
to the
* groups, where we're relying on quite rough assumptions. Thus,
we're
* pessimistic about incremental sort performance and increase its
average
* group size by half.
*/

I can't see why an incrementalsort could be more expensive than a full sort,
using the same presorted path. It looks to me that in that case we should
always prefer an incrementalsort. Maybe we should bound incremental sorts cost
to make sure they are never more expensive than the full sort ?

Also, prior to this commit I don't think it made a real difference, because
worst case scenario we would have missed an incremental sort, which we didn't
have beforehand. But with this patch, we may actually replace a "hidden"
incremental sort which was done in the agg codepath by a full sort.

Best regards,

--
Ronan Dunklau

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2022-11-07 15:08:17 Re: WIN32 pg_import_system_collations
Previous Message Karthik Jagadish (kjagadis) 2022-11-07 14:46:18 Re: Postgres auto vacuum - Disable