Re: planner chooses incremental but not the best one

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Nicolas Lutic <n(dot)lutic(at)loxodata(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: planner chooses incremental but not the best one
Date: 2023-12-14 10:02:08
Message-ID: CAMbWs4-FtsJ0dQUv6g==XR_gsq=Fj9oiydW6gbqwQ_wrbU0osw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 12, 2023 at 4:40 PM Nicolas Lutic <n(dot)lutic(at)loxodata(dot)com> wrote:

> I've come across a behaviour of the planner I can't explain.
> After a migration from 11 to 15 (on RDS), we noticed a degradation in
> response time on a query: it went from a few seconds to ten minutes.
> A VACUUM (ANALYZE) was run to make sure everything was clean.
> The 'explain analyze' output shows a change of plan: PostgreSQL 15
> chooses `incremental sort` with an index corresponding to the ORDER BY
> clause (on the created_at column). The previous v11 plan used a more
> efficient index.
>
> With incremental sort disabled, v15 response times are equal to the
> v11 ones.

I think this issue is caused by under-estimating the startup cost of the
incremental sort, which in turn is caused by estimate_num_groups()
over-estimating the number of groups with equal presorted keys.

We can simulate the same issue with the query below.

create table t (a int, b int, c int, d int) partition by range (a);
create table tp1 partition of t for values from (0) to (1000);
create table tp2 partition of t for values from (1000) to (2000);

insert into t select i%2000, i%1000, i%300, i from generate_series(1,1000000) i;

create index on t(b);
create index on t(c);

analyze t;

-- by default incremental sort is chosen
explain analyze select * from t where b = 3 order by c, d limit 10;
                                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=143.03..570.89 rows=10 width=16) (actual time=375.399..375.402 rows=10 loops=1)
   ->  Incremental Sort  (cost=143.03..42671.85 rows=994 width=16) (actual time=375.397..375.399 rows=10 loops=1)
         Sort Key: t.c, t.d
         Presorted Key: t.c
         Full-sort Groups: 1  Sort Method: top-N heapsort  Average Memory: 25kB  Peak Memory: 25kB
         Pre-sorted Groups: 1  Sort Method: top-N heapsort  Average Memory: 25kB  Peak Memory: 25kB
         ->  Merge Append  (cost=0.85..42644.84 rows=994 width=16) (actual time=11.415..375.289 rows=335 loops=1)
               Sort Key: t.c
               ->  Index Scan using tp1_c_idx on tp1 t_1  (cost=0.42..21318.39 rows=498 width=16) (actual time=6.666..189.356 rows=168 loops=1)
                     Filter: (b = 3)
                     Rows Removed by Filter: 171537
               ->  Index Scan using tp2_c_idx on tp2 t_2  (cost=0.42..21316.50 rows=496 width=16) (actual time=4.745..185.870 rows=168 loops=1)
                     Filter: (b = 3)
                     Rows Removed by Filter: 171534
 Planning Time: 0.501 ms
 Execution Time: 375.477 ms
(16 rows)

-- disable incremental sort
set enable_incremental_sort to off;
SET

explain analyze select * from t where b = 3 order by c, d limit 10;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2577.51..2577.53 rows=10 width=16) (actual time=2.928..2.930 rows=10 loops=1)
   ->  Sort  (cost=2577.51..2579.99 rows=994 width=16) (actual time=2.925..2.927 rows=10 loops=1)
         Sort Key: t.c, t.d
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Append  (cost=8.28..2556.03 rows=994 width=16) (actual time=0.627..2.670 rows=1000 loops=1)
               ->  Bitmap Heap Scan on tp1 t_1  (cost=8.28..1276.62 rows=498 width=16) (actual time=0.625..1.688 rows=500 loops=1)
                     Recheck Cond: (b = 3)
                     Heap Blocks: exact=500
                     ->  Bitmap Index Scan on tp1_b_idx  (cost=0.00..8.16 rows=498 width=0) (actual time=0.330..0.330 rows=500 loops=1)
                           Index Cond: (b = 3)
               ->  Bitmap Heap Scan on tp2 t_2  (cost=8.27..1274.43 rows=496 width=16) (actual time=0.178..0.876 rows=500 loops=1)
                     Recheck Cond: (b = 3)
                     Heap Blocks: exact=500
                     ->  Bitmap Index Scan on tp2_b_idx  (cost=0.00..8.14 rows=496 width=0) (actual time=0.093..0.093 rows=500 loops=1)
                           Index Cond: (b = 3)
 Planning Time: 0.481 ms
 Execution Time: 3.031 ms
(17 rows)

As we can see, the execution time is 375.477 ms by default and 3.031 ms
with incremental sort disabled.

When we calculate the cost of incremental sort, we need to estimate the
number of groups into which the relation is divided by the presorted
keys, and then calculate the cost of sorting a single group. If we
over-estimate the number of groups, the startup cost of incremental sort
would be under-estimated. In the first plan above, the number of groups
with equal 't.c' is estimated to be 300 by estimate_num_groups(), but is
actually 3 after applying the restriction 'b = 3'. As a result, the
startup cost of the incremental sort is estimated to be 143.03, but is
actually 14222.68. That's why the planner mistakenly thinks the
incremental sort path is the cheaper one.
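To illustrate where those two numbers come from, here is a rough Python
sketch (a simplified model, not the actual costsize.c code) of why the
startup cost scales inversely with the estimated group count: before the
first row can be emitted, one presorted group must be fetched in full
from the input, and the per-tuple sort cost of that group is ignored
here for brevity.

```python
def incsort_startup_cost(input_startup, input_total, num_groups):
    """Simplified model of incremental sort's startup cost: reading one
    presorted group consumes ~1/num_groups of the input's run cost.
    (The comparison cost of sorting that one group is neglected.)"""
    input_run_cost = input_total - input_startup
    return input_startup + input_run_cost / num_groups

# Numbers from the Merge Append node above: cost=0.85..42644.84
estimated = incsort_startup_cost(0.85, 42644.84, 300)  # ~143
actual    = incsort_startup_cost(0.85, 42644.84, 3)    # ~14215
```

With only 3 groups, roughly a third of the Merge Append's total cost has
to be paid before the first row comes out, which lands in the same
ballpark as the actual 14222.68 startup cost.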

It seems that we need to improve the estimate of distinct values in
estimate_num_groups() when taking the selectivity of restrictions into
account.

In 84f9a35e3 we changed to a new formula to perform such estimation.
But that does not apply to the case here, because for an appendrel,
set_append_rel_size() always sets "raw tuples" count equal to "rows",
and that would make estimate_num_groups() skip the adjustment of the
estimate using the new formula.
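For reference, the 84f9a35e3 adjustment can be transcribed roughly as
below (a Python sketch of the C logic, including the skip condition that
bites here):

```python
def adjusted_ndistinct(reldistinct, reltuples, rows):
    """Sketch of the 84f9a35e3 formula in estimate_num_groups(): the
    expected number of distinct values appearing in a selection of
    `rows` out of `reltuples`.  The adjustment only runs when
    reltuples > rows -- and for an appendrel, set_append_rel_size()
    sets the "raw tuples" count equal to "rows", so it is skipped."""
    if reltuples <= rows:
        return reldistinct  # adjustment skipped
    return reldistinct * (1 - ((reltuples - rows) / reltuples)
                          ** (reltuples / reldistinct))

adjusted_ndistinct(300, 1_000_000, 994)  # ~289 if the adjustment ran
adjusted_ndistinct(300, 994, 994)        # appendrel case: stays 300
```

Note that even when applied, the formula assumes the selected rows are a
uniform sample, so in this example it would still return ~289 rather
than the true 3 groups; it cannot capture the correlation between 'b'
and 'c'.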

Any thoughts on how to improve this?

Thanks
Richard
