Re: planner chooses incremental but not the best one

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Nicolas Lutic <n(dot)lutic(at)loxodata(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: planner chooses incremental but not the best one
Date: 2023-12-15 08:58:22
Message-ID: CAMbWs4-ocromEKMtVDH3RBMuAJQaQDK0qi4k6zOuvpOnMWZauw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 14, 2023 at 6:02 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:

> It seems that we need to improve the estimate of distinct values in
> estimate_num_groups() when taking the selectivity of restrictions into
> account.
>
> In 84f9a35e3 we changed to a new formula to perform such estimation.
> But that does not apply to the case here, because for an appendrel,
> set_append_rel_size() always sets "raw tuples" count equal to "rows",
> and that would make estimate_num_groups() skip the adjustment of the
> estimate using the new formula.
>
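
To make the skipped adjustment concrete, the logic added by 84f9a35e3 in
estimate_num_groups() looks roughly like this (a paraphrase of
selfuncs.c, not a verbatim copy):

    /*
     * Scale the per-rel ndistinct by the restriction selectivity,
     * assuming rows are selected at random without replacement.
     */
    if (reldistinct > 0 && rel->rows < rel->tuples)
        reldistinct *= (1 - pow((rel->tuples - rel->rows) / rel->tuples,
                                rel->tuples / reldistinct));

Because set_append_rel_size() sets rel->tuples equal to rel->rows for an
appendrel, the condition "rel->rows < rel->tuples" can never hold there,
so the adjustment is skipped.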

I'm wondering why we set the appendrel's 'tuples' equal to its 'rows'.
Why not instead set it to the accumulated tuple estimates of the live
children, as in the attached patch? I believe that aligns more closely
with reality.
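
The idea is roughly the following (an illustrative sketch only -- the
attached patch is the authoritative version, and the surrounding logic
of set_append_rel_size() is elided here):

    ListCell   *l;
    double      parent_rows = 0;
    double      parent_tuples = 0;      /* new: sum of child raw tuples */

    foreach(l, root->append_rel_list)
    {
        AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
        RelOptInfo *childrel;

        /* consider only children of this appendrel */
        if (appinfo->parent_relid != parentRTindex)
            continue;
        childrel = find_base_rel(root, appinfo->child_relid);
        /* ignore children proven empty or pruned away */
        if (IS_DUMMY_REL(childrel))
            continue;

        parent_rows += childrel->rows;
        parent_tuples += childrel->tuples;
    }

    rel->rows = parent_rows;
    rel->tuples = parent_tuples;    /* instead of setting it to parent_rows */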

And this would also allow us to adjust the estimate of the number of
distinct values in estimate_num_groups() for appendrels using the new
formula introduced in 84f9a35e3. In my experiments this improves the
estimate for appendrels. For instance:

create table t (a int, b int, c float) partition by range(a);
create table tp1 partition of t for values from (0) to (1000);
create table tp2 partition of t for values from (1000) to (2000);

insert into t select i%2000, (100000 * random())::int, random() from
generate_series(1,1000000) i;
analyze t;

explain analyze select b from t where c < 0.1 group by b;

-- on master
HashAggregate (cost=18659.28..19598.74 rows=93946 width=4)
(actual time=220.760..234.439 rows=63224 loops=1)

-- on patched
HashAggregate (cost=18659.28..19294.25 rows=63497 width=4)
(actual time=235.161..250.023 rows=63224 loops=1)

With the patch, the estimate of the number of distinct 'b' values is
more accurate.
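
To see where the improvement comes from (approximate numbers): the
appendrel now has tuples ~= 1,000,000 (accumulated from the children)
while its rows estimate after the "c < 0.1" restriction is ~= 100,000,
and the pre-adjustment ndistinct estimate for b is about 93946 (which is
what master reports, since it skips the adjustment). Plugging these into
the 84f9a35e3 formula gives roughly

    93946 * (1 - (900000 / 1000000) ^ (1000000 / 93946)) ~= 63000

which lines up with the patched estimate of 63497 and the actual 63224.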

BTW, this patch does not change any existing regression test results. I
attempted to devise a regression test that shows how this change can
improve query plans, but failed. Should I try harder to find such a
test case?

Thanks
Richard

Attachment: v1-0001-Adjust-tuples-estimate-for-appendrel.patch (application/octet-stream, 1.9 KB)
