Re: planner chooses incremental but not the best one

From: Sébastien Lardière <sebastien(at)lardiere(dot)net>
To: Richard Guo <guofenglinux(at)gmail(dot)com>, Nicolas Lutic <n(dot)lutic(at)loxodata(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: planner chooses incremental but not the best one
Date: 2023-12-22 09:17:45
Message-ID: 0079d421-163f-4c7b-8d09-b3ff29261960@lardiere.net

On 15/12/2023 09:58, Richard Guo wrote:
>
> On Thu, Dec 14, 2023 at 6:02 PM Richard Guo <guofenglinux(at)gmail(dot)com>
> wrote:
>
> It seems that we need to improve the estimate of distinct values in
> estimate_num_groups() when taking the selectivity of restrictions into
> account.
>
> In 84f9a35e3 we changed to a new formula to perform such estimation.
> But that does not apply to the case here, because for an appendrel,
> set_append_rel_size() always sets the "raw tuples" count equal to
> "rows", and that makes estimate_num_groups() skip the adjustment of
> the estimate using the new formula.
>
>
> I'm wondering why we set the appendrel's 'tuples' equal to its 'rows'.
> Why don't we set it to the accumulated estimate of tuples from each live
> child, like attached?  I believe this aligns more closely with reality.
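>
> A minimal sketch of that idea in set_append_rel_size() (the attached
> patch is authoritative; this is only illustrative):
>
>     double parent_tuples = 0;
>
>     foreach(l, root->append_rel_list)
>     {
>         ...
>         /* accumulate each live child's raw tuple estimate */
>         parent_tuples += childrel->tuples;
>         ...
>     }
>
>     /* previously: rel->tuples = rel->rows */
>     rel->tuples = parent_tuples;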
>
> And this would also allow us to adjust the estimate for the number of
> distinct values in estimate_num_groups() for appendrels using the new
> formula introduced in 84f9a35e3.  In my experiments, this improves the
> estimate for appendrels.  For instance,
>
> create table t (a int, b int, c float) partition by range(a);
> create table tp1 partition of t for values from (0) to (1000);
> create table tp2 partition of t for values from (1000) to (2000);
>
> insert into t select i%2000, (100000 * random())::int, random() from
> generate_series(1,1000000) i;
> analyze t;
>
> explain analyze select b from t where c < 0.1 group by b;
>
> -- on master
>  HashAggregate  (cost=18659.28..19598.74 rows=93946 width=4)
>                 (actual time=220.760..234.439 rows=63224 loops=1)
>
> -- on patched
>  HashAggregate  (cost=18659.28..19294.25 rows=63497 width=4)
>                 (actual time=235.161..250.023 rows=63224 loops=1)
>
> With the patch the estimate for the number of distinct 'b' values is
> more accurate.
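>
> As a rough sanity check of the numbers: with N = 1,000,000 tuples,
> p ≈ 100,000 selected rows (c < 0.1 keeps about 10%), and d ≈ 93,946
> distinct values of b (the unadjusted estimate, per the master plan),
> the 84f9a35e3 formula gives
>
>     d * (1 - ((N - p) / N)^(N / d))
>         = 93946 * (1 - 0.9 ^ 10.64)
>         ≈ 93946 * 0.674
>         ≈ 63300
>
> which is close to the patched estimate of 63497 above.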
>
> BTW, this patch does not change any existing regression test results.  I
> attempted to devise a regression test that shows how this change can
> improve query plans, but failed.  Should I try harder to find such a
> test case?

Hi,

Thank you for the patch; I've tried it, and it works with the scenario
you provided.

As Nicolas's co-worker, I've been involved in this case. Unfortunately,
we're not able to test the patch with the actual data at the moment,
but I'll ask the data's real owner for a dump.

As for the regression test, I don't know how to implement one either.

Best regards,

--
Sébastien
