From: Sébastien Lardière <sebastien(at)lardiere(dot)net>
To: Richard Guo <guofenglinux(at)gmail(dot)com>, Nicolas Lutic <n(dot)lutic(at)loxodata(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: planner chooses incremental but not the best one
Date: 2023-12-22 09:17:45
Message-ID: 0079d421-163f-4c7b-8d09-b3ff29261960@lardiere.net
Lists: pgsql-hackers
On 15/12/2023 09:58, Richard Guo wrote:
>
> On Thu, Dec 14, 2023 at 6:02 PM Richard Guo <guofenglinux(at)gmail(dot)com>
> wrote:
>
> > It seems that we need to improve the estimate of distinct values in
> > estimate_num_groups() when taking the selectivity of restrictions into
> > account.
> >
> > In 84f9a35e3 we changed to a new formula to perform such estimation.
> > But that does not apply to the case here, because for an appendrel,
> > set_append_rel_size() always sets the "raw tuples" count equal to
> > "rows", and that would make estimate_num_groups() skip the adjustment
> > of the estimate using the new formula.
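For context, the ndistinct adjustment in question is guarded by a
rows-versus-tuples comparison, so setting "tuples" equal to "rows"
disables it. A simplified sketch of the relevant logic in selfuncs.c
(paraphrased, not the exact source):

    /*
     * Paraphrased sketch of the per-rel ndistinct adjustment in
     * estimate_num_groups().  Because set_append_rel_size() sets
     * rel->tuples = rel->rows for an appendrel, the guard below is
     * false and the adjustment never runs.
     */
    if (reldistinct > 0 && rel->rows < rel->tuples)
    {
        /*
         * Formula from commit 84f9a35e3: selecting p rows at random
         * out of N rows holding n distinct values yields about
         * n * (1 - ((N - p) / N) ^ (N / n)) distinct values.
         */
        double      N = rel->tuples;
        double      p = rel->rows;

        reldistinct *= (1 - pow((N - p) / N, N / reldistinct));
    }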
>
>
> I'm wondering why we set the appendrel's 'tuples' equal to its 'rows'.
> Why don't we set it to the accumulated estimate of tuples from each live
> child, like attached? I believe this aligns more closely with reality.
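Without having seen the attachment, a minimal sketch of what such a
change to set_append_rel_size() in allpaths.c could look like
(hypothetical code, loosely following the existing loop in that
function; the actual patch may differ):

    /*
     * Hypothetical sketch: while summing the children's row estimates,
     * also sum their raw tuples estimates, and use that sum for the
     * appendrel instead of copying rel->rows into rel->tuples.
     */
    double      parent_rows = 0;
    double      parent_tuples = 0;
    ListCell   *l;

    foreach(l, root->append_rel_list)
    {
        AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
        RelOptInfo *childrel;

        /* Skip children of other appendrels. */
        if (appinfo->parent_relid != parentRTindex)
            continue;
        childrel = find_base_rel(root, appinfo->child_relid);

        parent_rows += childrel->rows;
        parent_tuples += childrel->tuples;  /* accumulate raw tuples */
    }

    rel->rows = parent_rows;
    rel->tuples = parent_tuples;    /* was: rel->tuples = rel->rows */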
>
> And this would also allow us to adjust the estimate for the number of
> distinct values in estimate_num_groups() for appendrels using the new
> formula introduced in 84f9a35e3. In my experiments, this improves the
> estimates for appendrels. For instance,
>
> create table t (a int, b int, c float) partition by range(a);
> create table tp1 partition of t for values from (0) to (1000);
> create table tp2 partition of t for values from (1000) to (2000);
>
> insert into t select i%2000, (100000 * random())::int, random() from
> generate_series(1,1000000) i;
> analyze t;
>
> explain analyze select b from t where c < 0.1 group by b;
>
> -- on master
> HashAggregate (cost=18659.28..19598.74 rows=93946 width=4)
> (actual time=220.760..234.439 rows=63224 loops=1)
>
> -- on patched
> HashAggregate (cost=18659.28..19294.25 rows=63497 width=4)
> (actual time=235.161..250.023 rows=63224 loops=1)
>
> With the patch the estimate for the number of distinct 'b' values is
> more accurate.
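As a rough cross-check against the formula sketched above: here N =
1,000,000 tuples, p is roughly 100,000 selected rows (the selectivity of
"c < 0.1" on a uniform random column), and master's unadjusted estimate
is n = 93,946 distinct values. Then n * (1 - ((N - p) / N) ^ (N / n)) =
93946 * (1 - 0.9 ^ 10.64) ≈ 63,300, which agrees with the patched
estimate of 63,497 up to rounding in the selectivity and ndistinct
statistics.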
>
> BTW, this patch does not change any existing regression test results. I
> attempted to devise a regression test that shows how this change can
> improve query plans, but failed. Should I try harder to find such a
> test case?
Hi,
thank you for the patch; I've tried it, and it works with the scenario
you provided.
As Nicolas's co-worker, I've been involved in this case, but,
unfortunately, we can't test the patch against the actual data at the
moment; I'll ask the data's owner for a dump.
As for the regression test, I don't know how to implement one either.
best regards,
--
Sébastien