Re: BUG #15947: Worse plan is chosen after giving the planner more freedom (partitionwise join)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: ladayaroslav(at)yandex(dot)ru
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15947: Worse plan is chosen after giving the planner more freedom (partitionwise join)
Date: 2019-08-11 16:39:17
Message-ID: 786.1565541557@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> After creating of the tables below:
> ...
> ANALYZE sg_19_01_d, sg_19_02_d, sc_19_01_d, sc_19_02_d;

Hm, you made a tactical error there: you should have done

ANALYZE sg, sc;

ie analyze the parent partitioned tables not the partitions.
As you have it, we never compute inherited stats across the
whole partitioned tables, which leads to wrong estimates
about the join size and hence cost:

Aggregate (cost=147.25..147.26 rows=1 width=8) (actual time=9.934..9.934 rows=1 loops=1)
-> Hash Join (cost=80.00..146.62 rows=250 width=0) (actual time=6.204..9.632 rows=2000 loops=1)
...

After analyzing the parents it's much better:

Aggregate (cost=195.00..195.01 rows=1 width=8) (actual time=6.199..6.200 rows=1 loops=1)
-> Hash Semi Join (cost=88.00..190.00 rows=2000 width=0) (actual time=2.586..5.885 rows=2000 loops=1)
...

The size and cost estimates for partitionwise-join paths are made by
adding up the per-partition sizes/costs, so that those are a lot
closer to being correct even without any parent-level stats:

Aggregate (cost=175.00..175.01 rows=1 width=8) (actual time=4.562..4.563 rows=1 loops=1)
-> Append (cost=39.00..170.00 rows=2000 width=0) (actual time=1.105..4.327 rows=2000 loops=1)
...

So *if* you have parent-level stats in place, this example works well:
the partitionwise join is estimated as cheaper than the other way, and
that estimate is correct, and all is great.

But when you don't, why doesn't it seize on the incorrectly-estimated-
as-cheaper non-partitioned join? The answer seems to be that
apply_scanjoin_target_to_paths throws away all the non-partitioned
paths, if the join is partitioned. It claims that

* If the rel is partitioned, we want to drop its existing paths and
* generate new ones. This function would still be correct if we kept the
* existing paths: we'd modify them to generate the correct target above
* the partitioning Append, and then they'd compete on cost with paths
* generating the target below the Append. However, in our current cost
* model the latter way is always the same or cheaper cost, so modifying
* the existing paths would just be useless work.

but evidently that's not really true when considering partitioned vs
non-partitioned joins. It holds only if cost estimates that are made
in very different ways are comparable.

I'm inclined to think that the partitioned estimates are more trustworthy
than the non-partitioned ones, so maybe we should leave things as they
stand for now. Still, this is another point that's causing me to form
an increasingly hardened conviction that apply_scanjoin_target_to_paths
needs to be nuked from orbit. It's been nothing but trouble since it
was committed, which is unsurprising considering how many planner
structural conventions it tramples on. I'm not very sure what a better
way would look like though :-(.

Another point that this example raises is that letting autoanalyze
ignore partition parent tables is not just a minor problem, but
a potentially serious bug, causing very poor plan choices to be
made for any nontrivial queries involving partitioned tables.

I don't see us doing anything about either of these points in the
very short term, and certainly not back-patching any changes into
released branches. But we ought to think about fixes going forward.

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2019-08-11 21:32:49 Re: BUG #15892: URGENT: Using an ICU collation in a primary key column breaks ILIKE query
Previous Message Erik Rijkers 2019-08-10 23:54:01 Re: BUG #15947: Worse plan is chosen after giving the planner more freedom (partitionwise join)