Re: speeding up planning with partitions

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "Imai, Yoshikazu" <imai(dot)yoshikazu(at)jp(dot)fujitsu(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: speeding up planning with partitions
Date: 2018-11-21 10:23:52
Message-ID: 73596f90-023c-c358-be9a-3c21bffcb15d@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On 2018/11/14 19:28, Amit Langote wrote:
> On 2018/11/10 20:59, David Rowley wrote:
>> 8. In regards to:
>>
>> + * NB: Do we need to change the child EC members to be marked
>> + * as non-child somehow?
>> + */
>> + childrel->reloptkind = RELOPT_BASEREL;
>>
>> I know we talked a bit about this before, but this time I put together
>> a crude patch that runs some code each time we skip an em_is_child ==
>> true EquivalenceMember. The code checks if any of the em_relids are
>> RELOPT_BASEREL. What I'm looking for here are places where we
>> erroneously skip the member when we shouldn't. Running the regression
>> tests with this patch in place shows a number of problems. Likely I
>> should only trigger the warning when bms_membership(em->em_relids) ==
>> BMS_SINGLETON, but it never-the-less appears to highlight various
>> possible issues. Applying the same on master only appears to show the
>> cases where em->em_relids isn't a singleton set. I've attached the
>> patch to let you see what I mean.
>
> Thanks for this. I've been thinking about what to do about it, but
> haven't decided what's that yet. Please let me spend some more time
> thinking on it. AFAICT, dealing with this will ensure that join planning
> against target child relations can use EC-based optimizations, but it's
> not incorrect as is per se.

I've been considered this a bit more and have some observations to share.
I found that the new inheritance_planner causes regression when the query
involves equivalence classes referencing the target relation, such as in
the following example:

create table ht (a int, b int) partition by hash (a);
create table ht1 partition of ht for values with (modulus 1024, remainder 0);
...
create table ht1024 partition of ht for values with (modulus 1024,
remainder 1023);
create table foo (a int, b int);
update ht set a = foo.a from foo where ht.b = foo.b;

For the above query, an EC containing ht.b and foo.b would be built. With
the new approach this EC will need to be expanded to add em_is_child EC
members for all un-pruned child tables, whereas with the previous approach
there would be no child members because the EC would be built for the
child as the query's target relation to begin with. So, with the old
approach there will be {ht1.b, foo.b} for query with ht1 as target
relation, {ht2.b, foo.b} for query with ht2 as target relation and so on.
Whereas with the new approach there will be just one query_planner run and
resulting EC will be {foo.b, ht.b, ht1.b, ht2.b, ...}. So, the planning
steps that manipulate ECs now have to iterate through many members and
become a bottleneck if there are many un-pruned children. To my surprise,
those bottlenecks are worse than having to rerun query_planner for each
child table.

So with master, I get the following planning time for the above update
query which btw doesn't prune (3 repeated runs)

Planning Time: 688.830 ms
Planning Time: 690.950 ms
Planning Time: 704.702 ms

And with the previous v7 patch:

Planning Time: 1373.398 ms
Planning Time: 1360.685 ms
Planning Time: 1356.313 ms

I've fixed that in the attached by utilizing the fact that now we build
the child PlannerInfo before we add child EC members. By modifying
add_child_rel_equivalences such that it *replaces* the parent EC member
with the corresponding child member instead of appending it to the list,
if the child is the target relation. That happens inside the child
target's private PlannerInfo, so it's harmless. Also, it is no longer
marked as em_is_child=true, so as a whole, this more or less restores the
original behavior wrt to ECs (also proved by the fact that I now get the
same regression.diffs by applying David's verify_em_child.diff patch [1]
to the patched tree as with applying it to master modulo the varno
differences due to the patched).

With the attached updated patch (again this is with 0 partitions pruned):

Planning Time: 332.503 ms
Planning Time: 334.003 ms
Planning Time: 334.212 ms

If I add an additional condition so that only 1 partition is joined with
foo and the rest pruned, such as the following query (the case we're
trying to optimize):

update ht set a = foo.a from foo where foo.a = ht.b and foo.a = 1

I get the following numbers with master (no change from 0 pruned case):

Planning Time: 727.473 ms
Planning Time: 726.145 ms
Planning Time: 734.458 ms

But with the patches:

Planning Time: 0.797 ms
Planning Time: 0.751 ms
Planning Time: 0.801 ms

Attached v8 patches.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CAKJS1f8g9_BzE678BLBm-eoMMEYUUXhDABSpqtAHRUUTrm_vFA%40mail.gmail.com

Attachment Content-Type Size
v8-0001-Overhaul-inheritance-update-delete-planning.patch text/plain 56.6 KB
v8-0002-Store-inheritance-root-parent-index-in-otherrel-s.patch text/plain 2.5 KB
v8-0003-Lazy-creation-of-RTEs-for-inheritance-children.patch text/plain 84.9 KB
v8-0004-Move-append-expansion-code-into-its-own-file.patch text/plain 112.3 KB
v8-0005-Teach-planner-to-only-process-unpruned-partitions.patch text/plain 7.2 KB
v8-0006-Do-not-lock-all-partitions-at-the-beginning.patch text/plain 1.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2018-11-21 11:58:19 Re: Continue work on changes to recovery.conf API
Previous Message Peter Eisentraut 2018-11-21 09:54:21 Re: pgsql: Remove WITH OIDS support, change oid catalog column visibility.