Re: speeding up planning with partitions

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: "Imai, Yoshikazu" <imai(dot)yoshikazu(at)jp(dot)fujitsu(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: speeding up planning with partitions
Date: 2019-01-16 06:22:26
Message-ID: 63e707b1-102e-4a27-22e6-6a293cb75e3b@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Thanks for the comments.

On 2019/01/12 17:49, David Rowley wrote:
> On Sat, 12 Jan 2019 at 02:00, Amit Langote
> <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> That's an oversight. Fixed by making add_child_rel_equivalences do this:
>>
>> + cur_ec->ec_relids = bms_difference(cur_ec->ec_relids,
>> + parent_rel->relids);
>> + cur_ec->ec_relids = bms_add_members(cur_ec->ec_relids,
>> + child_rel->relids);
>>
>>>
>>> UPDATE: I see you're likely leaving this alone since you're only doing
>>> a shallow copy of the eclasses in
>>> adjust_inherited_target_child_root(). It seems like a pretty bad idea
>>> to do a shallow copy there.
>>
>> So, you're talking about this code:
>>
>> /*
>> * Child root should get its own copy of ECs, because they'll be modified
>> * to replace parent EC expressions by child expressions in
>> * add_child_rel_equivalences.
>> */
>> subroot->eq_classes = NIL;
>> foreach(lc, root->eq_classes)
>> {
>> EquivalenceClass *ec = lfirst(lc);
>> EquivalenceClass *new_ec = makeNode(EquivalenceClass);
>>
>> memcpy(new_ec, ec, sizeof(EquivalenceClass));
>> new_ec->ec_members = list_copy(ec->ec_members);
>> subroot->eq_classes = lappend(subroot->eq_classes, new_ec);
>> }
>>
>> Can you say what you think is wrong with this way of making a copy of the ECs?
>
> If you shallow copy with memcpy(new_ec, ec,
> sizeof(EquivalenceClass));, then fields such as ec_relids will just
> point to the same memory as the parent PlannerInfo's
> EquivalenceClasses, so when you do your adjustment (as above) on the
> child eclasses, you'll modify memory belonging to the parent. I'd
> assume you'd not want to do that since you need to keep the parent
> intact at least to make copies for other child rels. You'd have
> gotten away with it before since you performed a list_copy() and only
> were deleting the parent's EMs with list_delete_cell() which was
> modifying the copy, not the original list. Since you were missing the
> alteration to ec_relids, then you might not have seen issues with the
> shallow copy, but now that you are changing that field, are you not
> modifying the ec_relids field that still set in the parent's
> PlannerInfo?

Yep. add_eq_member, when adding the child member to the child's copy of
EC, will end up modifying ec_relids that it shares with the parent's copy,
because it uses bms_add_member which modifies the input bitmapset in-place.

> In this instance, you've sidestepped that issue by using
> bms_difference() which creates a copy of the Bitmapset and modifies
> that. I think you'd probably see issues if you tried to use
> bms_del_members(). I think not doing the deep copy is going to give
> other people making changes in this are a hard time in the future.

I thought of inventing a _copyEquivalenceClass but noticed this in
_copyPathKey:

/* EquivalenceClasses are never moved, so just shallow-copy the pointer */
COPY_SCALAR_FIELD(pk_eclass);

I think that won't be a problem in our case, because this comment seems to
be talking about identical copies of a given EC. In our case, we're
talking about making copies that are essentially different because of
parent-to-child translation. However, providing a function in copyfuncs.c
to copy ECs would make it sound like making identical copies of ECs is OK
(which it apparently isn't), so I added the copying code in the place
where the patch wants to use it. The new code takes care to make copies
of all the fields that might change, not just ec_members.

Also, realizing another problem with where the copying code is placed now
in the patch (adjust_inherited_target_child_root), I have moved it to
adjust_inherit_target_rel_sizes so that translation from parent-to-child
of EC members works correctly. Concretely, the problem is that a level 2
partition's EC members won't be created with the patch's current way of
initializing eq_classes in the partition subroots -- all initially get the
copy of root parent's EC members, whereas translation can only occur
between a partition and its immediate parent. Without EC members, a level
2 leaf partition will not be able to merge join. So, for example:

explain update p set a = foo.a+1 from foo where p.a = foo.a;
QUERY PLAN
──────────────────────────────────────────────────────────────────────────
Update on p (cost=0.00..98556.15 rows=6535012 width=16)
Update on p11
Update on p2
-> Nested Loop (cost=0.00..97614.88 rows=6502500 width=16)
-> Seq Scan on p11 (cost=0.00..35.50 rows=2550 width=10)
-> Materialize (cost=0.00..48.25 rows=2550 width=10)
-> Seq Scan on foo (cost=0.00..35.50 rows=2550 width=10)
-> Merge Join (cost=359.57..941.28 rows=32512 width=16)
Merge Cond: (p2.a = foo.a)
-> Sort (cost=179.78..186.16 rows=2550 width=10)
Sort Key: p2.a
-> Seq Scan on p2 (cost=0.00..35.50 rows=2550 width=10)
-> Sort (cost=179.78..186.16 rows=2550 width=10)
Sort Key: foo.a
-> Seq Scan on foo (cost=0.00..35.50 rows=2550 width=10)

vs.

explain update p set a = foo.a+1 from foo where p.a = foo.a;
QUERY PLAN
──────────────────────────────────────────────────────────────────────────
Update on p (cost=359.57..1882.55 rows=65024 width=16)
Update on p11
Update on p2
-> Merge Join (cost=359.57..941.28 rows=32512 width=16)
Merge Cond: (p11.a = foo.a)
-> Sort (cost=179.78..186.16 rows=2550 width=10)
Sort Key: p11.a
-> Seq Scan on p11 (cost=0.00..35.50 rows=2550 width=10)
-> Sort (cost=179.78..186.16 rows=2550 width=10)
Sort Key: foo.a
-> Seq Scan on foo (cost=0.00..35.50 rows=2550 width=10)
-> Merge Join (cost=359.57..941.28 rows=32512 width=16)
Merge Cond: (p2.a = foo.a)
-> Sort (cost=179.78..186.16 rows=2550 width=10)
Sort Key: p2.a
-> Seq Scan on p2 (cost=0.00..35.50 rows=2550 width=10)
-> Sort (cost=179.78..186.16 rows=2550 width=10)
Sort Key: foo.a
-> Seq Scan on foo (cost=0.00..35.50 rows=2550 width=10)
(19 rows)

after fixing. Note that p11 is a partition of p1 which is partition of p.
p2 is partition of p.

>>> 12. The following comment makes less sense now that you've modified
>>> the previous paragraph:
>>>
>>> + * Fortunately, the UPDATE/DELETE target can never be the nullable side of an
>>> + * outer join, so it's OK to generate the plan this way.
>>>
>>> This text used to refer to:
>>>
>>> but target inheritance has to be expanded at
>>> * the top. The reason is that for UPDATE, each target relation needs a
>>> * different targetlist matching its own column set. Fortunately,
>>> * the UPDATE/DELETE target can never be the nullable side of an outer join,
>>> * so it's OK to generate the plan this way.
>>>
>>> you no longer describe plan as being expanded from the top rather than
>>> at the bottom, which IMO is what "this way" refers to.
>>
>> To be honest, I never quite understood what that line really means, which
>> is why I kept it around. What I do know though is that, even with the
>> patched, we still generate subpaths for each child whose targetlist
>> patches the child, so I think the part about "this way" that prompted
>> someone to write that line still remains. Does that make sense to you?
>
> Not quite. I thought that "OK to generate the plan this way" is
> talking about expanding the children at the top of the plan, e.g.
>
> explain (costs off) update listp set b = b + 1 from t where listp.a = t.a;
> QUERY PLAN
> --------------------------------------
> Update on listp
> Update on listp1
> Update on listp2
> -> Merge Join
> Merge Cond: (listp1.a = t.a)
> -> Sort
> Sort Key: listp1.a
> -> Seq Scan on listp1
> -> Sort
> Sort Key: t.a
> -> Seq Scan on t
> -> Merge Join
> Merge Cond: (listp2.a = t.a)
> -> Sort
> Sort Key: listp2.a
> -> Seq Scan on listp2
> -> Sort
> Sort Key: t.a
> -> Seq Scan on t
>
> i.e each non-pruned partition gets joined to the other tables
> (expanded from the top of the plan tree). The other tables appear once
> for each child target relation.
>
> instead of:
>
> postgres=# explain (costs off) select * from listp inner join t on
> listp.a = t.a;
> QUERY PLAN
> --------------------------------------
> Merge Join
> Merge Cond: (t.a = listp1.a)
> -> Sort
> Sort Key: t.a
> -> Seq Scan on t
> -> Sort
> Sort Key: listp1.a
> -> Append
> -> Seq Scan on listp1
> -> Seq Scan on listp2
>
> where the children are expanded from the bottom (or at the leaf side
> of the plan tree), since we always plant our trees up-side-down, in
> computer science.
>
> In my view, since you removed the part about where the children are
> expanded then it does not quite make sense anymore.

Thanks for explaining. I think my previous reply was confusing -- I
wanted to say that we *do* still expand the children at the top, in the
sense that the resulting plan shape hasn't changed due to the patch. I
suspect that it's the resulting plan shape that "generating plan this way"
refers to, which hasn't changed with the patch. It's true that the patch
optimizes away repeated query_planner calls but only because it makes
other arrangements to produce the same output as before.

Attached updated patches with mainly fixes around EC copying as described
above and other cosmetic fixes like comment updates. I've also moved
around the code a bit, putting the new functions
add_inherit_target_child_roots, etc. in inherit.c instead of planmain.c.

Thanks,
Amit

Attachment Content-Type Size
0001-Some-fixups-for-b60c397599-v2.patch text/plain 11.4 KB
v14-0002-Overhaul-inheritance-update-delete-planning.patch text/plain 75.0 KB
v14-0003-Store-inheritance-root-parent-index-in-otherrel-.patch text/plain 3.3 KB
v14-0004-Lazy-creation-of-RTEs-for-inheritance-children.patch text/plain 95.3 KB
v14-0005-Teach-planner-to-only-process-unpruned-partition.patch text/plain 6.9 KB
v14-0006-Do-not-lock-all-partitions-at-the-beginning.patch text/plain 1.7 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tsunakawa, Takayuki 2019-01-16 06:41:58 RE: Protect syscache from bloating with negative cache entries
Previous Message Tatsuo Ishii 2019-01-16 06:02:36 Re: Libpq support to connect to standby server as priority