Re: speeding up planning with partitions

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: "Imai, Yoshikazu" <imai(dot)yoshikazu(at)jp(dot)fujitsu(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: speeding up planning with partitions
Date: 2019-01-11 13:00:24
Message-ID: cf0cf28b-66ee-568c-c9f5-5577ac968959@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Thanks for reviewing, David, Imai-san. Replying to all reviews (up to and
including David's comments earlier today) with this one email so that I
can attach the finished patch here.

On 2019/01/09 9:09, David Rowley wrote:
> On Tue, 8 Jan 2019 at 19:30, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> Rebased due to changed copyright year in prepunion.c. Also realized that
>> the new files added by patch 0004 still had 2018 in them.
>
> I've made a pass over 0001. There's probably enough for you to look at
> while I look at 0002 and the others.
>
> 0001
>
> 1. In your doc changes, just below a paragraph that you removed,
> there's a paragraph starting "Both of these behaviors are likely to be
> changed in a future release". This needs to be fixed since you've
> removed the first of the two reasons.

OK, I've fixed the sentence and moved it to the previous paragraph.

> 2. This part should be left alone.
>
> - technique similar to partition pruning. While it is primarily used
> - for partitioning implemented using the legacy inheritance method, it can be
> - used for other purposes, including with declarative partitioning.
> + technique similar to partition pruning. It is primarily used
> + for partitioning implemented using the legacy inheritance method.
>
> Looking at set_inherit_target_rel_sizes(), constraint exclusion still
> is applied to partitions, it's just applied after pruning, according
> to:

OK, I've restored the sentence.

> 3. add_child_rel_equivalences(). You're replacing parent EMs with
> their child equivalent, but only when the eclass has no volatile
> functions. Is this really what you want? I think this would misbehave
> if we ever allowed: UPDATE ... SET .. ORDER BY, of which there's a
> legitimate use case of wanting to reduce the chances of deadlocks
> caused by non-deterministic UPDATE order. Or if you think skipping
> the volatile eclasses is fine today, then I think the comment you've
> added to add_child_rel_equivalences should mention that.

To be honest, I hadn't considered this aspect before.

I think it would be OK to create a new copy of the EC even if it's a
volatile one as we're creating an entirely new one for the child's
planning. Maybe, this will also ensure that someone who will work in the
future on implementing UPDATE SET ORDER BY, they won't have to fiddle with
this code.

> 4. Do you think it's okay that add_child_rel_equivalences() does not
> update the ec_relids when removing the member?

That's an oversight. Fixed by making add_child_rel_equivalences do this:

+ cur_ec->ec_relids = bms_difference(cur_ec->ec_relids,
+ parent_rel->relids);
+ cur_ec->ec_relids = bms_add_members(cur_ec->ec_relids,
+ child_rel->relids);

>
> UPDATE: I see you're likely leaving this alone since you're only doing
> a shallow copy of the eclasses in
> adjust_inherited_target_child_root(). It seems like a pretty bad idea
> to do a shallow copy there.

So, you're talking about this code:

/*
* Child root should get its own copy of ECs, because they'll be modified
* to replace parent EC expressions by child expressions in
* add_child_rel_equivalences.
*/
subroot->eq_classes = NIL;
foreach(lc, root->eq_classes)
{
EquivalenceClass *ec = lfirst(lc);
EquivalenceClass *new_ec = makeNode(EquivalenceClass);

memcpy(new_ec, ec, sizeof(EquivalenceClass));
new_ec->ec_members = list_copy(ec->ec_members);
subroot->eq_classes = lappend(subroot->eq_classes, new_ec);
}

Can you say what you think is wrong with this way of making a copy of the ECs?

>
> 5. What's CE?
>
> + /* CE failed, so finish copying/modifying join quals. */

Constraint exclusion. It seems I needed to fix comments around here.

> 6. Typo:
>
> + * ass dummy. We must do this in this phase so that the rel's
>
> ass -> as

Oops! Fixed.

>
> 7. There's no accumulation going on here:
>
> + /*
> + * Accumulate size information from each live child.
> + */
> + Assert(childrel->rows > 0);

Removed the comment.

>
> 8. Any point in this? We're about to loop again anyway...
>
> + /*
> + * If child is dummy, ignore it.
> + */
> + if (IS_DUMMY_REL(childrel))
> + continue;
> + }

Removed this code.

> 9. It's a bit confusing to mention SELECT in this comment. The Assert
> ensures it's an UPDATE or DELETE.
>
> + /*
> + * For UPDATE/DELETE queries, the top parent can only ever be a table.
> + * As a contrast, it could be a UNION ALL subquery in the case of SELECT.
> + */
> + Assert(root->parse->commandType == CMD_UPDATE ||
> + root->parse->commandType == CMD_DELETE);

I guess we don't need the 2nd sentence. Removed.

> 10. I'd say the subroot assignment can go after the IS_DUMMY_REL
> check. Keeping that loop as tight as possible for pruned rels seems
> like a good idea.
>
> + subroot = root->inh_target_child_roots[appinfo->child_relid];
> + Assert(subroot->parse->resultRelation > 0);
> + childrel = find_base_rel(root, appinfo->child_relid);
> +
> + /* Ignore excluded/pruned children. */
> + if (IS_DUMMY_REL(childrel))
> + continue;

Agreed, done.

> 11. I don't think you should reuse the childrel variable here:
>
> + childrel->reloptkind = RELOPT_BASEREL;
> +
> + Assert(subroot->join_rel_list == NIL);
> + Assert(subroot->join_rel_hash == NULL);
> +
> + /* Perform join planning and save the resulting RelOptInfo. */
> + childrel = make_rel_from_joinlist(subroot, translated_joinlist);
> +
> + /*
> + * Remember this child target rel. inheritance_planner will perform
> + * the remaining steps of planning for each child relation separately.
> + * Specifically, it will call grouping_planner on every
> + * RelOptInfo contained in the inh_target_child_rels list, each of
> + * which represents the source of tuples to be modified for a given
> + * target child rel.
> + */
> + root->inh_target_child_joinrels =
> + lappend(root->inh_target_child_joinrels, childrel);

OK, I've added a new variable called childjoinrel.

> 12. The following comment makes less sense now that you've modified
> the previous paragraph:
>
> + * Fortunately, the UPDATE/DELETE target can never be the nullable side of an
> + * outer join, so it's OK to generate the plan this way.
>
> This text used to refer to:
>
> but target inheritance has to be expanded at
> * the top. The reason is that for UPDATE, each target relation needs a
> * different targetlist matching its own column set. Fortunately,
> * the UPDATE/DELETE target can never be the nullable side of an outer join,
> * so it's OK to generate the plan this way.
>
> you no longer describe plan as being expanded from the top rather than
> at the bottom, which IMO is what "this way" refers to.

To be honest, I never quite understood what that line really means, which
is why I kept it around. What I do know though is that, even with the
patched, we still generate subpaths for each child whose targetlist
patches the child, so I think the part about "this way" that prompted
someone to write that line still remains. Does that make sense to you?

> 13. "tree is" -> "tree are" (references is plural)
>
> + * references in the join tree to the original target relation that's the
> + * root parent of the inheritance tree is replaced by each of its

Fixed.

> 14. Any reason to move this line from its original location?
>
> Assert(parse->commandType != CMD_INSERT);
> + parent_rte = planner_rt_fetch(top_parentRTindex, root);
>
> Previously it was assigned just before it was needed and there's a
> fast path out after where you moved it to and where it was.

Next patch in the series needs it moved, but no reason for this patch to
move it. Put it back where it was.

> 15. relation_excluded_by_constraints(), the switch
> (constraint_exclusion), you could consider turning that into
>
> if (constraint_exclusion == CONSTRAINT_EXCLUSION_OFF)
> return false;
> /*
> * When constraint_exclusion is set to 'partition' we only handle
> * OTHER_MEMBER_RELs.
> */
> else if (constraint_exclusion == CONSTRAINT_EXCLUSION_PARTITION &&
> rel->reloptkind != RELOPT_OTHER_MEMBER_REL)
> return false;
>
> When I wrote that code I was trying my best to make the complex rules
> as simple as possible by separating them out. The rules have become
> quite simple after your change, so it probably does not warrant having
> the switch.

OK, done.

> 16. I think the following comment needs to explain how large this
> array is and what indexes it. The current comment would have you
> think there are only enough elements to store PlannerInfos for child
> rels and leaves you guessing about what they're indexed by.
>
> + /*
> + * PlannerInfos corresponding to each inheritance child targets.
> + * Content of each PlannerInfo is same as the parent PlannerInfo, except
> + * for the parse tree which is a translated copy of the parent's parse
> + * tree.
> + */
> + struct PlannerInfo **inh_target_child_roots;

I've added those details to the comment.

>
> 17. I'm getting an assert failure in add_paths_to_append_rel() for
> Assert(parallel_workers > 0) during the partition_join tests.

I guess that was due to not using the correct root in
inherit_target_rel_pathlists. Fixed.

On 2019/01/10 18:12, David Rowley wrote:>
> I'd been looking at v11's 0002 and started on 0003 too. I'll include
> my notes so far if you're about to send a v13.
>
>
> v11 0002
>
> 18. There's a missing case in the following code. I understand that
> makeNode() will 0 the member here, but that does not stop the various
> other initialisations that set the default value for the field. Below
> there's a missing case where parent != NULL && parent->rtekind !=
> RTE_RELATION. You might be better just always zeroing the field below
> "rel->partitioned_child_rels = NIL;"
>
> +
> + /*
> + * For inheritance child relations, we also set inh_root_parent.
> + * Note that 'parent' might itself be a child (a sub-partitioned
> + * partition), in which case we simply use its value of
> + * inh_root_parent.
> + */
> + if (parent->rtekind == RTE_RELATION)
> + rel->inh_root_parent = parent->inh_root_parent > 0 ?
> + parent->inh_root_parent :
> + parent->relid;
> }
> else
> + {
> rel->top_parent_relids = NULL;
> + rel->inh_root_parent = 0;
> + }

Okay, done. Actually, also did the same for top_parent_relids.

> 19. Seems strange to have a patch that adds a new field that is
> unused. I also don't quite understand yet why top_parent_relids can't
> be used instead. I think I recall being confused about that before, so
> maybe it's worth writing a comment to mention why it cannot be used.

See if this updated comment makes it any clearer:

/*
* For inheritance children, this is the RT index of inheritance table
* mentioned in the query from which this relation originated.
* top_parent_relids cannot be used for this, because if the inheritance
* root table is itself under UNION ALL, top_parent_relids contains the
* RT index of UNION ALL parent subquery.
*/

This is its own patch, because it was thought it could be useful to
another thread which has since stalled:

https://www.postgresql.org/message-id/be766794-eb16-b798-52ec-1f786b26b61b%40lab.ntt.co.jp

> v11 0003
>
> 20. This code looks wrong:
>
> + /*
> + * expand_inherited_tables may have proved that the relation is empty, so
> + * check if it's so.
> + */
> + else if (rte->inh && !IS_DUMMY_REL(rel))
>
>
> Likely you'll want:
>
> else if rte->inh)
> {
> if (IS_DUMMY_REL(rel))
> return;
> // other stuff
> }
>
> otherwise, you'll end up in the else clause when you shouldn't be.

OK, done that way.

> 21. is -> was
>
> + * The child rel's RelOptInfo is created during
> + * expand_inherited_tables().
> */
> childrel = find_base_rel(root, childRTindex);
>
> since you're talking about something that already happened.

OK, fixed.

On 2019/01/11 13:01, David Rowley wrote:>
> A few more comments based on reading v12.
>
> v12 0002:
>
> 1. Missing braces around the else clause. (Should be consistent with
> the "if" above)
>
> + if (!has_live_children)
> + {
> + /*
> + * All children were excluded by constraints, so mark the relation
> + * ass dummy. We must do this in this phase so that the rel's
> + * dummy-ness is visible when we generate paths for other rels.
> + */
> + set_dummy_rel_pathlist(rel);
> + }
> + else
> + /*
> + * Set a non-zero value here to cope with the caller's requirement
> + * that non-dummy relations are actually not empty. We don't try to
> + * be accurate here, because we're not going to create a path that
> + * combines the children outputs.
> + */
> + rel->rows = 1;

Agreed, done.

> v12 0004:
>
> 2. I wonder if there's a better way, instead of doing this:
>
> + if (child_rel1 == NULL)
> + child_rel1 = build_dummy_partition_rel(root, rel1, cnt_parts);
> + if (child_rel2 == NULL)
> + child_rel2 = build_dummy_partition_rel(root, rel2, cnt_parts);
>
> maybe add some logic in populate_joinrel_with_paths() to allow NULL
> rels to mean dummy rels. There's a bit of a problem there as the
> joinrel has already been created by that time, but perhaps
> populate_joinrel_with_paths is a better place to decide if the dummy
> rel needs to be built or not.

Hmm, I'd thought about that, but concluded that I shouldn't mix that work
with this refactoring project. We can try to hack the join planning code
later, but until then build_dummy_partition_rel can keep things working.
Once we have a working solution that works without having to create this
dummy RelOptInfo, we can remove build_dummy_partition_rel.

> 3. I wonder if there's a better way to handle what
> build_dummy_partition_rel() does. I see the child's relid to the
> parent's relid and makes up a fake AppendRelInfo and puts it in the
> parent's slot. What's going to happen when the parent is not the
> top-level parent? It'll already have a AppendRelInfo set.

Yeah, the parent's AppendRelInfo would already be present and it won't be
replaced:

if (root->append_rel_array[parent->relid] == NULL)
{
AppendRelInfo *appinfo = make_append_rel_info(parent, parentrte,
parent->tupdesc,
parentrte->relid,
parent->reltype,
parent->relid);

root->append_rel_array[parent->relid] = appinfo;
}

Now you'll say why the discrepancy? For the top-level parent's dummy
children, appinfo generated is actually no-op, because it's generated
using the above code. But for an intermediate parent's dummy parent's
there already exists a "real" appinfo. It doesn't make much difference as
far as I can tell, because the appinfo is not used for anything significant.

> I had thought something like the following could break this, but of
> course, it does not since we build the dummy rel for the pruned
> sub_parent2, so we don't hit the NULL relation case when doing the
> next level. i.e we only make dummies for the top-level, never dummys
> of joinrels.
>
> Does that not mean that the if (parent->top_parent_relids) will always
> be false in build_dummy_partition_rel() and it'll only ever get
> rtekind == RTE_RELATION?

Well, parentrte->rtekind should always be RTE_RELATION in
build_dummy_partition_rel, because partitionwise join is considered only
for partitioned tables and joinrels resulting from joining partitioned tables.

parent->top_parent_relids might be non-NULL if it's an intermediate parent.

> 4. How are dummy rels handled in grouping_planner()?
>
> I see you have this:
>
> - if (IS_DUMMY_REL(planned_rel))
> + if (!parent_rte->inh || IS_DUMMY_REL(planned_rel))
> {
> grouping_planner(root, false, planned_rel, 0.0);
> return;
>
> With the above example I tried to see how it was handled by doing:
>
> postgres=# update parent set c = c where a = 333;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
>
> I didn't look into what's causing the crash.

I tried your example, but it didn't crash for me:

explain update parent set c = c where a = 333;
QUERY PLAN
────────────────────────────────────────────────────
Update on parent (cost=0.00..0.00 rows=0 width=0)
-> Result (cost=0.00..0.00 rows=0 width=54)
One-Time Filter: false
(3 rows)

> 5. Wondering why you removed the if (childOID != parentOID) from:
>
> - if (childOID != parentOID && RELATION_IS_OTHER_TEMP(newrelation))
> - {
> - heap_close(newrelation, lockmode);
> - continue;
> - }
>
> Isn't that releasing the only lock we hold on the rel defined in the query?

I think I mistakenly removed it. Added it back.

> 6. expand_planner_arrays() zeros a portion of the append_rel_array
> even if it just palloc0'd the array. While it's not a bug, it is
> repeat work. It should be okay to move the Memset() up to the
> repalloc().

Done. Also moved other MemSets to their respective repallocs.

> 7. I see get_relation_info() grew an extra parameter. Would it now be
> better just to pass rte instead of doing;
>
> get_relation_info(root, rte->relid, rte->inh, rte->updatedCols,
> rel);
>

OK, done.

On 2019/01/10 15:37, Imai, Yoshikazu wrote:>
> I also have some comments on 0001, set_inherit_target_rel_sizes().
>
> In set_inherit_target_rel_sizes():
>
> Some codes are placed not the same order as set_append_rel_size().
>
> 0001: at line 325-326,
> + ListCell *l;
> + bool has_live_children;
>
> In set_append_rel_size(), "has_live_children" is above of the "ListCell *l";

OK, moved to look like set_append_rel_size.

> 0001: at line 582-603
> + if (IS_DUMMY_REL(childrel))
> + continue;
> +
> ...
> + Assert(childrel->rows > 0);
> +
> + /* We have at least one live child. */
> + has_live_children = true;
>
> In set_append_rel_size(),
> + /* We have at least one live child. */
> + has_live_children = true;
> is directly under of
> + if (IS_DUMMY_REL(childrel))
> + continue;

OK, done.

> 0001: at line 606-622,
> + if (!has_live_children)
> + {
> + /*
> + * All children were excluded by constraints, so mark the relation
> + * ass dummy. We must do this in this phase so that the rel's
> + * dummy-ness is visible when we generate paths for other rels.
> + */
> + set_dummy_rel_pathlist(rel);
> + }
> + else
> + /*
> + * Set a non-zero value here to cope with the caller's requirement
> + * that non-dummy relations are actually not empty. We don't try to
> + * be accurate here, because we're not going to create a path that
> + * combines the children outputs.
> + */
> + rel->rows = 1;
>
> In set_append_rel_size(), a condition of if clause is not
> !has_live_children but
> has_live_children.
>
> I also noticed there isn't else block while there is if block.

Things that set_inherit_target_rel_sizes and set_append_rel_size need to
do, respectively, are different, so the code looks different. Anyway,
I've switched the above if else blocks' code so that it looks like this:

if (has_live_children)
rel->rows = 1;
else
set_dummy_rel_pathlist(rel);

Here are the updated patches. I've also attached the patch I posted
earlier today here as 0001-Some-fixups-for-b60c397599-v2.patch.

Thanks you again.

Regards,
Amit

Attachment Content-Type Size
0001-Some-fixups-for-b60c397599-v2.patch text/plain 11.4 KB
v13-0002-Overhaul-inheritance-update-delete-planning.patch text/plain 72.7 KB
v13-0003-Store-inheritance-root-parent-index-in-otherrel-.patch text/plain 3.3 KB
v13-0004-Lazy-creation-of-RTEs-for-inheritance-children.patch text/plain 93.6 KB
v13-0005-Teach-planner-to-only-process-unpruned-partition.patch text/plain 6.9 KB
v13-0006-Do-not-lock-all-partitions-at-the-beginning.patch text/plain 1.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nikita Glukhov 2019-01-11 13:01:51 Re: [PATCH] kNN for btree
Previous Message Etsuro Fujita 2019-01-11 12:50:19 Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0