Re: speeding up planning with partitions

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: speeding up planning with partitions
Date: 2018-10-16 01:58:32
Message-ID: 83af57c1-896b-7daf-8e24-8b21a24fc1c3@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi David,

I've managed to get back to the rest of your comments. Sorry that it took
me a while.

On 2018/09/11 8:23, David Rowley wrote:
> On 30 August 2018 at 21:29, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> Attached updated patches, with 0002 containing the changes mentioned above.
>
> Here's my first pass review on this:
>
> 0002:
>
> 10. I think moving the PlannerInfo->total_table_pages calculation
> needs to be done in its own patch. This is a behavioural change where
> we no longer count pruned relations in the calculation which can
> result in plan changes. I posted a patch in [1] to fix this as a bug
> fix as I think the current code is incorrect. My patch also updates
> the first paragraph of the comment. You've not done that.

As mentioned on the other thread, I've included your patch in my own series.

> 11. "pruning"
>
> + * And do prunin. Note that this adds AppendRelInfo's of only the

Fixed.

> 12. It's much more efficient just to do bms_add_range() outside the loop in:
>
> + for (i = 0; i < rel->nparts; i++)
> + {
> + rel->part_rels[i] = build_partition_rel(root, rel,
> + rel->part_oids[i]);
> + rel->live_parts = bms_add_member(rel->live_parts, i);
> + }

You're right, I almost forgot about bms_add_range that we just recently added.

> 13. In set_append_rel_size() the foreach(l, root->append_rel_list)
> loop could be made to loop over RelOptInfo->live_parts instead which
> would save having to skip over AppendRelInfos that don't belong to
> this parent. You'd need to make live_parts more general purpose and
> also use it to mark children of inheritance parents.

Hmm, I've thought about it, but live_parts is a set of indexes into the
part_rels array, which is meaningful only for partitioned tables.

I'm not really sure if we should generalize part_rels to child_rels and
set it even for regular inheritance, if only for the efficiency of
set_append_rel_size and set_append_rel_pathlists.

In the updated patches, I've managed to make this whole effort applicable
to regular inheritance in general (as I said I would last month), where
the only special case code needed for partitioning is to utilize the
functionality of partprune.c, but I wasn't generalize everything. We can,
as a follow on patch maybe.

> 14. I think you can skip the following if both are NULL. You could
> likely add more smarts for different join types, but I think you
> should at least skip when both are NULL. Perhaps the loop could be
> driven off of bms_intersec of the two Rel's live_parts?
>
> + if (child_rel1 == NULL)
> + child_rel1 = build_dummy_partition_rel(root, rel1, cnt_parts);
> + if (child_rel2 == NULL)
> + child_rel2 = build_dummy_partition_rel(root, rel2, cnt_parts);

Actually, not needing to build the dummy partition rel here is something
we could get back to doing someday, but not in this patch. The logic
involved to do so is too much related to join planning, so I thought we
could pursue it later.

> 15. The following is not required when append_rel_array was previously NULL.
>
> + MemSet(root->append_rel_array + root->simple_rel_array_size,
> + 0, sizeof(AppendRelInfo *) * num_added_parts);

Yeah, I've fixed that. If previously NULL, it's palloc'd.

> 16. I don't think scan_all_parts is worth the extra code. The cost of
> doing bms_num_members is going to be pretty small in comparison to
> building paths for and maybe doing a join search for all partitions.
>
> + num_added_parts = scan_all_parts ? rel->nparts :
> + bms_num_members(partindexes);
>
> In any case, you've got a bug in prune_append_rel_partitions() where
> you're setting scan_all_parts to true instead of returning when
> contradictory == true.

I too started disliking it, so got rid of scan_all_parts.

> 17. lockmode be of type LOCKMODE, not int.
>
> + Oid childOID, int lockmode,

This might no longer be the same code as you saw when reviewing, but I've
changed all lockmode variables in the patch to use LOCKMODE.

> 18. Comment contradicts the code.
>
> + /* Open rel if needed; we already have required locks */
> + if (childOID != parentOID)
> + {
> + childrel = heap_open(childOID, lockmode);
>
> I think you should be using NoLock here.
>
> 19. Comment contradicts the code.
>
> + /* Close child relations, but keep locks */
> + if (childOID != parentOID)
> + {
> + Assert(childrel != NULL);
> + heap_close(childrel, lockmode);
> + }

These two blocks, too. There is no such code with the latest patch.
Code's been moved such that there is no need for such special handling.

> 20. I assume translated_vars can now be NIL due to
> build_dummy_partition_rel() not setting it.
>
> - if (var->varlevelsup == 0 && appinfo)
> + if (var->varlevelsup == 0 && appinfo && appinfo->translated_vars)
>
> It might be worth writing a comment to explain that, otherwise it's
> not quite clear why you're doing this.

Regarding this and and one more comment below, I've replied below.

> 21. Unrelated change;
>
> Assert(relid > 0 && relid < root->simple_rel_array_size);
> +

Couldn't find it in the latest patch, so must be gone.

> 22. The following comment mentions that Oids are copied, but that does
> not happen in the code.
>
> + /*
> + * For partitioned tables, we just allocate space for RelOptInfo's.
> + * pointers for all partitions and copy the partition OIDs from the
> + * relcache. Actual RelOptInfo is built for a partition only if it is
> + * not pruned.
> + */
>
> The Oid copying already happened during get_relation_info().

Couldn't find this comment in the new code.

> 23. Traditionally translated_vars populated with a sequence of Vars in
> the same order to mean no translation. Here you're changing how that
> works:
>
> + /* leaving translated_vars to NIL to mean no translation needed */
>
> This seems to be about the extent of your documentation on this, which
> is not good enough.

I've changed the other code such that only the existing meaning of no-op
translation list is preserved, that is, the one you wrote above. I
modified build_dummy_partition_rel such that a no-op translated vars is
built based on parent's properties, instead of leaving it NIL. I've also
removed the special case code in adjust_appendrel_attrs_mutator which
checked translated_vars for NIL.

> 24. "each" -> "reach"? ... Actually, I don't understand the comment.
> In a partitioned hierarchy, how can the one before the top-level
> partitioned table not be a partitioned table?
>
> /*
> * Keep moving up until we each the parent rel that's not a
> * partitioned table. The one before that one would be the root
> * parent.
> */

This comment and the code has since been rewritten, but to clarify, this
is the new comment:

* Figuring out if we're the root partitioned table is a bit involved,
* because the root table mentioned in the query itself might be a
* child of UNION ALL subquery.
*/

> 25. "already"
>
> + * expand_inherited_rtentry alreay locked all partitions, so pass

No longer appears in the latest patch.

> 26. Your changes to make_partitionedrel_pruneinfo() look a bit broken.
> You're wrongly assuming live_parts is the same as present_parts. If a
> CHECK constraint eliminated the partition then those will be present
> in live_parts but won't be part of the Append/MergeAppend subplans.
> You might be able to maintain some of this optimisation by checking
> for dummy rels inside the loop, but you're going to need to put back
> the code that sets present_parts.
>
> + present_parts = bms_copy(subpart->live_parts);

You're right. The new code structure is such that it allows to delete the
partition index of a partition that's excluded by constraints, so I fixed
it so that live_parts no longer contains the partitions that are excluded
by constraints.

The existing code that sets present parts goes through all partitions by
looping over nparts members of part_rels, which is a pattern I think we
should avoid, as I'd think you'd agree.

> 27. Comment contradicts the code:
>
> + Bitmapset *live_parts; /* unpruned parts; NULL if all are live */
>
> in add_rel_partitions_to_query() you're doing:
>
> + if (scan_all_parts)
> + {
> + for (i = 0; i < rel->nparts; i++)
> + {
> + rel->part_rels[i] = build_partition_rel(root, rel,
> + rel->part_oids[i]);
> + rel->live_parts = bms_add_member(rel->live_parts, i);
> + }
> + }
>
> so the NULL part seems untrue.

I've rewritten the comment.

>
> 28. Missing comments:
>
> + TupleDesc tupdesc;
> + Oid reltype;
>
> 29. The comment for prune_append_rel_partitions claims it "Returns RT
> indexes", but that's not the case, per:
>
> -Relids
> -prune_append_rel_partitions(RelOptInfo *rel)
> +void
> +prune_append_rel_partitions(PlannerInfo *root, RelOptInfo *rel)

Has changed in the latest patch and now it returns a bitmapset of
partition indexes, same as what get_matching_partitions does.

> 0003:
>
> 30. 2nd test can be tested inside the first test to remove redundant
> partition check.
>
> - inhOIDs = find_all_inheritors(parentOID, lockmode, NULL);
> + if (rte->relkind != RELKIND_PARTITIONED_TABLE)
> + inhOIDs = find_all_inheritors(parentOID, lockmode, NULL);
>
> /*
> * Check that there's at least one descendant, else treat as no-child
> * case. This could happen despite above has_subclass() check, if table
> * once had a child but no longer does.
> */
> - if (list_length(inhOIDs) < 2)
> + if (rte->relkind != RELKIND_PARTITIONED_TABLE && list_length(inhOIDs) < 2)
> {

This code has changed quite a bit in the latest patch, so the comment no
longer applies.

> 31. The following code is wrong:
>
> + /* Determine the correct lockmode to use. */
> + if (rootRTindex == root->parse->resultRelation)
> + lockmode = RowExclusiveLock;
> + else if (rootrc && RowMarkRequiresRowShareLock(rootrc->markType))
> + lockmode = RowShareLock;
> + else
> + lockmode = AccessShareLock;
>
> rootRTIndex remains at 0 if there are no row marks and resultRelation
> will be 0 for SELECT queries, this means you'll use a RowExclusiveLock
> for a SELECT instead of an AccessShareLock.
>
> Why not just check the lockmode of the parent and use that?

No such logic exists anymore due to recent developments (addition of
rellockmode to RTE).

I'll post the latest patches in this week.

Thanks,
Amit

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2018-10-16 02:45:53 Re: Creating Certificates
Previous Message Andres Freund 2018-10-15 23:45:03 Re: Large writable variables