Re: speeding up planning with partitions

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: speeding up planning with partitions
Date: 2018-10-15 09:24:15
Message-ID: 0791f750-13a9-ee15-e42d-e35a257f16d5@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Dilip,

Thanks for your review comments. Sorry it took me a while replying to them.

On 2018/09/29 22:30, Dilip Kumar wrote:
> I was going through your patch (v3-0002) and I have some suggestions
>
> 1.
> - if (nparts > 0)
> + /*
> + * For partitioned tables, we just allocate space for RelOptInfo's.
> + * pointers for all partitions and copy the partition OIDs from the
> + * relcache. Actual RelOptInfo is built for a partition only if it is
> + * not pruned.
> + */
> + if (rte->relkind == RELKIND_PARTITIONED_TABLE)
> + {
> rel->part_rels = (RelOptInfo **)
> - palloc(sizeof(RelOptInfo *) * nparts);
> + palloc0(sizeof(RelOptInfo *) * rel->nparts);
> + return rel;
> + }
>
> I think we can delay allocating memory for rel->part_rels? And we can
> allocate in add_rel_partitions_to_query only
> for those partitions which survive pruning.

Unfortunately, we can't do that. The part_rels array is designed such
that there is one entry in it for each partition of a partitioned table
that physically exists. Since the pruning code returns a set of indexes
into the array of *all* partitions, the planner's array must also be able
to hold *all* partitions, even if most of them would be NULL in the common
case.

Also, partition wise join code depends on finding a valid (even if dummy)
RelOptInfo for *all* partitions of a partitioned table to support outer
join planning. Maybe, we can change partition wise join code to not
depend on such dummy-marked RelOptInfos on the nullable side, but until
then we'll have to leave part_rels like this.

> 2.
> add_rel_partitions_to_query
> ...
> + /* Expand the PlannerInfo arrays to hold new partition objects. */
> + num_added_parts = scan_all_parts ? rel->nparts :
> + bms_num_members(partindexes);
> + new_size = root->simple_rel_array_size + num_added_parts;
> + root->simple_rte_array = (RangeTblEntry **)
> + repalloc(root->simple_rte_array,
> + sizeof(RangeTblEntry *) * new_size);
> + root->simple_rel_array = (RelOptInfo **)
> + repalloc(root->simple_rel_array,
> + sizeof(RelOptInfo *) * new_size);
> + if (root->append_rel_array)
> + root->append_rel_array = (AppendRelInfo **)
> + repalloc(root->append_rel_array,
> + sizeof(AppendRelInfo *) * new_size);
> + else
> + root->append_rel_array = (AppendRelInfo **)
> + palloc0(sizeof(AppendRelInfo *) *
> + new_size);
>
> Instead of re-pallocing for every partitioned relation can't we first
> count the total number of surviving partitions and
> repalloc at once.

Hmm, doing this seems a bit hard too. Since the function to perform
partition pruning (prune_append_rel_partitions), which determines the
number of partitions that will be added, currently contains a RelOptInfo
argument for using the partitioning info created by
set_relation_partition_info, we cannot call it on a partitioned table
unless we've created a RelOptInfo for it. So, we cannot delay creating
partition RelOptInfos to a point where we've figured out all the children,
because some of them might be partitioned tables themselves. IOW, we must
create them as we process each partitioned table (and its partitioned
children recursively) and expand the PlannerInfo arrays as we go.

I've thought about the alternative(s) that will allow us to do what you
suggest, but it cannot be implemented without breaking how we initialize
partitioning info in RelOptInfo. For example, we could refactor
prune_append_rel_array's interface to accept a Relation pointer instead of
RelOptInfo, but we don't have a Relation pointer handy at the point where
we can do pruning without re-opening it, which is something to avoid.
Actually, that's not the only refactoring needed as I've come to know when
trying to implement it.

On a related note, with the latest patch, I've also delayed regular
inheritance expansion as a whole, to avoid making the partitioning
expansion a special case. That means we'll expand the planner arrays
every time a inheritance parent is encountered in the range table. The
only aspect where partitioning becomes a special case in the new code is
that we can call the pruning code even before we've expanded the planner
arrays and the pruning limits the size to which the arrays must be
expanded to. Regular inheritance requires that the planner arrays be
expanded by the amount given by
list_length(find_all_inheritors(root_parent_oid)).

> 3.
> + /*
> + * And do prunin. Note that this adds AppendRelInfo's of only the
> + * partitions that are not pruned.
> + */
>
> prunin/pruning

I've since rewritten this comment and fixed the misspelling.

I'm still working on some other comments.

Thanks,
Amit

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Krzysztof Nienartowicz 2018-10-15 10:04:13 Re: Speeding up INSERTs and UPDATEs to partitioned tables
Previous Message Peter Eisentraut 2018-10-15 08:36:53 Re: pgbench exit code