Re: path toward faster partition pruning

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: path toward faster partition pruning
Date: 2017-10-19 06:46:49
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017/09/27 10:22, Amit Langote wrote:
> Thanks for the reminder. Just thought I'd say that while I'm actually
> done rebasing itself (attaching rebased patches to try 'em out), I'm now
> considering Robert's comments and will be busy for a bit revising things
> based on those comments.

And here is the updated, significantly re-designed patch set. I'm
dropping the WIP- from the patches' names and marking this patch set as
the v1 set (Jesper Pedersen pointed out to me offline that the patches
didn't have the vNumber before).

Significant points of revision:

* After thinking a bit more about the applicability/re-usability of the
code being discussed here in the run-time pruning case, I came to the
conclusion that my previous approach was a bit wrongheaded (as perhaps
also what David Rowley was thinking when he commented on some aspects of
the old patch's design at [1], so thanks to him for prodding me in what I
ended up thinking to be a good direction after all).

With the previous approach, a bit too much work would be done by the
planner with no possibility of the code doing that work being useful in
the executor (interface involved passing RelOptInfo *). So, if some
optimization trick that would lead to better pruning decision depended on
the constant values in all the clauses being available, we'd have to skip
that optimization for clauses that would otherwise be chosen as run-time
pruning clauses, because by definition, they would not have constant
values available. In the new design, planner code limits itself to only
matching the clauses to partition key (checking things like whether the
operator of a clause matched to a partition column is compatible with
partitioning, etc.) and adding every matched clause to a list partclauses.
That should work unchanged for both the plan-time pruning case and the
run-time pruning case. We don't look at the supposedly-constant operands
of clauses in the aforementioned planner code at all.

Now if the clauses in partclauses are all known to contain the constant
operand values (the plan-time pruning case), it can immediately pass them
to partition.c to analyze those clauses, extract bounding keys in a form
suitable to do lookup in PartitionBoundInfo and prune partitions that
won't satisfy those bounding keys (and hence the clauses).

If partclauses contains clauses that don't have the constant operand (the
run-time pruning case), don't go to partition.c just yet, instead stuff
the list into the plan (Append) node and go to partition.c only when all
the constant values are available (the patch at [2] will implement that).

* Unlike the previous approach where partition.c would return a list of
integer indexes, where the individual indexes would be those of the bound
datums and not those of partitions themselves, in the new design, indexes
of the selected partitions (their offsets in the PartitionDesc.oids array)
are returned in a way that Robert suggested [3] -- as a range of
contiguous partitions whenever possible (in the form of *min_part_idx and
*max_part_idx) and/or as a set of partitions appearing in no particular
order (in the form of a Bitmapset). Second form is required not only for
default/null-only/list partitions (which do not have any notion of
ordering among each other), but also when individual arms of an OR clause
select partitions scattered all over the place.

* In add_paths_to_append_rel(), the partitioned_rels list passed to copy
into the Append path is no longer same as the one found in
PlannerInfo.pcinfo_list, because the latter contains *all* partitioned
child relations in a partition tree. Instead, the patch teaches it to
only include *live* partitioned child relations.

* Since the partitionwise join code looks at *all* entries in
RelOptInfo.part_rels of both the joining partitioned relations, there is
some new code there to make dead partitions' RelOptInfos look valid with a
dummy path *after-the-fact*. That's because, set_append_rel_size() whose
job it is to initialize certain fields of child RelOptInfos will do it
only for *live* partitions with the new arrangement, where only live
partitions of a partitioned table are processed by the main loop of

Some notes about the regression tests:

Patch 0001 adds new tests for partition-pruning. Constraint exclusion
using internal partition constraints is not disabled in the code until the
last patch, which implements the last piece needed for the new partition
pruning to do any real work. With that patch, we see some differences in
the plan generated using the new partition-pruning code which appear in
the patch as the diffs to expected/inherit.out and expected/partition.out
(latter is the new output file added by 0001). I've almost convinced
myself that those diffs are simply artifacts of the difference in
implementation between constraint exclusion and the new partition-pruning
code and do not change the output that the plans produce. The difference
stems from that either the old or the new method, in some cases, fails to
prune away a partition that should have been. OTOH, in neither case do we
prune away a partition that shouldn't have been. :)

Description of the attached patches:

0001: add new tests for partition-pruning

0002: patch that makes all the changes needed in the planer (adds a stub
function in partition.c)

0003: patch that implements the aforementioned stub (significant amount of
code to analyze partition clauses and gin up bounding keys to
compare with the values in PartitionBoundInfo; the actual function
that will do the comparison is just a stub as of this patch)

0004: make some preparatory changes to partition_bound_cmp/bsearch, to be
able to pass incomplete partition keys (aka, prefix of a multi-
column key) for comparison with the values in PartitionBoundInfo
(just a refactoring patch)

0005: implements the stub mentioned in 0003 and finally gets the new
partition-pruning working (also disables constraint exclusion using
internal partition constraints by teaching get_relation_constraints
to not include those).

Feedback greatly welcome.





Attachment Content-Type Size
0001-Add-new-tests-for-partition-pruning-v1.patch text/plain 35.4 KB
0002-Planner-side-changes-for-partition-pruning-v1.patch text/plain 41.6 KB
0003-Implement-get_partitions_from_clauses-v1.patch text/plain 31.1 KB
0004-Some-interface-changes-for-partition_bound_-cmp-bsea-v1.patch text/plain 10.1 KB
0005-Implement-get_partitions_for_keys-v1.patch text/plain 19.4 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Khandekar 2017-10-19 07:05:05 Re: Parallel Append implementation
Previous Message Julien Rouhaud 2017-10-19 06:11:01 Re: 64-bit queryId?