Re: [HACKERS] path toward faster partition pruning

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] path toward faster partition pruning
Date: 2017-11-17 10:01:04
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi David.

On 2017/11/08 13:44, David Rowley wrote:
> On 7 November 2017 at 01:52, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>> Thanks. I'll look over it all again starting my Tuesday morning. (UTC+13)
> Hi Amit,
> I had another look over this today. Apologies if any of the review seems petty.

Thanks a lot for the review.

> Here goes:
> 1. If test seems to be testing for a child that's a partitioned table,
> but is testing for a non-NULL part_scheme.
> /*
> * If childrel is itself partitioned, add it and its partitioned
> * children to the list being propagated up to the root rel.
> */
> if (childrel->part_scheme && rel->part_scheme)
> Should this code use IS_PARTITIONED_REL() instead? Seems a bit strange
> to test for a NULL part_scheme

I guess that makes sense, done.

> 2. There's a couple of mistakes in my bms_add_range() code. I've
> attached bms_add_range_fix.patch. Can you apply this to your tree?

Thanks. I have used your bms_add_range_v2.patch that you sent earlier
today and listed both your and Horiguchi-san's names as author.

> 3. This assert seems to be Asserting the same thing twice:
> Assert(rel->live_partitioned_rels != NIL &&
> list_length(rel->live_partitioned_rels) > 0);
> A List with length == 0 is always NIL.

You're right. I changed it to:

Assert(list_length(rel->live_partitioned_rels) >= 1);

> 4. get_partitions_from_clauses(), can you comment why you perform the
> list_concat() there.
> I believe this is there so that the partition bound from the parent is
> passed down to the child so that we can properly eliminate all child
> partitions when the 2nd level of partitioning is using the same
> partition key as the 1st level. I think this deserves a paragraph of
> comment to explain this.

Yes, that's the intent. I implemented it as a solution to fix a problem
that was reported upthread, whereby the default partition pruning didn't
work as desired. I tried to explain it in the following email:

But, since I devised it as a solution to get the desired behavior for the
default partition, I modified the code to include partition constraint
clauses to do it only when the table has a default partition in the first
place. Doing it always is an overkill. Please see the comment added
nearby if it now helps make sense of what's going on.

> 5. Please add a comment to explain what's going on here in
> classify_partition_bounding_keys()
> if (partattno == 0)
> {
> partexpr = lfirst(partexprs_item);
> partexprs_item = lnext(partexprs_item);
> }
> Looks like, similar to index expressions, that partition expressions
> are attno 0 to mark to consume the next expression from the list.


> Does this need validation that there are enough partexprs_item items
> like what is done in get_range_key_properties()? Or is this validated
> somewhere else?

Yeah, I added the check as follows:

+ /* Set partexpr if needed. */
if (partattno == 0)
+ if (partexprs_item == NULL)
+ elog(ERROR, "wrong number of partition key expressions");
partexpr = lfirst(partexprs_item);
partexprs_item = lnext(partexprs_item);

> 6. Comment claims the if test will test something which it does not
> seem to test for:
> /*
> * Redundant key elimination using btree-semantics based tricks.
> *
> * Only list and range partitioning use btree operator semantics, so
> * skip otherwise. Also, if there are expressions whose value is yet
> * unknown, skip this step, because we need to compare actual values
> * below.
> */
> memset(keyclauses, 0, PARTITION_MAX_KEYS * sizeof(List *));
> if (partkey->strategy == PARTITION_STRATEGY_LIST ||
> partkey->strategy == PARTITION_STRATEGY_RANGE)
> I was expecting this to be skipped when the clauses contained a
> non-const, but it does not seem to.

Fixed the comment. Actually we might end up with non-consts here if
executor invokes it, so the downstream code is in position to handle them,
skipping any optimizations that depend on constant values being available.
There are actually even cases when the planner wouldn't mind calling here
even if the matched clauses contained non-const operands as long as there
are at least some constants available.

> 7. Should be "compare them"
> /*
> * If the leftarg and rightarg clauses' constants are both of the type
> * expected by "op" clause's operator, then compare then using the
> * latter's comparison function.
> */
> But if I look at the code "compare then using the latter's comparison
> function." is not true, it seems to use op's comparison function not
> rightarg's. With most of the calls op and rightarg are the same, but
> not all of them. The function shouldn't make that assumption even if
> the args op was always the same as rightarg.

Rearranged the code in partition_cmp_args() a bit and added more
clarifying comments.

> 8. remove_redundant_clauses() needs an overview comment of what the
> function does.


> 9. The comment should explain what we would do in the case of key < 3
> AND key <= 2 using some examples.
> /* try to keep only one of <, <= */


> 10. Wondering why this loop runs backward?
> for (s = BTMaxStrategyNumber; --s >= 0;)
> Why not just:
> for (s = 0; s < BTMaxStrategyNumber; s++)
> I can't see a special reason for it to run backward. It seems unusual,
> but if there's a good reason that I've failed to realise then it's
> maybe worth a comment.

Hmm, no special reason. So, done the other way. I actually brought this
redundant key logic elimination logic over from nbtutils.c:
_bt_preprocess_keys() and the loop runs that way over there.

> 11. Pleae comment on why *constfalse = true is set here:
> if (!chk || s == (BTEqualStrategyNumber - 1))
> continue;
> if (partition_cmp_args(partopfamily, partopcintype, chk, eq, chk,
> &test_result))
> {
> if (!test_result)
> {
> *constfalse = true;
> return;
> }
> /* discard the redundant key. */
> xform[s] = NULL;
> }
> Looks like we'd hit this in a case such as: WHERE key = 1 AND key > 1.

Right. Added a comment with example.

> Also please add a comment when discarding the redundant key maybe
> explain that equality is more useful than the other strategies when
> there's an overlap.

Done, too.

Please find attached updated patch set. There are significant changes in
this version as described below, including the support for hash
partitioned tables.

Earlier today, I reported [1] what looks to me like a bug in how default
partition's constraint gets generated and how that sometimes makes
constraint exclusion mistakenly prune a default partition. I have
included the patches I posted there in this series too. They are ahead in
the list.

So attached patches are now as follows:


Patches at [1].


Tests. Mostly unchanged from the last version.


Uses the bms_add_range_v2 patch.


Fixed some issues with how OR clauses were matched and the code for
checking if the operator is partitioning compatible is no longer in the
planner code, instead it's now only in partition.c. Other cosmetic
improvements including those that resulted from the review comments.


Several bug fixes and many cosmetic improvements including improved
commentary. Per Amul's comment upthread, the patch no longer depends on
using value -1 to denote an invalid value of NulltestType enum. In fact,
it doesn't use NulltestType type variables at all.


No changes.


Support for hash partitioning and tests for the same. Also, since
update/delete on partitioned tables still depend on constraint exclusion
for pruning, fix things such that get_relation_constraints includes
partition constraints in its result only for non-select queries (for
selects we have the new pruning code). Other bug fixes.



Attachment Content-Type Size
0001-Add-default-partition-case-in-inheritance-testing.patch text/plain 7.3 KB
0002-Tweak-default-range-partition-s-constraint-a-little.patch text/plain 4.5 KB
0003-Add-new-tests-for-partition-pruning-v12.patch text/plain 50.7 KB
0004-Add-a-bms_add_range-v12.patch text/plain 3.3 KB
0005-Planner-side-changes-for-partition-pruning-v12.patch text/plain 39.4 KB
0006-Implement-get_partitions_from_clauses-v12.patch text/plain 40.6 KB
0007-Some-interface-changes-for-partition_bound_-cmp-bsea-v12.patch text/plain 11.5 KB
0008-Implement-get_partitions_for_keys-v12.patch text/plain 39.2 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Devrim Gündüz 2017-11-17 10:54:19 Re: pspg - psql pager
Previous Message Masahiko Sawada 2017-11-17 09:34:29 Missing wal_receiver_status_interval in Subscribers section