Re: [HACKERS] path toward faster partition pruning

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] path toward faster partition pruning
Date: 2018-03-16 12:55:59
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi David.

Thanks for the review.

On 2018/03/14 20:50, David Rowley wrote:
> On 14 March 2018 at 00:37, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> Attached is a significantly revised version of the patch, although I admit
>> it could still use some work with regard to comments and other cleanup.
> Thanks for making all those changes. There's been quite a bit of churn!
> I've looked over the patch and agree that there need to be more
> comments to explain things. There were certainly times during my
> review that I just skipped ahead having not understood what I'd been
> looking at.

Hope the attached version is easier to understand.

> Here are the notes from my review:
> 1. I think get_unpruned_partitions is not a great name for this
> function. get_matching_partitions would be better.

OK, I changed it to get_matching_partitions.

> 2. HASH partitioning can no longer prune non-matching partitions in
> cases like: SELECT * FROM hashp WHERE partkey1 IS NULL and partkey2 IS
> NULL; This might mean you need to process IS NULL and the key clauses
> in the same step.

OK, I made the PartitionPruneStepOp to contain a bitmap of null key
indexes. It is referred to along with "values" for other columns only in
the hash partitioning case, whereas for list and range partitioning,
actual value(s) and null key information are passed via separate
PartitionPruneStepOp nodes, because in case of the latter we cannot mix
nulls and "values".

So, now there is no need for PartitionPruneStepNullness and

> I see you have a test which checks the plan for
> explain (costs off) select * from hp where a = 1 and b is null;
> which ensures all partitions are included. Why?

I had intended to remove support for hash partition pruning with some keys
being null that existed in the previous versions of the patch, but thought
it was too pessimistic after reading your comment. I've added it back and
it works like it used to in the previous versions.

> 3. Header comment for get_partitions_for_keys references 'keys' which
> is not a parameter to that function.

Oops, fixed.

> 4. If you wrote a function to process an individual step and called
> that in a foreach loop inside get_unpruned_partitions, then you
> wouldn't need to list_make1() here. Instead just call the function to
> perform one step.
> argparts = get_unpruned_partitions(context,
> list_make1(step));

Hmm, yes. I've introduced a perform_pruning_step() that switches on the
pruning step node type, that is called by either get_matching_partitions
while iterating the list it receives or by perform_pruning_combine_step()
calls on the arguments of a PartitionPruneStepCombine.

> 5. I see you're performing AND step combination in two different ways.
> In perform_pruning_combine_step you do;
> andparts = andparts == NULL
> ? argparts
> : bms_int_members(andparts, argparts);
> but in get_unpruned_partitions, you add the entire range to the set
> using bms_add_range, then intersect on that.
> The code seems a bit fragile and relies on get_unpruned_partitions
> returning an allocated Bitmapset all the time, even if bms_is_empty()
> is true. There should be no distinction between NULL and a Bitmapset
> that returns true on bms_is_empty().
> What you've got here probably works for now, but only due to the fact
> that get_unpruned_partitions allocate the entire range with
> bms_add_range and that bms_int_members does not return NULL with no
> matching members if both input sets are non-NULL. If that were to
> change then your code would misbehave in cases like:
> WHERE <matches no partition> AND <matches a partition>;
> When processing the <matches no partition> clause, no partitions would
> match, then if that resulted in an empty set you'd then surprisingly
> match partitions again despite the AND clause not actually making it
> possible for any partitions to have matched.
> Probably you just need to bms_add_range in
> perform_pruning_combine_step too and perform bms_int_members
> unconditionally, just like you're doing in get_unpruned_partitions

I noticed a number of things I could improve about this, also considering
your points above. Please check if the new structure is an improvement.

> 6. The switch (last->op_strategy) in
> generate_partition_pruning_steps_internal is missing a default ERROR
> for unknown strategy

I've fixed that.

> 7. The switch: switch (context->strategy) in
> generate_partition_pruning_steps_internal should ERROR rather than
> break when it sees an unknown partition strategy.

This one too.

> 8. Instead of copying the opclause here, wouldn't you be better just
> this code come after you palloc the PartClause then just setup the
> PartClause with the negator directly?
> if (*is_neop_listp)
> {
> Assert(OidIsValid(negator));
> opclause = copyObject(opclause);
> opclause->opno = negator;
> }

Agreed, done.

> 8. PartitionedChildRelInfo is still mentioned in typedefs.list


> 9. I don't quite understand PartitionPruneStepNoop. Why can't you just
> skip adding anything to the list in
> generate_partition_pruning_steps_internal?
> 10. The following test does not make sense:
> explain (costs off) select * from hp where (a = 10 and b = 'yyy') or
> (a = 10 and b = 'xxx') or (a is null and b is null);
> -------------------------------------------------------------------------------------------------------------------------
> Append
> -> Seq Scan on hp0
> Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b
> = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
> -> Seq Scan on hp1
> Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b
> = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
> -> Seq Scan on hp2
> Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b
> = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
> -> Seq Scan on hp3
> Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b
> = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
> (9 rows)
> Why do 4 partitions match when there are only 3 sets of clauses when
> each one can only match a single partition?

After bringing the support for hash partition pruning even with IS NULL
clauses back, this works as you'd expect.

> 11. What does "root" mean here?
> -- case for list partitioned table that's not root
> explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b
> <> 'cd' and b <> 'xy' and b is not null;

It means a partitioned table that is not the root partitioned table.
Those <> clauses prune but not at the root level, only after recursing for
a list partitioned child of rlp.

Attached updated patches.


Attachment Content-Type Size
v37-0001-Add-partsupfunc-to-PartitionSchemeData.patch text/plain 3.4 KB
v37-0002-Add-more-tests-for-partition-pruning.patch text/plain 16.3 KB
v37-0003-Faster-partition-pruning.patch text/plain 99.9 KB
v37-0004-Add-only-unpruned-partitioned-child-rels-to-part.patch text/plain 24.2 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2018-03-16 13:44:31 Re: PATCH: Configurable file mode mask
Previous Message Tomas Vondra 2018-03-16 12:55:11 Re: [HACKERS] [PATCH] Incremental sort