Re: [HACKERS] path toward faster partition pruning

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] path toward faster partition pruning
Date: 2018-03-14 11:50:29
Message-ID: CAKJS1f_Vs6tmDjx=TiL9MKdDy0xrh-Qotd6_ZHYA8fa92qahpQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 14 March 2018 at 00:37, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Attached is a significantly revised version of the patch, although I admit
> it could still use some work with regard to comments and other cleanup.

Thanks for making all those changes. There's been quite a bit of churn!

I've looked over the patch and agree that there need to be more
comments to explain things. There were certainly times during my
review that I just skipped ahead having not understood what I'd been
looking at.

Here are the notes from my review:

1. I think get_unpruned_partitions is not a great name for this
function. get_matching_partitions would be better.

2. HASH partitioning can no longer prune non-matching partitions in
cases like: SELECT * FROM hashp WHERE partkey1 IS NULL and partkey2 IS
NULL; This might mean you need to process IS NULL and the key clauses
in the same step.

I see you have a test which checks the plan for

explain (costs off) select * from hp where a = 1 and b is null;

which ensures all partitions are included. Why?

3. Header comment for get_partitions_for_keys references 'keys' which
is not a parameter to that function.

4. If you wrote a function to process an individual step and called
that in a foreach loop inside get_unpruned_partitions, then you
wouldn't need to list_make1() here. Instead just call the function to
perform one step.

argparts = get_unpruned_partitions(context,
list_make1(step));

5. I see you're performing AND step combination in two different ways.
In perform_pruning_combine_step you do;

andparts = andparts == NULL
? argparts
: bms_int_members(andparts, argparts);

but in get_unpruned_partitions, you add the entire range to the set
using bms_add_range, then intersect on that.

The code seems a bit fragile and relies on get_unpruned_partitions
returning an allocated Bitmapset all the time, even if bms_is_empty()
is true. There should be no distinction between NULL and a Bitmapset
that returns true on bms_is_empty().

What you've got here probably works for now, but only due to the fact
that get_unpruned_partitions allocate the entire range with
bms_add_range and that bms_int_members does not return NULL with no
matching members if both input sets are non-NULL. If that were to
change then your code would misbehave in cases like:

WHERE <matches no partition> AND <matches a partition>;

When processing the <matches no partition> clause, no partitions would
match, then if that resulted in an empty set you'd then surprisingly
match partitions again despite the AND clause not actually making it
possible for any partitions to have matched.

Probably you just need to bms_add_range in
perform_pruning_combine_step too and perform bms_int_members
unconditionally, just like you're doing in get_unpruned_partitions

6. The switch (last->op_strategy) in
generate_partition_pruning_steps_internal is missing a default ERROR
for unknown strategy

7. The switch: switch (context->strategy) in
generate_partition_pruning_steps_internal should ERROR rather than
break when it sees an unknown partition strategy.

8. Instead of copying the opclause here, wouldn't you be better just
this code come after you palloc the PartClause then just setup the
PartClause with the negator directly?

if (*is_neop_listp)
{
Assert(OidIsValid(negator));
opclause = copyObject(opclause);
opclause->opno = negator;
}

8. PartitionedChildRelInfo is still mentioned in typedefs.list

9. I don't quite understand PartitionPruneStepNoop. Why can't you just
skip adding anything to the list in
generate_partition_pruning_steps_internal?

10. The following test does not make sense:

explain (costs off) select * from hp where (a = 10 and b = 'yyy') or
(a = 10 and b = 'xxx') or (a is null and b is null);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Append
-> Seq Scan on hp0
Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b
= 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
-> Seq Scan on hp1
Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b
= 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
-> Seq Scan on hp2
Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b
= 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
-> Seq Scan on hp3
Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b
= 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
(9 rows)

Why do 4 partitions match when there are only 3 sets of clauses when
each one can only match a single partition?

11. What does "root" mean here?

-- case for list partitioned table that's not root
explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b
<> 'cd' and b <> 'xy' and b is not null;

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ildus Kurbangaliev 2018-03-14 12:05:16 Re: committing inside cursor loop
Previous Message Simon Riggs 2018-03-14 11:23:29 Re: Faster inserts with mostly-monotonically increasing values