Re: [HACKERS] path toward faster partition pruning

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Amit Langote <amitlangote09(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] path toward faster partition pruning
Date: 2018-04-04 05:42:58
Message-ID: 0c7f6f84-9b46-66ae-4b1b-4cc30dfe2f14@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi David.

On 2018/04/04 10:13, David Rowley wrote:
> On 4 April 2018 at 11:22, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>> On 4 April 2018 at 09:47, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>>> I think it would be better to just have special handling in
>>> get_matching_list_bound so that it knows it's performing <>
>>> elimination. I'd thought about passing some other opstrategy but the
>>> only safe one I thought to use was InvalidStrategy, which is already
>>> used by NULL handling.

Thanks for this suggestion.

Having the special case handling for steps corresponding to <> operator
clauses in get_matching_list_bounds() seems like the best way and that
should have been the way all along. It occurred to me that after I
changed the patch to store datum offsets in the result, there wasn't any
need for special handling of <> operators at a higher level -- like the
special pruning function (get_partitions_excluded_by_ne_datums) that we
used to have or the COMBINE_INVERT I recently proposed.

For each datum coming from a <> operator clause (signaled to
get_matching_list_bounds by passing InvalidStrategy for opstrategy), we
return all datums minus the one that was passed (if the latter is indeed
found in boundinfo). Bounds for individual <> operator clauses will be
combined using INTERSECT at a higher level to give the desired result. No
need for the invert step and for the planner to set things up very
carefully for invert step to do the right thing.

create table lp (a int) partition by list (a);
create table lp1 partition of lp for values in (1);
create table lp2 partition of lp for values in (2);
create table lp3 partition of lp for values in (3);
create table lp_null partition of lp for values in (null);
create table lp_default partition of lp default;

For

explain select * from lp where a <> 1

get_matching_list_bounds will returns the set of offsets of {2, 3} and
will set scan_default, while setting scan_null false;

and for

explain select * from lp where a <> 1 and a <> 3

it will returns the set of offsets of {2, 3} and {1, 2} for the individual
base steps and along setting scan_default and setting scan_null to false;
the INTERSECT combination step still combine those to give the offset of 2
with scan_default set to true and scan_null set to false.

QUERY PLAN
--------------------------------------------------------------------
Append (cost=0.00..121.75 rows=5050 width=4)
-> Seq Scan on lp2 (cost=0.00..48.25 rows=2525 width=4)
Filter: ((a <> 1) AND (a <> 3))
-> Seq Scan on lp_default (cost=0.00..48.25 rows=2525 width=4)
Filter: ((a <> 1) AND (a <> 3))

If there are no values then the offsets of all the bounds will be returned
and unlike the previous setup with the INVERT step in the mix, they will
all survive.

I know we've been back and forth quite a bit on this, but this solution
seems like the one with the least amount of hackery. Hope you find it to
be the same way.

>> I'm currently working up a patch to do this the way I think is best.
>>
>> I'll submit it soon and we can review and get your thoughts on it.
>
> I've attached a rough cut version of what I think is a good solution
> for this. It's based on v46, not your latest v47, sorry.
>
> This makes get_matching_list_bounds() aware that it's performing the
> not-equal pruning via the opstrategy which allows it to not return all
> partitions when there are no values in this case. Instead, we return
> the NULL partition, so that we later invert that and return everything
> apart from the NULL partition. A strict clause will allow us that
> much, even if we can't get the actual value being compared to, at the
> time.

As I explained above, I considered your general idea of teaching
get_matching_list_bounds to deal with being passed InvalidStrategy for
opstrategy to signal special handling of a <> clause datum.

By implementing that, I was able to get rid of a bunch of code in
partprune.c and remove the COMBINE_INVERT related code. We can add
COMBINE_INVERT later if and when we need it (for some legitimate purpose).

> There's also a bunch of other changes in there:

Thanks.

> 1. Adding missing step_id in copyfuncs.c

Merged.

> 2. Simplified including the default partition in a bunch of cases.
> 3. Made it so scan_default and scan_null are only ever set to true if
> there's a partition for that.

I have merged these too.

> 4. Changed get_matching_*_bounds to return the entire result struct
> instead of the Bitmapset and pass the remaining bool values back
> through params. I didn't really like how you'd change this to pass all
> the bool flags back as params. There's a perfectly good struct there
> to provide the entire result in a single return value. I know you've
> disagreed with this already, so would be nice to get a 3rd opinion.

I went ahead with them returning PruneStepResult struct.

> 5. Rename get_matching_hash_bound to get_matching_hash_bounds. The
> LIST and RANGE version of this function both had a plural name. I
> didn't see any reason for the hash case to be different.

Agreed, merged.

> Let me know what you think.

I'm not sure about the following change in your patch:

- if (!result->scan_null)
- result->scan_null = step_result->scan_null;
- if (!result->scan_default)
- result->scan_default = step_result->scan_default;
+ result->scan_null |= step_result->scan_null;
+ result->scan_default |= step_result->scan_default;

Afaik, |= does bitwise OR, which even if it might give the result we want,
is not a logical operation. I had written the original code using the
following definition of logical OR.

a OR b = if a then true else b

Also, since things work normally even if we pass no values to
get_matching_list_bounds, including via the dummy step generated for IS
NOT NULL clause(s), I don't see the need to store notnullkeys in the prune
step. Especially, it's redundant to set notnullkeys in the pruning step
containing non-empty exprs since, by definition, they will select
partitions containing non-null datums.

> I've patched the run-time pruning v18 against this and it now passes regression.
>
> I need to do a bit more testing on this to ensure it works for all
> cases, but thought I'd send now as I suspect you're currently around
> to look.

See if attached works for you.

> There might be another issue with the patch too, but I'll send a
> separate email about that.

I suppose this is the email about support for pruning using NOT clauses in
partition.c. It might be possible to do that by tweaking things somehow
by re-introducing the COMBINE_INVERT step (legitimately needed in that
case) and modifying partprune.c to capture NOT clauses in more cases than
it does currently.

Although, I modified the comment like you suggested that we only support
Boolean NOT clause in special cases like when using Boolean partitioning
opfamily.

Attached v48.

Thanks,
Amit

Attachment Content-Type Size
v48-0001-Add-partsupfunc-to-PartitionSchemeData.patch text/plain 3.4 KB
v48-0002-Add-more-tests-for-partition-pruning.patch text/plain 16.7 KB
v48-0003-Faster-partition-pruning.patch text/plain 123.0 KB
v48-0004-Add-only-unpruned-partitioned-child-rels-to-part.patch text/plain 24.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2018-04-04 06:00:21 Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Previous Message Thomas Munro 2018-04-04 05:29:28 Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS