Re: path toward faster partition pruning

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: path toward faster partition pruning
Date: 2017-11-06 10:01:42
Message-ID: 62d21a7b-fea9-f2d7-c33a-8caa12eca612@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On 2017/11/06 14:32, David Rowley wrote:
> On 6 November 2017 at 17:30, Amit Langote wrote:
>> On 2017/11/03 13:32, David Rowley wrote:
>>> On 31 October 2017 at 21:43, Amit Langote wrote:
>>> 1. This comment seem wrong.
>>>
>>> /*
>>> * Since the clauses in rel->baserestrictinfo should all contain Const
>>> * operands, it should be possible to prune partitions right away.
>>> */
>>
>> Yes. I used to think it was true, then realized it isn't and updated the
>> code to get rid of that assumption, but I forgot updating this comment.
>> Fixed.
>>
>>> How about PARTITION BY RANGE (a) and SELECT * FROM parttable WHERE a > b; ?
>>> baserestrictinfo in this case will contain a single RestrictInfo with
>>> an OpExpr containing two Var args and it'll come right through that
>>> function too.
>
> ...
>
>> We won't be able to use such a clause for pruning at all; neither
>> planning-time pruning nor execution-time pruning. Am I missing something?
>
> I just meant the comment was wrong.

Ah, gotcha.

>> The design with min/max partition index interface to the partition.c's new
>> partition-pruning facility is intentional. You can find hints about how
>> such a design came about in the following Robert's email:
>>
>> https://www.postgresql.org/message-id/CA%2BTgmoYcv_MghvhV8pL33D95G8KVLdZOxFGX5dNASVkXO8QuPw%40mail.gmail.com
>
> Yeah, I remember reading that before I had looked at the code. I
> disagree with Robert on this. The fact that the min/max range gets
> turned into a list of everything in that range in
> get_append_rel_partitions means all the advantages that storing the
> partitions as a range is voided. If you could have kept it a range the
> entire time, then that might be different, but seems you need to
> materialize the entire range in order to sort the partitions into
> order. I've included Robert in just in case he wants to take a look at
> the code that resulted from that design. Maybe something is not
> following what he had in mind, or maybe he'll change his mind based on
> the code that resulted.
>
>> For range queries, it is desirable for the partitioning module to return
>> the set of qualifying partitions that are contiguous in a compact (O(1))
>> min/max representation than having to enumerate all those indexes in the
>> set. It's nice to avoid iterating over that set twice -- once when
>> constructing the set in the partitioning module and then again in the
>> caller (in this case, planner) to perform the planning-related tasks per
>> selected partition.
>
> The idea is that you still get the min and max from the bsearch, but
> then use bms_add_range() to populate a bitmapset of the matching
> partitions. The big-O notation of the search shouldn't change.
>
>> We need the other_parts Bitmapset too, because selected partitions may not
>> always be contiguous, even in the case of range partitioning, if there are
>> OR clauses and the possibility that the default partition is also
>> selected. While computing the selected partition set from a given set of
>> clauses, partitioning code tries to keep the min/max representation as
>> long as it makes sense to and once the selected partitions no longer
>> appear to be contiguous it switches to the Bitmapset mode.
>
> Yip. I understand that. I just think there's no benefit to having
> min/max since it needs to be materialized into a list of the entire
> range at some point, it might as well be done as soon as possible
> using a bitmapset, which would save having all the partset_union,
> partset_intersect, partset_range_empty, partset_range_overlap,
> partset_range_adjacent code. You'd end up just using bms_union and
> bms_intersect then bms_add_range to handle the min/max bound you get
> from the bsearch.

OK, I have gotten rid of the min/max partition index interface and instead
adopted the bms_add_range() approach by including your patch to add the
same in the patch set (which is now 0002 in the whole set). I have to
admit that it's simpler to understand the new code with just Bitmapsets to
look at, but I'm still a bit concerned about materializing the whole set
right within partition.c, although we can perhaps optimize it later.

Attached updated set of patches, including the fix to make the new pruning
code handle Boolean partitioning.

Thanks,
Amit

Attachment Content-Type Size
0001-Add-new-tests-for-partition-pruning-v11.patch text/plain 50.7 KB
0002-Add-bms_add_range-to-add-members-within-the-specifie-v11.patch text/plain 3.5 KB
0003-Planner-side-changes-for-partition-pruning-v11.patch text/plain 39.1 KB
0004-Implement-get_partitions_from_clauses-v11.patch text/plain 29.3 KB
0005-Some-interface-changes-for-partition_bound_-cmp-bsea-v11.patch text/plain 10.1 KB
0006-Tweak-default-range-partition-s-constraint-a-little-v11.patch text/plain 2.9 KB
0007-Implement-get_partitions_for_keys-v11.patch text/plain 23.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2017-11-06 10:19:32 Re: [PATCH] Overestimated filter cost and its mitigation
Previous Message Amit Langote 2017-11-06 09:59:46 Re: [Sender Address Forgery]Re: path toward faster partition pruning