Re: [Sender Address Forgery]Re: path toward faster partition pruning

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Sender Address Forgery]Re: path toward faster partition pruning
Date: 2017-11-06 04:30:02
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi David,

On 2017/11/03 13:32, David Rowley wrote:
> On 31 October 2017 at 21:43, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> Attached updated version of the patches addressing some of your comments
> I've spent a bit of time looking at these but I'm out of time for now.

Thanks a lot for the review and sorry for the delay in sending rebased

> So far I have noted down the following;
> 1. This comment seem wrong.
> /*
> * Since the clauses in rel->baserestrictinfo should all contain Const
> * operands, it should be possible to prune partitions right away.
> */

Yes. I used to think it was true, then realized it isn't and updated the
code to get rid of that assumption, but I forgot updating this comment.

> How about PARTITION BY RANGE (a) and SELECT * FROM parttable WHERE a > b; ?
> baserestrictinfo in this case will contain a single RestrictInfo with
> an OpExpr containing two Var args and it'll come right through that
> function too.

As it should I think. Quite similarly, you will be able see that index
path won't be considered for such a clause:

create table foo (a int, b int);
create index fooi on foo (a);
insert into foo select generate_series(1, 100000);

explain select * from foo where a = 1;
Bitmap Heap Scan on foo (cost=12.17..482.50 rows=500 width=8)
Recheck Cond: (a = 1)
-> Bitmap Index Scan on fooi (cost=0.00..12.04 rows=500 width=0)
Index Cond: (a = 1)
(4 rows)

explain select * from foo where a <= b;
Seq Scan on foo (cost=0.00..1693.00 rows=500 width=8)
Filter: (a = b)
(2 rows)

We won't be able to use such a clause for pruning at all; neither
planning-time pruning nor execution-time pruning. Am I missing something?

> 2. This code is way more complex than it needs to be.
> if (num_parts > 0)
> {
> int j;
> all_indexes = (int *) palloc(num_parts * sizeof(int));
> j = 0;
> if (min_part_idx >= 0 && max_part_idx >= 0)
> {
> for (i = min_part_idx; i <= max_part_idx; i++)
> all_indexes[j++] = i;
> }
> if (!bms_is_empty(other_parts))
> while ((i = bms_first_member(other_parts)) >= 0)
> all_indexes[j++] = i;
> if (j > 1)
> qsort((void *) all_indexes, j, sizeof(int), intcmp);
> }
> It looks like the min/max partition stuff is just complicating things
> here. If you need to build this array of all_indexes[] anyway, I don't
> quite understand the point of the min/max. It seems like min/max would
> probably work a bit nicer if you didn't need the other_parts
> BitmapSet, so I recommend just getting rid of min/max completely and
> just have a BitmapSet with bit set for each partition's index you
> need, you'd not need to go to the trouble of performing a qsort on an
> array and you could get rid of quite a chunk of code too.
> The entire function would then not be much more complex than:
> partindexes = get_partitions_from_clauses(parent, partclauses);
> while ((i = bms_first_member(partindexes)) >= 0)
> {
> AppendRelInfo *appinfo = rel->part_appinfos[i];
> result = lappend(result, appinfo);
> }
> Then you can also get rid of your intcmp() function too.

The design with min/max partition index interface to the partition.c's new
partition-pruning facility is intentional. You can find hints about how
such a design came about in the following Robert's email:

For range queries, it is desirable for the partitioning module to return
the set of qualifying partitions that are contiguous in a compact (O(1))
min/max representation than having to enumerate all those indexes in the
set. It's nice to avoid iterating over that set twice -- once when
constructing the set in the partitioning module and then again in the
caller (in this case, planner) to perform the planning-related tasks per
selected partition.

We need the other_parts Bitmapset too, because selected partitions may not
always be contiguous, even in the case of range partitioning, if there are
OR clauses and the possibility that the default partition is also
selected. While computing the selected partition set from a given set of
clauses, partitioning code tries to keep the min/max representation as
long as it makes sense to and once the selected partitions no longer
appear to be contiguous it switches to the Bitmapset mode.

> 3. Following code has the wrong size calculation:
> memset(keynullness, -1, PARTITION_MAX_KEYS * sizeof(NullTestType *));
> should be PARTITION_MAX_KEYS * sizeof(NullTestType). It might have
> worked on your machine if you're compiling as 32 bit.

Oops, you're right. Fixed.

> I'll continue on with the review in the next few days.

Thanks again.

Attached is the updated set of patches.


Attachment Content-Type Size
0001-Add-new-tests-for-partition-pruning-v10.patch text/plain 47.2 KB
0002-Planner-side-changes-for-partition-pruning-v10.patch text/plain 38.4 KB
0003-Implement-get_partitions_from_clauses-v10.patch text/plain 33.5 KB
0004-Some-interface-changes-for-partition_bound_-cmp-bsea-v10.patch text/plain 10.1 KB
0005-Tweak-default-range-partition-s-constraint-a-little-v10.patch text/plain 2.9 KB
0006-Implement-get_partitions_for_keys-v10.patch text/plain 21.4 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2017-11-06 04:31:59 Re: [PATCH] Overestimated filter cost and its mitigation
Previous Message Edmund Horner 2017-11-06 04:28:55 PATCH: psql tab completion for SELECT