Re: path toward faster partition pruning

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Beena Emerson <memissemerson(at)gmail(dot)com>
Cc: Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: path toward faster partition pruning
Date: 2017-10-26 11:17:47
Message-ID: 0d6096e8-7c7b-afed-71d3-dca151306626@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Beena,

Thanks for the tests.

On 2017/10/25 18:18, Beena Emerson wrote:
> The crashes are fixed. However, handling of DEFAULT partition in
> various queries is not proper.
>
> Case 1: In this case default should be selected.
> DROP TABLE tprt;
> CREATE TABLE tprt (col1 int, col2 int) PARTITION BY range(col1);
> CREATE TABLE tprt_1 PARTITION OF tprt FOR VALUES FROM (1) TO (50001)
> PARTITION BY list(col1);
> CREATE TABLE tprt_11 PARTITION OF tprt_1 FOR VALUES IN (20000, 25000);
> CREATE TABLE tprt_12 PARTITION OF tprt_1 FOR VALUES IN (50000, 35000);
> CREATE TABLE tprt_13 PARTITION OF tprt_1 FOR VALUES IN (10000);
> CREATE TABLE tprt_1d PARTITION OF tprt_1 DEFAULT;
>
>
> postgres=# EXPLAIN (COSTS OFF) SELECT * FROM tprt WHERE col1 < 10000;
> QUERY PLAN
> --------------------------
> Result
> One-Time Filter: false
> (2 rows)

Hmm, this clearly looks wrong. Fixed in the attached.

> Case 2: In this case DEFAULT need not be selected.
>
> DROP TABLE tprt;
> CREATE TABLE tprt (col1 int, col2 int) PARTITION BY range(col1);
> CREATE TABLE tprt_1 PARTITION OF tprt FOR VALUES FROM (1) TO (50001)
> PARTITION BY range(col1);
> CREATE TABLE tprt_11 PARTITION OF tprt_1 FOR VALUES FROM (1) TO (10000);
> CREATE TABLE tprt_12 PARTITION OF tprt_1 FOR VALUES FROM (10000) TO (20000);
> CREATE TABLE tprt_13 PARTITION OF tprt_1 FOR VALUES FROM (20000) TO (30000);
> CREATE TABLE tprt_1d PARTITION OF tprt_1 DEFAULT;
> INSERT INTO tprt SELECT generate_series(1,50000), generate_series(1,50000);
>
> postgres=# EXPLAIN (COSTS OFF) SELECT * FROM tprt WHERE col1 < 10000;
> QUERY PLAN
> --------------------------------
> Append
> -> Seq Scan on tprt_11
> Filter: (col1 < 10000)
> -> Seq Scan on tprt_1d
> Filter: (col1 < 10000)
> (5 rows)

Yeah, ideally. But it's kind of hard to for the new partition-pruning
algorithm to be *that* correct in this particular case involving default
partitions. Let me try to explain why I think it may be a bit hard to
implement.

I perhaps have mentioned before that the new partition-pruning algorithm
runs for every partitioned table in the tree separately. In this example,
it will first determine for the root table tprt that only the partition
tprt_1 needs to be scanned. Since tprt_1 is itself partitioned, algorithm
will run again, but the fact that tprt_1 (iow, any of its partitions) is
itself constrained to range [1, 50001) is, for the most part, lost on the
algorithm. Note that non-default partitions (tprt_11, tprt_12, ...) have
bound datums in PartitionBoundInfo describing the range of data they
contain, which the algorithm uses to determine the set of partitions
satisfying given set of clauses. The default partition has no datums.
The only thing describing what it contains is its partition constraint.
From the clause col1 < 10000, the algorithm will conclude that the default
partition might contain some data satisfying the same, because it knows
for sure that there no non-default partition for keys < 1.

It can perhaps taught to not make that conclusion by taking into account
the default partition's partition constraint, which includes constraint
inherited from the parent, viz. 1 <= col1 < 50001. To do that, it might
be possible to summon up predtest.c's powers to conclude from the default
partition's partition constraint that it cannot contain any keys < 1, but
then we'll have to frame up a clause expression describing the latter.
Generating such a clause expression can be a bit daunting for a
multi-column key. So, I haven't yet tried really hard to implement this.
Any thoughts on that?

Meanwhile, attached updated set of patches including fixes for the typos
you reported in the other message. Updated 0005 fixes the first bug (the
Case 1 in your email), while other patches 0002-0004 are updated mostly to
fix the reported typos. A couple of tests are added in 0001 to test the
default partition case a bit more.

Thanks,
Amit

Attachment Content-Type Size
0001-Add-new-tests-for-partition-pruning-v5.patch text/plain 41.0 KB
0002-Planner-side-changes-for-partition-pruning-v5.patch text/plain 41.0 KB
0003-Implement-get_partitions_from_clauses-v5.patch text/plain 31.4 KB
0004-Some-interface-changes-for-partition_bound_-cmp-bsea-v5.patch text/plain 10.1 KB
0005-Implement-get_partitions_for_keys-v5.patch text/plain 20.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Rushabh Lathia 2017-10-26 11:22:16 Re: Parallel tuplesort (for parallel B-Tree index creation)
Previous Message David Rowley 2017-10-26 11:17:00 Re: Removing [Merge]Append nodes which contain a single subpath