Re: Declarative partitioning - another take

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Declarative partitioning - another take
Date: 2016-09-15 12:58:05
Message-ID: CAFjFpReZF34MDbY95xoATi0xVj2mAry4-LHBWVBayOc8gj=iqg@mail.gmail.com
Lists: pgsql-hackers

Hi Amit,

It looks like there is some problem while creating parameterized paths
for multi-level partitioned tables. Here's a longish test case:

CREATE TABLE prt1_l (a int, b int, c varchar) PARTITION BY RANGE(a);
CREATE TABLE prt1_l_p1 PARTITION OF prt1_l FOR VALUES START (0) END (250) PARTITION BY RANGE (b);
CREATE TABLE prt1_l_p1_p1 PARTITION OF prt1_l_p1 FOR VALUES START (0) END (100);
CREATE TABLE prt1_l_p1_p2 PARTITION OF prt1_l_p1 FOR VALUES START (100) END (250);
CREATE TABLE prt1_l_p2 PARTITION OF prt1_l FOR VALUES START (250) END (500) PARTITION BY RANGE (c);
CREATE TABLE prt1_l_p2_p1 PARTITION OF prt1_l_p2 FOR VALUES START ('0250') END ('0400');
CREATE TABLE prt1_l_p2_p2 PARTITION OF prt1_l_p2 FOR VALUES START ('0400') END ('0500');
CREATE TABLE prt1_l_p3 PARTITION OF prt1_l FOR VALUES START (500) END (600) PARTITION BY RANGE ((b + a));
CREATE TABLE prt1_l_p3_p1 PARTITION OF prt1_l_p3 FOR VALUES START (1000) END (1100);
CREATE TABLE prt1_l_p3_p2 PARTITION OF prt1_l_p3 FOR VALUES START (1100) END (1200);
INSERT INTO prt1_l SELECT i, i, to_char(i, 'FM0000') FROM generate_series(0, 599, 2) i;
CREATE TABLE uprt1_l AS SELECT * FROM prt1_l;

CREATE TABLE prt2_l (a int, b int, c varchar) PARTITION BY RANGE(b);
CREATE TABLE prt2_l_p1 PARTITION OF prt2_l FOR VALUES START (0) END (250) PARTITION BY RANGE (a);
CREATE TABLE prt2_l_p1_p1 PARTITION OF prt2_l_p1 FOR VALUES START (0) END (100);
CREATE TABLE prt2_l_p1_p2 PARTITION OF prt2_l_p1 FOR VALUES START (100) END (250);
CREATE TABLE prt2_l_p2 PARTITION OF prt2_l FOR VALUES START (250) END (500) PARTITION BY RANGE (c);
CREATE TABLE prt2_l_p2_p1 PARTITION OF prt2_l_p2 FOR VALUES START ('0250') END ('0400');
CREATE TABLE prt2_l_p2_p2 PARTITION OF prt2_l_p2 FOR VALUES START ('0400') END ('0500');
CREATE TABLE prt2_l_p3 PARTITION OF prt2_l FOR VALUES START (500) END (600) PARTITION BY RANGE ((a + b));
CREATE TABLE prt2_l_p3_p1 PARTITION OF prt2_l_p3 FOR VALUES START (1000) END (1100);
CREATE TABLE prt2_l_p3_p2 PARTITION OF prt2_l_p3 FOR VALUES START (1100) END (1200);
INSERT INTO prt2_l SELECT i, i, to_char(i, 'FM0000') FROM generate_series(0, 599, 3) i;
CREATE TABLE uprt2_l AS SELECT * FROM prt2_l;

EXPLAIN (VERBOSE, COSTS OFF)
SELECT * FROM prt1_l t1 LEFT JOIN LATERAL
  (SELECT t2.a AS t2a, t2.c AS t2c, t2.b AS t2b, t3.a AS t3a, least(t1.a,t2.a,t3.a)
     FROM prt1_l t2 JOIN prt2_l t3
       ON (t2.a = t3.b AND t2.b = t3.a AND t2.c = t3.c AND t2.b + t2.a = t3.a + t3.b)) ss
  ON t1.a = ss.t2a AND t1.b = ss.t2a AND t1.c = ss.t2c AND t1.b + t1.a = ss.t2a + ss.t2b
WHERE t1.a % 25 = 0 ORDER BY t1.a;
ERROR: could not devise a query plan for the given query

Let's replace the laterally referenced relation by an unpartitioned table.

EXPLAIN (VERBOSE, COSTS OFF)
SELECT * FROM uprt1_l t1 LEFT JOIN LATERAL
  (SELECT t2.a AS t2a, t2.c AS t2c, t2.b AS t2b, t3.a AS t3a, least(t1.a,t2.a,t3.a)
     FROM prt1_l t2 JOIN prt2_l t3
       ON (t2.a = t3.b AND t2.b = t3.a AND t2.c = t3.c AND t2.b + t2.a = t3.a + t3.b)) ss
  ON t1.a = ss.t2a AND t1.b = ss.t2a AND t1.c = ss.t2c AND t1.b + t1.a = ss.t2a + ss.t2b
WHERE t1.a % 25 = 0 ORDER BY t1.a;
ERROR: could not devise a query plan for the given query

Let's replace another partitioned table in the inner query with an
unpartitioned table.

EXPLAIN (VERBOSE, COSTS OFF)
SELECT * FROM uprt1_l t1 LEFT JOIN LATERAL
  (SELECT t2.a AS t2a, t2.c AS t2c, t2.b AS t2b, t3.a AS t3a, least(t1.a,t2.a,t3.a)
     FROM prt1_l t2 JOIN uprt2_l t3
       ON (t2.a = t3.b AND t2.b = t3.a AND t2.c = t3.c AND t2.b + t2.a = t3.a + t3.b)) ss
  ON t1.a = ss.t2a AND t1.b = ss.t2a AND t1.c = ss.t2c AND t1.b + t1.a = ss.t2a + ss.t2b
WHERE t1.a % 25 = 0 ORDER BY t1.a;
ERROR: could not devise a query plan for the given query

Please check if you are able to reproduce these errors in your
repository. I made sure that I cleaned up all partition-wise join code
before testing this, but ... .

I tried to debug the problem somewhat. In set_append_rel_pathlist(),
it finds that at least one child has a parameterized path as its
cheapest path, so it doesn't create an unparameterized path for the
append rel. At the same time it fails to create an Append path for
any parameterization shared by all the children, so the append rel
ends up with no paths at all, hence the error. There seem to be two
problems here:
1. The children from the second level onwards may not be getting
parameterized for lateral references. That seems unlikely but
possible.
2. Reparameterization should have corrected this, but
reparameterize_path() does not support AppendPaths (a rough sketch of
what such support might look like is below).
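For illustration only, here is an untested sketch of an AppendPath
case that could be added to the switch in reparameterize_path()
(pathnode.c). I haven't tried it against the partitioning patch, and
the exact create_append_path() signature differs across releases, so
treat it as an assumption rather than a proposal:

        case T_Append:
            {
                AppendPath *apath = (AppendPath *) path;
                List       *childpaths = NIL;
                ListCell   *lc;

                /* Reparameterize every child path; give up if any child can't be. */
                foreach(lc, apath->subpaths)
                {
                    Path       *spath;

                    spath = reparameterize_path(root, (Path *) lfirst(lc),
                                                required_outer, loop_count);
                    if (spath == NULL)
                        return NULL;
                    childpaths = lappend(childpaths, spath);
                }

                /* Rebuild the AppendPath with the new parameterization. */
                return (Path *) create_append_path(rel, childpaths,
                                                   required_outer,
                                                   apath->path.parallel_workers);
            }

With something like that in place, get_cheapest_parameterized_child_path()
could presumably produce a suitably parameterized path even when a child of
the append rel is itself an Append (i.e. a sub-partitioned table).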

On Thu, Sep 15, 2016 at 2:23 PM, Amit Langote
<Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
> Hi
>
> On 2016/09/09 17:55, Amit Langote wrote:
>> On 2016/09/06 22:04, Amit Langote wrote:
>>> Will fix.
>>
>> Here is an updated set of patches.
>
> An email from Rajkumar somehow managed to break out of this thread.
> Quoting his message below so that I don't end up replying with patches on
> two different threads.
>
> On 2016/09/14 16:58, Rajkumar Raghuwanshi wrote:
>> I have continued testing declarative partitioning with the latest
>> patch. Got some more observations, given below.
>
> Thanks a lot for testing.
>
>> -- Observation 1 : Getting an overlap error with an EXCLUSIVE START bound in
>> range partitioning.
>>
>> create table test_range_bound ( a int) partition by range(a);
>> --creating a partition to contain records {1,2,3,4}, by default 1 is
>> inclusive and 5 is exclusive
>> create table test_range_bound_p1 partition of test_range_bound for values
>> start (1) end (5);
>> --now trying to create a partition with an explicitly EXCLUSIVE start bound to
>> contain records {5,6,7}: with START (4) EXCLUSIVE the range should be 5
>> (inclusive) to 8 (exclusive), but we get a partition overlap error.
>> create table test_range_bound_p2 partition of test_range_bound for values
>> start (4) EXCLUSIVE end (8);
>> ERROR: partition "test_range_bound_p2" would overlap partition
>> "test_range_bound_p1"
>
> Wow, this is bad. What is needed in this case is "canonicalization" of
> the range partition bounds specified in the command. Range types do this
> and hence an equivalent test done with range type values would disagree
> with the result given by the patch for range partition bounds.
>
> select '[1,5)'::int4range && '(4,8]'::int4range as cmp;
> cmp
> -----
> f
> (1 row)
>
> In this case, the second range is converted into its equivalent canonical
> form viz. '[5, 9)'. Then comparison of bounds 5) and [5 can tell that the
> ranges do not overlap after all. Range type operators can do this because
> their code can rely on the availability of a canonicalization function for
> a given range type. From the range types documentation:
>
> """
> A discrete range type should have a canonicalization function that is
> aware of the desired step size for the element type. The canonicalization
> function is charged with converting equivalent values of the range type to
> have identical representations, in particular consistently inclusive or
> exclusive bounds. If a canonicalization function is not specified, then
> ranges with different formatting will always be treated as unequal, even
> though they might represent the same set of values in reality.
> """
>
> to extend the last sentence:
>
> "... or consider two ranges overlapping when in reality they are not
> (maybe they are really just adjacent)."
>
> Within the code handling range partition bounds, no such canonicalization
> happens, so the comparison of 5) and (4 ends up concluding that upper1 > lower2,
> and hence that the ranges overlap.
>
> To mitigate this, how about we restrict range partition key to contain
> columns of only those types for which we know we can safely canonicalize a
> range bound (ie, discrete range types)? I don't think we can use, say,
> existing int4range_canonical but will have to write a version of it for
> partitioning usage (range bounds of partitions are different from what
> int4range_canonical is ready to handle). This approach will be very
> limiting as then range partitions will be limited to columns of int,
> bigint and date type only.
>
> One more option is we let the user specify the canonicalize function next
> to the column name when defining the partition key. If not specified, we
> hard-code one for the types for which we will be implementing a
> canonicalize function (ie, above mentioned types). In other cases, we
> just don't have one and hence if an unexpected result occurs when creating
> a new partition, it's up to the user to realize what happened. Of course,
> we will be mentioning in the documentation why a canonicalize function is
> necessary and how to write one. Note that this canonicalize function
> comes into play only when defining new partitions, it has no role beyond
> that point.
>
>> -- Observation 2 : able to create a sub-partition whose range lies outside the
>> range of the parent partition, making it impossible to insert data that
>> satisfies both levels of partitioning.
>>
>> create table test_subpart (c1 int) partition by range (c1);
>> create table test_subpart_p1 partition of test_subpart for values start (1)
>> end (100) inclusive partition by range (c1);
>> create table test_subpart_p1_sub1 partition of test_subpart_p1 for values
>> start (101) end (200);
>>
>> \d+ test_subpart
>> Table "public.test_subpart"
>> Column | Type | Modifiers | Storage | Stats target | Description
>> --------+---------+-----------+---------+--------------+-------------
>> c1 | integer | | plain | |
>> Partition Key: RANGE (c1)
>> Partitions: test_subpart_p1 FOR VALUES START (1) END (100) INCLUSIVE
>>
>> \d+ test_subpart_p1
>> Table "public.test_subpart_p1"
>> Column | Type | Modifiers | Storage | Stats target | Description
>> --------+---------+-----------+---------+--------------+-------------
>> c1 | integer | | plain | |
>> Partition Of: test_subpart FOR VALUES START (1) END (100) INCLUSIVE
>> Partition Key: RANGE (c1)
>> Partitions: test_subpart_p1_sub1 FOR VALUES START (101) END (200)
>>
>> insert into test_subpart values (50);
>> ERROR: no partition of relation "test_subpart_p1" found for row
>> DETAIL: Failing row contains (50).
>> insert into test_subpart values (150);
>> ERROR: no partition of relation "test_subpart" found for row
>> DETAIL: Failing row contains (150).
>
> It seems that DDL should prevent the same column being used in partition
> key of lower level partitions. I don't know how much sense it would make,
> but being able to use the same column as partition key of lower level
> partitions may be a feature useful to some users if they know what they
> are doing. But this last part doesn't sound like a good thing. I
> modified the patch such that lower level partitions cannot use columns
> used by ancestor tables.
>
>> -- Observation 3 : Getting "cache lookup failed" when selecting from a
>> list-partitioned table whose partition key is an array column.
>>
>> CREATE TABLE test_array ( i int,j int[],k text[]) PARTITION BY LIST (j);
>> CREATE TABLE test_array_p1 PARTITION OF test_array FOR VALUES IN ('{1}');
>> CREATE TABLE test_array_p2 PARTITION OF test_array FOR VALUES IN ('{2,2}');
>>
>> INSERT INTO test_array (i,j[1],k[1]) VALUES (1,1,1);
>> INSERT INTO test_array (i,j[1],j[2],k[1]) VALUES (2,2,2,2);
>>
>> postgres=# SELECT tableoid::regclass,* FROM test_array_p1;
>> tableoid | i | j | k
>> ---------------+---+-----+-----
>> test_array_p1 | 1 | {1} | {1}
>> (1 row)
>>
>> postgres=# SELECT tableoid::regclass,* FROM test_array_p2;
>> tableoid | i | j | k
>> ---------------+---+-------+-----
>> test_array_p2 | 2 | {2,2} | {2}
>> (1 row)
>>
>> postgres=# SELECT tableoid::regclass,* FROM test_array;
>> ERROR: cache lookup failed for type 0
>
> That's a bug. Fixed in the attached patch.
>
> PS: I'm going to have limited Internet access during this weekend and over
> the next week, so responses could be slow. Sorry about that.
>
> Thanks,
> Amit

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
