Re: Declarative partitioning - another take

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Declarative partitioning - another take
Date: 2016-09-22 05:42:31
Message-ID: CAFjFpRc=T+CjpGNkNSdOkHza8VAPb35bngaCdAzPgBkhijmJhg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Amit,
Following sequence of DDLs gets an error
--
-- multi-leveled partitions
--
CREATE TABLE prt1_l (a int, b int, c varchar) PARTITION BY RANGE(a);
CREATE TABLE prt1_l_p1 PARTITION OF prt1_l FOR VALUES START (0) END
(250) PARTITION BY RANGE (b);
CREATE TABLE prt1_l_p1_p1 PARTITION OF prt1_l_p1 FOR VALUES START (0) END (100);
CREATE TABLE prt1_l_p1_p2 PARTITION OF prt1_l_p1 FOR VALUES START
(100) END (250);
CREATE TABLE prt1_l_p2 PARTITION OF prt1_l FOR VALUES START (250) END
(500) PARTITION BY RANGE (c);
CREATE TABLE prt1_l_p2_p1 PARTITION OF prt1_l_p2 FOR VALUES START
('0250') END ('0400');
CREATE TABLE prt1_l_p2_p2 PARTITION OF prt1_l_p2 FOR VALUES START
('0400') END ('0500');
CREATE TABLE prt1_l_p3 PARTITION OF prt1_l FOR VALUES START (500) END
(600) PARTITION BY RANGE ((b + a));
ERROR: cannot use column or expression from ancestor partition key

The last statement is trying create subpartitions by range (b + a),
which contains a partition key from ancestor partition key but is not
exactly same as that. In fact it contains some extra columns other
than the ancestor partition key columns. Why do we want to prohibit
such cases?

On Thu, Sep 15, 2016 at 2:23 PM, Amit Langote
<Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
> Hi
>
> On 2016/09/09 17:55, Amit Langote wrote:
>> On 2016/09/06 22:04, Amit Langote wrote:
>>> Will fix.
>>
>> Here is an updated set of patches.
>
> An email from Rajkumar somehow managed to break out of this thread.
> Quoting his message below so that I don't end up replying with patches on
> two different threads.
>
> On 2016/09/14 16:58, Rajkumar Raghuwanshi wrote:
>> I have Continued with testing declarative partitioning with the latest
>> patch. Got some more observation, given below
>
> Thanks a lot for testing.
>
>> -- Observation 1 : Getting overlap error with START with EXCLUSIVE in range
>> partition.
>>
>> create table test_range_bound ( a int) partition by range(a);
>> --creating a partition to contain records {1,2,3,4}, by default 1 is
>> inclusive and 5 is exclusive
>> create table test_range_bound_p1 partition of test_range_bound for values
>> start (1) end (5);
>> --now trying to create a partition by explicitly mentioning start is
>> exclusive to contain records {5,6,7}, here trying to create with START with
>> 4 as exclusive so range should be 5 to 8, but getting partition overlap
>> error.
>> create table test_range_bound_p2 partition of test_range_bound for values
>> start (4) EXCLUSIVE end (8);
>> ERROR: partition "test_range_bound_p2" would overlap partition
>> "test_range_bound_p1"
>
> Wow, this is bad. What is needed in this case is "canonicalization" of
> the range partition bounds specified in the command. Range types do this
> and hence an equivalent test done with range type values would disagree
> with the result given by the patch for range partition bounds.
>
> select '[1,5)'::int4range && '(4,8]'::int4range as cmp;
> cmp
> -----
> f
> (1 row)
>
> In this case, the second range is converted into its equivalent canonical
> form viz. '[5, 9)'. Then comparison of bounds 5) and [5 can tell that the
> ranges do not overlap after all. Range type operators can do this because
> their code can rely on the availability of a canonicalization function for
> a given range type. From the range types documentation:
>
> """
> A discrete range type should have a canonicalization function that is
> aware of the desired step size for the element type. The canonicalization
> function is charged with converting equivalent values of the range type to
> have identical representations, in particular consistently inclusive or
> exclusive bounds. If a canonicalization function is not specified, then
> ranges with different formatting will always be treated as unequal, even
> though they might represent the same set of values in reality.
> """
>
> to extend the last sentence:
>
> "... or consider two ranges overlapping when in reality they are not
> (maybe they are really just adjacent)."
>
> Within the code handling range partition bound, no such canonicalization
> happens, so comparison 5) and (4 ends up concluding that upper1 > lower2,
> hence ranges overlap.
>
> To mitigate this, how about we restrict range partition key to contain
> columns of only those types for which we know we can safely canonicalize a
> range bound (ie, discrete range types)? I don't think we can use, say,
> existing int4range_canonical but will have to write a version of it for
> partitioning usage (range bounds of partitions are different from what
> int4range_canonical is ready to handle). This approach will be very
> limiting as then range partitions will be limited to columns of int,
> bigint and date type only.
>
> One more option is we let the user specify the canonicalize function next
> to the column name when defining the partition key. If not specified, we
> hard-code one for the types for which we will be implementing a
> canonicalize function (ie, above mentioned types). In other cases, we
> just don't have one and hence if an unexpected result occurs when creating
> a new partition, it's up to the user to realize what happened. Of course,
> we will be mentioning in the documentation why a canonicalize function is
> necessary and how to write one. Note that this canonicalize function
> comes into play only when defining new partitions, it has no role beyond
> that point.
>
>> -- Observation 2 : able to create sub-partition out of the range set for
>> main table, causing not able to insert data satisfying any of the partition.
>>
>> create table test_subpart (c1 int) partition by range (c1);
>> create table test_subpart_p1 partition of test_subpart for values start (1)
>> end (100) inclusive partition by range (c1);
>> create table test_subpart_p1_sub1 partition of test_subpart_p1 for values
>> start (101) end (200);
>>
>> \d+ test_subpart
>> Table "public.test_subpart"
>> Column | Type | Modifiers | Storage | Stats target | Description
>> --------+---------+-----------+---------+--------------+-------------
>> c1 | integer | | plain | |
>> Partition Key: RANGE (c1)
>> Partitions: test_subpart_p1 FOR VALUES START (1) END (100) INCLUSIVE
>>
>> \d+ test_subpart_p1
>> Table "public.test_subpart_p1"
>> Column | Type | Modifiers | Storage | Stats target | Description
>> --------+---------+-----------+---------+--------------+-------------
>> c1 | integer | | plain | |
>> Partition Of: test_subpart FOR VALUES START (1) END (100) INCLUSIVE
>> Partition Key: RANGE (c1)
>> Partitions: test_subpart_p1_sub1 FOR VALUES START (101) END (200)
>>
>> insert into test_subpart values (50);
>> ERROR: no partition of relation "test_subpart_p1" found for row
>> DETAIL: Failing row contains (50).
>> insert into test_subpart values (150);
>> ERROR: no partition of relation "test_subpart" found for row
>> DETAIL: Failing row contains (150).
>
> It seems that DDL should prevent the same column being used in partition
> key of lower level partitions. I don't know how much sense it would make,
> but being able to use the same column as partition key of lower level
> partitions may be a feature useful to some users if they know what they
> are doing. But this last part doesn't sound like a good thing. I
> modified the patch such that lower level partitions cannot use columns
> used by ancestor tables.
>
>> -- Observation 3 : Getting cache lookup failed, when selecting list
>> partition table containing array.
>>
>> CREATE TABLE test_array ( i int,j int[],k text[]) PARTITION BY LIST (j);
>> CREATE TABLE test_array_p1 PARTITION OF test_array FOR VALUES IN ('{1}');
>> CREATE TABLE test_array_p2 PARTITION OF test_array FOR VALUES IN ('{2,2}');
>>
>> INSERT INTO test_array (i,j[1],k[1]) VALUES (1,1,1);
>> INSERT INTO test_array (i,j[1],j[2],k[1]) VALUES (2,2,2,2);
>>
>> postgres=# SELECT tableoid::regclass,* FROM test_array_p1;
>> tableoid | i | j | k
>> ---------------+---+-----+-----
>> test_array_p1 | 1 | {1} | {1}
>> (1 row)
>>
>> postgres=# SELECT tableoid::regclass,* FROM test_array_p2;
>> tableoid | i | j | k
>> ---------------+---+-------+-----
>> test_array_p2 | 2 | {2,2} | {2}
>> (1 row)
>>
>> postgres=# SELECT tableoid::regclass,* FROM test_array;
>> ERROR: cache lookup failed for type 0
>
> That's a bug. Fixed in the attached patch.
>
> PS: I'm going to have limited Internet access during this weekend and over
> the next week, so responses could be slow. Sorry about that.
>
> Thanks,
> Amit

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2016-09-22 05:52:52 Re: Tracking wait event for latches
Previous Message Rushabh Lathia 2016-09-22 05:36:35 Re: Showing parallel status in \df+