Quick Links

Re: Declarative partitioning - another take

From:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To:	Amit Langote <amitlangote09(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>
Cc:	Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Declarative partitioning - another take
Date:	2016-09-15 08:53:31
Message-ID:	e5c1c9cf-3f5a-c4d7-6047-7351147aaef9@lab.ntt.co.jp
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 2016/09/09 17:55, Amit Langote wrote:
> On 2016/09/06 22:04, Amit Langote wrote:
>> Will fix.
>
> Here is an updated set of patches.

An email from Rajkumar somehow managed to break out of this thread.
Quoting his message below so that I don't end up replying with patches on
two different threads.

On 2016/09/14 16:58, Rajkumar Raghuwanshi wrote:
> I have Continued with testing declarative partitioning with the latest
> patch. Got some more observation, given below

Thanks a lot for testing.

> -- Observation 1 : Getting overlap error with START with EXCLUSIVE in range
> partition.
>
> create table test_range_bound ( a int) partition by range(a);
> --creating a partition to contain records {1,2,3,4}, by default 1 is
> inclusive and 5 is exclusive
> create table test_range_bound_p1 partition of test_range_bound for values
> start (1) end (5);
> --now trying to create a partition by explicitly mentioning start is
> exclusive to contain records {5,6,7}, here trying to create with START with
> 4 as exclusive so range should be 5 to 8, but getting partition overlap
> error.
> create table test_range_bound_p2 partition of test_range_bound for values
> start (4) EXCLUSIVE end (8);
> ERROR: partition "test_range_bound_p2" would overlap partition
> "test_range_bound_p1"

Wow, this is bad. What is needed in this case is "canonicalization" of
the range partition bounds specified in the command. Range types do this
and hence an equivalent test done with range type values would disagree
with the result given by the patch for range partition bounds.

select '[1,5)'::int4range && '(4,8]'::int4range as cmp;
cmp
-----
f
(1 row)

In this case, the second range is converted into its equivalent canonical
form viz. '[5, 9)'. Then comparison of bounds 5) and [5 can tell that the
ranges do not overlap after all. Range type operators can do this because
their code can rely on the availability of a canonicalization function for
a given range type. From the range types documentation:

"""
A discrete range type should have a canonicalization function that is
aware of the desired step size for the element type. The canonicalization
function is charged with converting equivalent values of the range type to
have identical representations, in particular consistently inclusive or
exclusive bounds. If a canonicalization function is not specified, then
ranges with different formatting will always be treated as unequal, even
though they might represent the same set of values in reality.
"""

to extend the last sentence:

"... or consider two ranges overlapping when in reality they are not
(maybe they are really just adjacent)."

Within the code handling range partition bound, no such canonicalization
happens, so comparison 5) and (4 ends up concluding that upper1 > lower2,
hence ranges overlap.

To mitigate this, how about we restrict range partition key to contain
columns of only those types for which we know we can safely canonicalize a
range bound (ie, discrete range types)? I don't think we can use, say,
existing int4range_canonical but will have to write a version of it for
partitioning usage (range bounds of partitions are different from what
int4range_canonical is ready to handle). This approach will be very
limiting as then range partitions will be limited to columns of int,
bigint and date type only.

One more option is we let the user specify the canonicalize function next
to the column name when defining the partition key. If not specified, we
hard-code one for the types for which we will be implementing a
canonicalize function (ie, above mentioned types). In other cases, we
just don't have one and hence if an unexpected result occurs when creating
a new partition, it's up to the user to realize what happened. Of course,
we will be mentioning in the documentation why a canonicalize function is
necessary and how to write one. Note that this canonicalize function
comes into play only when defining new partitions, it has no role beyond
that point.

> -- Observation 2 : able to create sub-partition out of the range set for
> main table, causing not able to insert data satisfying any of the partition.
>
> create table test_subpart (c1 int) partition by range (c1);
> create table test_subpart_p1 partition of test_subpart for values start (1)
> end (100) inclusive partition by range (c1);
> create table test_subpart_p1_sub1 partition of test_subpart_p1 for values
> start (101) end (200);
>
> \d+ test_subpart
> Table "public.test_subpart"
> Column | Type | Modifiers | Storage | Stats target | Description
> --------+---------+-----------+---------+--------------+-------------
> c1 | integer | | plain | |
> Partition Key: RANGE (c1)
> Partitions: test_subpart_p1 FOR VALUES START (1) END (100) INCLUSIVE
>
> \d+ test_subpart_p1
> Table "public.test_subpart_p1"
> Column | Type | Modifiers | Storage | Stats target | Description
> --------+---------+-----------+---------+--------------+-------------
> c1 | integer | | plain | |
> Partition Of: test_subpart FOR VALUES START (1) END (100) INCLUSIVE
> Partition Key: RANGE (c1)
> Partitions: test_subpart_p1_sub1 FOR VALUES START (101) END (200)
>
> insert into test_subpart values (50);
> ERROR: no partition of relation "test_subpart_p1" found for row
> DETAIL: Failing row contains (50).
> insert into test_subpart values (150);
> ERROR: no partition of relation "test_subpart" found for row
> DETAIL: Failing row contains (150).

It seems that DDL should prevent the same column being used in partition
key of lower level partitions. I don't know how much sense it would make,
but being able to use the same column as partition key of lower level
partitions may be a feature useful to some users if they know what they
are doing. But this last part doesn't sound like a good thing. I
modified the patch such that lower level partitions cannot use columns
used by ancestor tables.

> -- Observation 3 : Getting cache lookup failed, when selecting list
> partition table containing array.
>
> CREATE TABLE test_array ( i int,j int[],k text[]) PARTITION BY LIST (j);
> CREATE TABLE test_array_p1 PARTITION OF test_array FOR VALUES IN ('{1}');
> CREATE TABLE test_array_p2 PARTITION OF test_array FOR VALUES IN ('{2,2}');
>
> INSERT INTO test_array (i,j[1],k[1]) VALUES (1,1,1);
> INSERT INTO test_array (i,j[1],j[2],k[1]) VALUES (2,2,2,2);
>
> postgres=# SELECT tableoid::regclass,* FROM test_array_p1;
> tableoid | i | j | k
> ---------------+---+-----+-----
> test_array_p1 | 1 | {1} | {1}
> (1 row)
>
> postgres=# SELECT tableoid::regclass,* FROM test_array_p2;
> tableoid | i | j | k
> ---------------+---+-------+-----
> test_array_p2 | 2 | {2,2} | {2}
> (1 row)
>
> postgres=# SELECT tableoid::regclass,* FROM test_array;
> ERROR: cache lookup failed for type 0

That's a bug. Fixed in the attached patch.

PS: I'm going to have limited Internet access during this weekend and over
the next week, so responses could be slow. Sorry about that.

Thanks,
Amit

Attachment	Content-Type	Size
0001-Catalog-and-DDL-for-partitioned-tables-5.patch	text/x-diff	116.7 KB
0002-psql-and-pg_dump-support-for-partitioned-tables-5.patch	text/x-diff	22.5 KB
0003-Catalog-and-DDL-for-partitions-5.patch	text/x-diff	207.5 KB
0004-psql-and-pg_dump-support-for-partitions-5.patch	text/x-diff	20.6 KB
0005-Refactor-optimizer-s-inheritance-set-expansion-code-5.patch	text/x-diff	14.5 KB
0006-Teach-a-few-places-to-use-partition-check-quals-5.patch	text/x-diff	30.2 KB
0007-Introduce-a-PartitionTreeNode-data-structure-5.patch	text/x-diff	8.0 KB
0008-Tuple-routing-for-partitioned-tables-5.patch	text/x-diff	41.5 KB
0009-Update-DDL-Partitioning-chapter-to-reflect-new-devel-5.patch	text/x-diff	24.7 KB

In response to

Re: Declarative partitioning - another take at 2016-09-09 08:55:48 from Amit Langote

Responses

Re: Declarative partitioning - another take at 2016-09-15 12:58:05 from Ashutosh Bapat
Re: Declarative partitioning - another take at 2016-09-15 16:37:17 from Robert Haas
Re: Declarative partitioning - another take at 2016-09-20 13:41:06 from Robert Haas
Re: Declarative partitioning - another take at 2016-09-22 05:42:31 from Ashutosh Bapat

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Kuntal Ghosh	2016-09-15 10:30:34	Re: WAL consistency check facility
Previous Message	Mark Kirkwood	2016-09-15 08:41:19	Re: less expensive pg_buffercache on big shmem