Re: Declarative partitioning - another take

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Amit Langote <amitlangote09(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Declarative partitioning - another take
Date: 2016-09-15 08:53:31
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 2016/09/09 17:55, Amit Langote wrote:
> On 2016/09/06 22:04, Amit Langote wrote:
>> Will fix.
> Here is an updated set of patches.

An email from Rajkumar somehow managed to break out of this thread.
Quoting his message below so that I don't end up replying with patches on
two different threads.

On 2016/09/14 16:58, Rajkumar Raghuwanshi wrote:
> I have Continued with testing declarative partitioning with the latest
> patch. Got some more observation, given below

Thanks a lot for testing.

> -- Observation 1 : Getting overlap error with START with EXCLUSIVE in range
> partition.
> create table test_range_bound ( a int) partition by range(a);
> --creating a partition to contain records {1,2,3,4}, by default 1 is
> inclusive and 5 is exclusive
> create table test_range_bound_p1 partition of test_range_bound for values
> start (1) end (5);
> --now trying to create a partition by explicitly mentioning start is
> exclusive to contain records {5,6,7}, here trying to create with START with
> 4 as exclusive so range should be 5 to 8, but getting partition overlap
> error.
> create table test_range_bound_p2 partition of test_range_bound for values
> start (4) EXCLUSIVE end (8);
> ERROR: partition "test_range_bound_p2" would overlap partition
> "test_range_bound_p1"

Wow, this is bad. What is needed in this case is "canonicalization" of
the range partition bounds specified in the command. Range types do this
and hence an equivalent test done with range type values would disagree
with the result given by the patch for range partition bounds.

select '[1,5)'::int4range && '(4,8]'::int4range as cmp;
(1 row)

In this case, the second range is converted into its equivalent canonical
form viz. '[5, 9)'. Then comparison of bounds 5) and [5 can tell that the
ranges do not overlap after all. Range type operators can do this because
their code can rely on the availability of a canonicalization function for
a given range type. From the range types documentation:

A discrete range type should have a canonicalization function that is
aware of the desired step size for the element type. The canonicalization
function is charged with converting equivalent values of the range type to
have identical representations, in particular consistently inclusive or
exclusive bounds. If a canonicalization function is not specified, then
ranges with different formatting will always be treated as unequal, even
though they might represent the same set of values in reality.

to extend the last sentence:

"... or consider two ranges overlapping when in reality they are not
(maybe they are really just adjacent)."

Within the code handling range partition bound, no such canonicalization
happens, so comparison 5) and (4 ends up concluding that upper1 > lower2,
hence ranges overlap.

To mitigate this, how about we restrict range partition key to contain
columns of only those types for which we know we can safely canonicalize a
range bound (ie, discrete range types)? I don't think we can use, say,
existing int4range_canonical but will have to write a version of it for
partitioning usage (range bounds of partitions are different from what
int4range_canonical is ready to handle). This approach will be very
limiting as then range partitions will be limited to columns of int,
bigint and date type only.

One more option is we let the user specify the canonicalize function next
to the column name when defining the partition key. If not specified, we
hard-code one for the types for which we will be implementing a
canonicalize function (ie, above mentioned types). In other cases, we
just don't have one and hence if an unexpected result occurs when creating
a new partition, it's up to the user to realize what happened. Of course,
we will be mentioning in the documentation why a canonicalize function is
necessary and how to write one. Note that this canonicalize function
comes into play only when defining new partitions, it has no role beyond
that point.

> -- Observation 2 : able to create sub-partition out of the range set for
> main table, causing not able to insert data satisfying any of the partition.
> create table test_subpart (c1 int) partition by range (c1);
> create table test_subpart_p1 partition of test_subpart for values start (1)
> end (100) inclusive partition by range (c1);
> create table test_subpart_p1_sub1 partition of test_subpart_p1 for values
> start (101) end (200);
> \d+ test_subpart
> Table "public.test_subpart"
> Column | Type | Modifiers | Storage | Stats target | Description
> --------+---------+-----------+---------+--------------+-------------
> c1 | integer | | plain | |
> Partition Key: RANGE (c1)
> Partitions: test_subpart_p1 FOR VALUES START (1) END (100) INCLUSIVE
> \d+ test_subpart_p1
> Table "public.test_subpart_p1"
> Column | Type | Modifiers | Storage | Stats target | Description
> --------+---------+-----------+---------+--------------+-------------
> c1 | integer | | plain | |
> Partition Of: test_subpart FOR VALUES START (1) END (100) INCLUSIVE
> Partition Key: RANGE (c1)
> Partitions: test_subpart_p1_sub1 FOR VALUES START (101) END (200)
> insert into test_subpart values (50);
> ERROR: no partition of relation "test_subpart_p1" found for row
> DETAIL: Failing row contains (50).
> insert into test_subpart values (150);
> ERROR: no partition of relation "test_subpart" found for row
> DETAIL: Failing row contains (150).

It seems that DDL should prevent the same column being used in partition
key of lower level partitions. I don't know how much sense it would make,
but being able to use the same column as partition key of lower level
partitions may be a feature useful to some users if they know what they
are doing. But this last part doesn't sound like a good thing. I
modified the patch such that lower level partitions cannot use columns
used by ancestor tables.

> -- Observation 3 : Getting cache lookup failed, when selecting list
> partition table containing array.
> CREATE TABLE test_array ( i int,j int[],k text[]) PARTITION BY LIST (j);
> CREATE TABLE test_array_p1 PARTITION OF test_array FOR VALUES IN ('{1}');
> CREATE TABLE test_array_p2 PARTITION OF test_array FOR VALUES IN ('{2,2}');
> INSERT INTO test_array (i,j[1],k[1]) VALUES (1,1,1);
> INSERT INTO test_array (i,j[1],j[2],k[1]) VALUES (2,2,2,2);
> postgres=# SELECT tableoid::regclass,* FROM test_array_p1;
> tableoid | i | j | k
> ---------------+---+-----+-----
> test_array_p1 | 1 | {1} | {1}
> (1 row)
> postgres=# SELECT tableoid::regclass,* FROM test_array_p2;
> tableoid | i | j | k
> ---------------+---+-------+-----
> test_array_p2 | 2 | {2,2} | {2}
> (1 row)
> postgres=# SELECT tableoid::regclass,* FROM test_array;
> ERROR: cache lookup failed for type 0

That's a bug. Fixed in the attached patch.

PS: I'm going to have limited Internet access during this weekend and over
the next week, so responses could be slow. Sorry about that.


Attachment Content-Type Size
0001-Catalog-and-DDL-for-partitioned-tables-5.patch text/x-diff 116.7 KB
0002-psql-and-pg_dump-support-for-partitioned-tables-5.patch text/x-diff 22.5 KB
0003-Catalog-and-DDL-for-partitions-5.patch text/x-diff 207.5 KB
0004-psql-and-pg_dump-support-for-partitions-5.patch text/x-diff 20.6 KB
0005-Refactor-optimizer-s-inheritance-set-expansion-code-5.patch text/x-diff 14.5 KB
0006-Teach-a-few-places-to-use-partition-check-quals-5.patch text/x-diff 30.2 KB
0007-Introduce-a-PartitionTreeNode-data-structure-5.patch text/x-diff 8.0 KB
0008-Tuple-routing-for-partitioned-tables-5.patch text/x-diff 41.5 KB
0009-Update-DDL-Partitioning-chapter-to-reflect-new-devel-5.patch text/x-diff 24.7 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Kuntal Ghosh 2016-09-15 10:30:34 Re: WAL consistency check facility
Previous Message Mark Kirkwood 2016-09-15 08:41:19 Re: less expensive pg_buffercache on big shmem