Re: Declarative partitioning - another take

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Declarative partitioning - another take
Date: 2016-11-24 11:13:22
Message-ID: f2f36bbd-0851-2256-5ca1-86322c1f82e3@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2016/11/23 4:50, Robert Haas wrote:
> On Tue, Nov 22, 2016 at 4:15 AM, Amit Langote
> <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>> The easiest thing to do might be to just enforce that all of the
>>> partition key columns have to be not-null when the range-partitioned
>>> table is defined, and reject any attempt to DROP NOT NULL on them
>>> later. That's probably better that shoehorning it into the table
>>> constraint.
>>
>> Agreed that forcing range partitioning columns to be NOT NULL during table
>> creation would be a better approach. But then we would have to reject
>> using expressions in the range partition key, right?
>
> Why?

I was thinking of it like how primary key columns cannot contain
expressions; the reason for which, I assume, is because while we can
ensure that a column contains only non-null values by defining a
constraint on the column, there is no way to force expressions to be non-null.

In any case, I have implemented in the latest patch that when creating a
range partitioned table, its partition key columns are automatically set
to be NOT NULL. Although, it leaves out the columns that are referenced
in expressions. So even after doing so, we need to check after computing
the range partition key of a input row, that none of the partitions keys
is null, because expressions can still return null.

Also, it does nothing to help the undesirable situation that one can
insert a row with a null partition key (expression) into any of the range
partitions if targeted directly, because of how ExecQual() handles
nullable constraint expressions (treats null value as satisfying the
partition constraint).

An alternative possibly worth considering might be to somehow handle the
null range partition keys within the logic to compare against range bound
datums. It looks like other databases will map the rows containing nulls
to the unbounded partition. One database allows specifying NULLS
FIRST/LAST and maps a row containing null key to the partition with
-infinity as the lower bound or +infinity as the upper bound, respectively
with NULLS LAST the default behavior.

In our case, if we allowed a similar way of defining a range partitioned
table:

create table p (a int, b int) partition by range nulls first (a);
create table p0 partition of p for values from (unbounded) to (1);
create table p1 partition of p for values from (1) to (10);
create table p2 partition of p for values from (10) to (unbounded);

Row (null, 1) will be mapped to p0. If we didn't have p0, we would report
the "partition not found" error.

In case of a multi-column key:

create table p (a int, b int) partition by range (a, b);
create table p0 partition of p for values from (1, unbounded) to (1, 1);
create table p1 partition of p for values from (1, 1) to (1, 10);
create table p2 partition of p for values from (1, 10) to (1, unbounded);

Row (1, null) will be mapped to p2 (default nulls last behavior).

But I guess we still end up without a solution for the problem that a row
with null partition key (expression) could be inserted into any of the
range partitions if targeted directly.

Thoughts?

>> As a result, one change became necessary: to how we flag individual range
>> bound datum as infinite or not. Previously, it was a regular Boolean
>> value (either infinite or not) and to distinguish +infinity from
>> -infinity, we looked at whether the bound is lower or upper (the lower
>> flag). Now, instead, the variable holding the status of individual range
>> bound datum is set to a ternary value: RANGE_DATUM_FINITE (0),
>> RANGE_DATUM_NEG_INF (1), and RANGE_DATUM_POS_INF (2), which still fits in
>> a bool.
>
> You better not be using a bool to represent a ternary value! Use an
> enum for that -- or if in the system catalogs, a char.

OK, created an enum called RangeDatumContent. In the system catalog, we
still store the boolean value; it is only after we read it into the
relcache structure that we use one of these enum values. I'm worried
though that using enum would consume more memory (we need to store nparts
* partnattrs instances of the enum).

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Rushabh Lathia 2016-11-24 12:12:22 Re: Gather Merge
Previous Message Geoff Winkless 2016-11-24 10:45:44 Re: DISTINCT with btree skip scan