Re: Declarative partitioning - another take

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Erik Rijkers <er(at)xs4all(dot)nl>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers-owner(at)postgresql(dot)org
Subject: Re: Declarative partitioning - another take
Date: 2016-12-21 10:33:44
Message-ID: d466b1e2-19e7-bd99-20f2-1ca26e5eb228@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2016/12/21 1:53, Robert Haas wrote:
> On Mon, Dec 19, 2016 at 10:59 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Sun, Dec 18, 2016 at 10:00 PM, Amit Langote
>> <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>> Here are updated patches including the additional information.
>>
>> Thanks. Committed 0001. Will have to review the others when I'm less tired.
>
> 0002. Can you add a test case for the bug fixed by this patch?

Done (also see below).

> 0003. Loses equalTupleDescs() check and various cases where
> ExecOpenIndexes can be skipped. Isn't that bad?

I realized that as far as checking whether tuple conversion mapping is
required, the checks performed by convert_tuples_by_name() at the
beginning of the function following the comment /* Verify compatibility
and prepare attribute-number map */ are enough. If equalTupleDescs()
returned false (which it always does because tdtypeid are never the same
for passed in tuple descriptors), we would be repeating some of the tests
in convert_tuples_by_name() anyway.

As for the checks performed for ExecOpenIndices(), it seems would be
better to keep the following in place, so added back:

if (leaf_part_rri->ri_RelationDesc->rd_rel->relhasindex &&
leaf_part_rri->ri_IndexRelationDescs == NULL)
ExecOpenIndices(leaf_part_rri, false);

> Also, "We locked all
> the partitions above including the leaf partitions" should say "Caller
> must have locked all the partitions including the leaf partitions".

No, we do the locking in RelationGetPartitionDispatchInfo(), which is
called by ExecSetupPartitionTupleRouting() itself.

In ExecSetupPartitionTupleRouting() is the first time we lock all the
partitions.

> 0004. Unnecessary whitespace change in executor.h. Still don't
> understand why we need to hoist RelationGetPartitionQual() into the
> caller.

We only need to check a result relation's (ri_RelationDesc's) partition
constraint if we are inserting into the result relation directly. In case
of tuple-routing, we do not want to check the leaf partitions' partition
constraint, but if the direct target in that case is an internal
partition, we must check its partition constraint, which is same for all
leaf partitions in that sub-tree. It wouldn't be wrong per se to check
each leaf partition's constraint in that case, which includes the target
partitioned table's constraint as well, but that would inefficient due to
both having to retrieve the constraints and having ExecConstraints()
*unnecessarily* check it for every row.

If we keep doing RelationGetPartitionQual() in InitResultRelInfo()
controlled by a bool argument (load_partition_check), we cannot avoid the
above mentioned inefficiency if we want to fix this bug.

> 0005. Can you add a test case for the bug fixed by this patch?

Done, but...

Breaking changes into multiple commits/patches does not seem to work for
adding regression tests. So, I've combined multiple patches into a single
patch which is now patch 0002 in the attached set of patches. Its commit
message is very long now. To show an example of bugs that 0002 is meant for:

create table p (a int, b int) partition by range (a, b);
create table p1 (b int, a int not null) partition by range (b);
create table p11 (like p1);
alter table p11 drop a;
alter table p11 add a int;
alter table p11 drop a;
alter table p11 add a int not null;

# select attrelid::regclass, attname, attnum
from pg_attribute
where attnum > 0
and (attrelid = 'p'::regclass
or attrelid = 'p1'::regclass
or attrelid = 'p11'::regclass) and attname = 'a';
attrelid | attname | attnum
----------+---------+--------
p | a | 1
p1 | a | 2
p11 | a | 4
(3 rows)

alter table p1 attach partition p11 for values from (1) to (5);
alter table p attach partition p1 for values from (1, 1) to (1, 10);

-- the following is wrong
# insert into p11 (a, b) values (10, 4);
INSERT 0 1

-- wrong too (using the wrong TupleDesc after tuple routing)
# insert into p1 (a, b) values (10, 4);
ERROR: null value in column "a" violates not-null constraint
DETAIL: Failing row contains (4, null).

-- once we fix the wrong TupleDesc issue
# insert into p1 (a, b) values (10, 4);
INSERT 0 1

which is wrong because p1, as a partition of p, should not accept 10 for
a. But its partition constraint is not being applied to the leaf
partition p11 into which the tuple is routed (the bug).

Thanks,
Amit

Attachment Content-Type Size
0001-Refactor-tuple-routing-setup-code.patch text/x-diff 10.9 KB
0002-Fix-multiple-issues-with-partition-constraints.patch text/x-diff 34.1 KB
0003-Add-some-more-tests-for-tuple-routing.patch text/x-diff 5.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joel Jacobson 2016-12-21 11:01:57 Re: SET NOT NULL [NOT VALID / CONCURRENTLY]?
Previous Message Craig Ringer 2016-12-21 10:31:52 Re: pg_background contrib module proposal