Re: Declarative partitioning - another take

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Erik Rijkers <er(at)xs4all(dot)nl>, Amit Langote <amitlangote09(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers-owner(at)postgresql(dot)org
Subject: Re: Declarative partitioning - another take
Date: 2016-12-16 08:02:48
Message-ID: c820c0eb-6935-6f84-8c6a-785fdff130c1@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2016/12/14 1:32, Robert Haas wrote:
> On Tue, Dec 13, 2016 at 1:58 AM, Amit Langote
> <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> Attaching the above patch, along with some other patches posted earlier,
>> and one more patch fixing another bug I found. Patch descriptions follow:
>>
>> 0001-Miscallaneous-code-and-comment-improvements.patch
>>
>> Fixes some obsolete comments while improving others. Also, implements
>> some of Tomas Vondra's review comments.
>
> Committed with some pgindenting.

Thanks!

>> 0002-Miscallaneous-documentation-improvements.patch
>>
>> Fixes inconsistencies and improves some examples in the documentation.
>> Also, mentions the limitation regarding row movement.
>
> Ignored because I committed what I think is the same or a similar
> patch earlier. Please resubmit any remaining changes.

It seems this patch is almost the same thing as what you committed.

>> 0003-Invalidate-the-parent-s-relcache-after-partition-cre.patch
>>
>> Fixes a bug reported by Tomas, whereby a parent's relcache was not
>> invalidated after creation of a new partition using CREATE TABLE PARTITION
>> OF. This resulted in tuple-routing code not being to able to find a
>> partition that was created by the last command in a given transaction.
>
> Shouldn't StorePartitionBound() be responsible for issuing its own
> invalidations, as StorePartitionKey() already is? Maybe you'd need to
> pass "parent" as another argument, but that way you know you don't
> have the same bug at some other place where the function is called.

OK, done that way in PATCH 1/7 (of the attached various patches as
described below).

>> 0004-Fix-a-bug-of-insertion-into-an-internal-partition.patch
>>
>> Fixes a bug I found this morning, whereby an internal partition's
>> constraint would not be enforced if it is targeted directly. See example
>> below:
>>
>> create table p (a int, b char) partition by range (a);
>> create table p1 partition of p for values from (1) to (10) partition by
>> list (b);
>> create table p1a partition of p1 for values in ('a');
>> insert into p1 values (0, 'a'); -- should fail, but doesn't
>
> I expect I'm missing something here, but why do we need to hoist
> RelationGetPartitionQual() out of InitResultRelInfo() instead of just
> having BeginCopy() pass true instead of false?
>
> (Also needs a rebase due to the pgindent cleanup.)

In this case, we want to enforce only the main target relation's partition
constraint (only needed if it happens to be an internal node partition),
not leaf partitions', because the latter is unnecessary.

We do InitResultRelInfo() for every leaf partition. What
RelationGetPartitionQual() would return to InitResultRelInfo() is the
partition constraint of the individual leaf partitions, which as just
mentioned is unnecessary, and also would be inefficient. With the
proposed patch, we only retrieve the partition constraint for the targeted
table (if there is any) once. However, when assigning to
ri_PartitionCheck of individual leaf partition's ResultRelInfo, we still
must map any Vars in the expression from the target tables's attnos to the
corresponding leaf partition's attnos (previous version of the patch
failed to do that).

>> 0005-Fix-a-tuple-routing-bug-in-multi-level-partitioned-t.patch
>>
>> Fixes a bug discovered by Dmitry Ivanov, whereby wrong indexes were
>> assigned to the partitions of lower levels (level > 1), causing spurious
>> "partition not found" error as demonstrated in his email [1].
>
> Committed. It might have been good to include a test case.

Agreed, added tests in the attached patch PATCH 7/7.

Aside from the above, I found few other issues and fixed them in the
attached patches. Descriptions follow:

[PATCH 1/7] Invalidate the parent's relcache after partition creation.

Invalidate parent's relcache after a partition is created using CREATE
TABLE PARTITION OF. (Independent reports by Keith Fiske and David Fetter)

[PATCH 2/7] Change how RelationGetPartitionQual() and related code works

Since we always want to recurse, ie, include the parent's partition
constraint (if any), get rid of the argument recurse.

Refactor out the code doing the mapping of attnos of Vars in partition
constraint expressions (parent attnos -> child attnos). Move it to a
separate function map_partition_varattnos() and call it from appropriate
places. It previously used to be done in get_qual_from_partbound(),
which would lead to wrong results in certain multi-level partitioning
cases, as the mapping would be done for immediate parent-partition pairs.
Now in generate_partition_qual() which is the workhorse of
RelationGetPartitionQual(), we first generate the full expression
(considering all levels of partitioning) and then do the mapping from the
root parent to a leaf partition. It is also possible to generate
partition constraint up to certain non-leaf level and then apply the
same to leaf partitions of that sub-tree after suitable substitution
of varattnos using the new map_partition_varattnos() directly.

Bug fix: ATExecAttachPartition() failed to do the mapping when attaching
a partitioned table as partition. It is possible for the partitions of
such table to have different attributes from the table being attached
and/or the target partitioned table.

[PATCH 3/7] Refactor tuple-routing setup code

It's unnecessarily repeated in copy.c and nodeModifyTable.c, which makes
it a burden to maintain. Should've been like this to begin with.

I moved the common code to ExecSetupPartitionTupleRouting() in execMain.c
that also houses ExecFindParttion() currently. Hmm, should there be a
new src/backend/executor/execPartition.c?

[PATCH 4/7] Fix a bug of insertion into an internal partition.

Since implicit partition constraints are not inherited, an internal
partition's constraint was not being enforced when targeted directly.
So, include such constraint when setting up leaf partition result
relations for tuple-routing.

InitResultRelInfo()'s API changes with this. Instead of passing
a boolean telling whether or not to load the partition constraint,
callers now need to pass the exact constraint expression to use
as ri_PartitionCheck or NIL.

[PATCH 5/7] Fix oddities of tuple-routing and TupleTableSlots

We must use the partition's tuple descriptor *after* a tuple is routed,
not the root table's. Partition's attributes, for example, may be
ordered diferently from the root table's.

We must then switch back to the root table's for the next tuple and
so on. A dedicated TupleTableSlot is allocated within EState called
es_partition_tuple_slot whose descriptor is set to a given leaf
partition for every row after it's routed.

[PATCH 6/7] Make ExecConstraints() emit the correct row in error msgs.

After a tuple is routed to a partition, it has been converted from the
root table's rowtype to the partition's. If such a tuple causes an
error in ExecConstraints(), the row shown in error messages might not
match the input row due to possible differences between the root
table's rowtype and the partition's.

To convert back to the correct row format, keep root table relation
descriptor and a reverse tuple conversion map in the ResultRelInfo's
of leaf partitions.

[PATCH 7/7] Add some tests for recent fixes to PartitionDispatch code in
a25665088d

Thanks,
Amit

Attachment Content-Type Size
0001-Invalidate-the-parent-s-relcache-after-partition-cre.patch text/x-diff 3.1 KB
0002-Change-how-RelationGetPartitionQual-and-related-code.patch text/x-diff 11.1 KB
0003-Refactor-tuple-routing-setup-code.patch text/x-diff 10.6 KB
0004-Fix-a-bug-of-insertion-into-an-internal-partition.patch text/x-diff 8.6 KB
0005-Fix-oddities-of-tuple-routing-and-TupleTableSlots.patch text/x-diff 5.3 KB
0006-Make-ExecConstraints-emit-the-correct-row-in-error-m.patch text/x-diff 11.8 KB
0007-Add-some-tests-for-recent-fixes-to-PartitionDispatch.patch text/x-diff 5.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2016-12-16 08:04:50 Re: Quorum commit for multiple synchronous replication.
Previous Message Michael Paquier 2016-12-16 06:40:08 Re: pg_basebackups and slots