Re: transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kevin Grittner <kgrittn(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Fetter <david(at)fetter(dot)org>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers-owner(at)postgresql(dot)org
Subject: Re: transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
Date: 2017-05-19 05:51:40
Message-ID: 1fedf770-4ef3-08ee-267b-c4b602974e65@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017/05/19 14:01, Thomas Munro wrote:
> On Fri, May 19, 2017 at 1:38 AM, Kevin Grittner <kgrittn(at)gmail(dot)com> wrote:
>> On Thu, May 18, 2017 at 5:16 AM, Amit Langote
>> <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>
>>> Do we need to update documentation? Perhaps, some clarification on the
>>> inheritance/partitioning behavior somewhere.
>>
>> Yeah, I think so.
>
> Here is an attempt at documenting the situation in the CREATE TRIGGER
> notes section.

Looks good, thanks.

>>> I'm not sure if it's significant for transition tables, but what if a
>>> partition's BR trigger modified the tuple? Would we want to include the
>>> modified version of the tuple in the transition table or the original as
>>> the patch does? Same for the code in CopyFrom().
>>
>> Good spot! If the BR trigger on the child table modifies or
>> suppresses the action, I strongly feel that must be reflected in the
>> transition table. This needs to be fixed.
>
> Gah. Right. In the attached version, there is a still an 'original
> tuple' optimisation for insertions (avoiding parent -> child -> parent
> conversion), but it's disabled if there are any BEFORE INSERT or
> INSTEAD OF INSERT row-level triggers.
>
> That's demonstrated by this part of the regression test, which
> modifies the value inserted into the 'CCC' partition (and similar case
> for COPY):
>
> insert into parent values ('AAA', 42), ('BBB', 42), ('CCC', 66);
> NOTICE: trigger = parent_stmt_trig, old table = <NULL>, new table =
> (AAA,42), (BBB,42), (CCC,1066)

Seems to work correctly.

I saw in the latest patch that now ExecSetupTriggerTransitionState() looks
at mtstate->mt_partition_dispatch_info when setting up the transition
conversion map. In the case where it's non-NULL, you may have realized
that mt_transition_tupconv_map will be an exact copy of
mt_partition_tupconv_maps that's already built. Would it perhaps be a
good idea to either share the same or make a copy using memcpy() instead
of doing the convert_tuples_by_name() calls again?

> On Thu, May 18, 2017 at 10:16 PM, Amit Langote
> <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> +typedef struct TriggerTransitionState
>> +{
>> ...
>> + bool ttf_delete_old_table;
>>
>> Just curious: why ttf_? TriggerTransition field?
>
> Oops. Changed to "tts_". I had renamed this struct but not the members.

Ah. BTW, maybe it's not a problem, but the existing TupleTableSlot's
member names are prefixed with tts_, too. :)

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2017-05-19 06:01:24 Re: Race conditions with WAL sender PID lookups
Previous Message Masahiko Sawada 2017-05-19 05:01:56 Multiple table synchronizations are processed serially