Re: PG10 transition tables, wCTEs and multiple operations on the same table

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <kgrittn(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Marko Tiikkaja <marko(at)joh(dot)to>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PG10 transition tables, wCTEs and multiple operations on the same table
Date: 2017-06-07 05:36:26
Message-ID: CAEepm=1K7F08QPu+kGcNQ-bCcYBa3QX=9AeE=j0doCmgqVs4Tg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 7, 2017 at 9:42 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I think I'd like to walk back my earlier statements about reverting
> this patch just a little bit. Although putting the tuplestore at the
> wrong level does seem like a fairly significant design mistake, Thomas
> more or less convinced me yesterday on a Skype call that relocating it
> to the ModifyTable node might not be that much work. If it's a
> 150-line patch, it'd probably be less disruptive to install a fix than
> to revert the whole thing (and maybe put it back in again, in some
> future release).

I spent a couple of hours drafting a proof-of-concept to see if my
hunch was right. It seems to work correctly so far and isn't huge
(but certainly needs more testing and work):

6 files changed, 156 insertions(+), 109 deletions(-)

It applies on top of the other patch[1]. It extends
TransitionCaptureState to hold the new and old tuplestores for
each ModifyTable node, instead of using global variables. The result
is that tuplestores don't get mixed up, and there aren't any weird
assumptions about the order of execution as discussed earlier.
Example:

with wcte as (insert into table1 values (42))
insert into table2 values ('hello world');
NOTICE: trigger = table2_trig, old table = <NULL>, new table = ("hello world")
NOTICE: trigger = table1_trig, old table = <NULL>, new table = (42)
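A minimal C sketch of the idea, assuming heavily simplified types. `MiniTuplestore`, `MiniCaptureState`, `capture_new_tuple()` and `print_new_table()` are hypothetical names for illustration, not PostgreSQL's API. Each ModifyTable node owns its own capture state, so the wCTE's insert into table1 and the outer insert into table2 fill separate "new table" tuplestores, mirroring the NOTICE output above.

```c
#include <stdio.h>

/*
 * Hypothetical, heavily simplified stand-ins for illustration only;
 * these are not PostgreSQL's real tuplestore or trigger types.
 */
#define MINI_MAX_TUPLES 8

typedef struct MiniTuplestore
{
    int         ntuples;
    const char *tuples[MINI_MAX_TUPLES];  /* tuples rendered as text */
} MiniTuplestore;

/* One per ModifyTable node, rather than one global tuplestore. */
typedef struct MiniCaptureState
{
    const char     *target_table;  /* result relation of this node */
    MiniTuplestore  new_table;     /* rows an AFTER trigger sees as "new table" */
} MiniCaptureState;

/* Append a tuple to this node's own "new table" tuplestore. */
static void
capture_new_tuple(MiniCaptureState *state, const char *tuple)
{
    MiniTuplestore *ts = &state->new_table;

    if (ts->ntuples < MINI_MAX_TUPLES)
        ts->tuples[ts->ntuples++] = tuple;
}

/* Print what a trigger on this node's target table would see. */
static void
print_new_table(const MiniCaptureState *state)
{
    printf("table = %s, new table =", state->target_table);
    for (int i = 0; i < state->new_table.ntuples; i++)
        printf(" (%s)", state->new_table.tuples[i]);
    printf("\n");
}
```

With one state for the wCTE and one for the outer statement, capturing "42" into the first and "hello world" into the second leaves each trigger seeing only its own rows; a single shared, global tuplestore would have mixed them.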

Summary of how these patches relate:

1. In the inheritance patch[1], TransitionCaptureState is introduced.
It holds flags that control whether we capture tuples. There is one
of these per ModifyTable node. In master we use the flags in
TriggerDesc to control transition tuple capture directly, but we
needed a way for ModifyTable's result rel's TriggerDesc to affect all
child tables that are touched. My proposal is to do that by inventing
this new object to activate transition tuple capture while modifying
child tables too. It is passed into the ExecAR*Trigger() functions of
all relations touched by the ModifyTable node.

2. In the attached patch, that struct is extended to hold the actual
tuplestores. They are used for two purposes: ExecAR*Trigger()
captures tuples into them (instead of using global variables to find
the tuplestores to capture tuples into), and ExecA[RS]*Trigger() keeps
hold of the TransitionCaptureState in the after trigger queue so that
when the queued event is eventually executed, AfterTriggerExecute() can
expose the correct tuplestores to triggers.
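The flow in the two points above can be modelled roughly as follows. This is a simplified C sketch under stated assumptions: every type and function here (`SketchCaptureState`, `SketchEvent`, `sketch_after_row()`, `sketch_execute_event()`) is hypothetical and only loosely stands in for TransitionCaptureState, ExecAR*Trigger() and AfterTriggerExecute(). One capture state per ModifyTable node is shared by the parent and any child tables it touches, and each queued event remembers which capture state it belongs to.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified model for illustration; the names below are not
 * PostgreSQL's real TransitionCaptureState or after-trigger queue.
 */
#define SKETCH_MAX_ROWS   16
#define SKETCH_MAX_EVENTS 16

typedef struct SketchTuplestore
{
    int nrows;
    int rows[SKETCH_MAX_ROWS];
} SketchTuplestore;

/* One per ModifyTable node: a capture flag plus the node's own tuplestore. */
typedef struct SketchCaptureState
{
    bool             capture_new;  /* does some AFTER trigger want a new table? */
    SketchTuplestore new_rows;     /* shared by the parent and all child tables */
} SketchCaptureState;

/* A queued after-trigger event remembers which capture state it belongs to. */
typedef struct SketchEvent
{
    const char         *relname;
    SketchCaptureState *capture;   /* NULL if no transition tables are involved */
} SketchEvent;

typedef struct SketchQueue
{
    int         nevents;
    SketchEvent events[SKETCH_MAX_EVENTS];
} SketchQueue;

/* Row-level step: capture the row (if requested) and queue an event. */
static void
sketch_after_row(SketchQueue *queue, const char *relname, int row,
                 SketchCaptureState *capture)
{
    if (capture != NULL && capture->capture_new &&
        capture->new_rows.nrows < SKETCH_MAX_ROWS)
        capture->new_rows.rows[capture->new_rows.nrows++] = row;

    if (queue->nevents < SKETCH_MAX_EVENTS)
    {
        queue->events[queue->nevents].relname = relname;
        queue->events[queue->nevents].capture = capture;
        queue->nevents++;
    }
}

/* Deferred step: find the right tuplestore through the saved capture state. */
static const SketchTuplestore *
sketch_execute_event(const SketchEvent *event)
{
    return event->capture != NULL ? &event->capture->new_rows : NULL;
}
```

Because the queued event carries the capture state rather than relying on a global, the deferred step does not depend on the order in which ModifyTable nodes ran.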

There are a couple of things that definitely need work and I'd welcome
any comments:

1. I added a pointer to TransitionCaptureState to AfterTriggerShared,
in order to record which tuplestores a queued after trigger event
should see. I suspected that enqueuing pointers like that wouldn't be
popular, and when I ran the idea past Andres on IRC he just said
"yuck" :-) Perhaps there needs to be a way to convert this into an
index into some array in EState, ... or something else. The basic
requirement is that AfterTriggerExecute() needs to know *which*
tuplestores should be visible to the trigger when it runs. I believe
the object lifetime is sound (the TransitionCaptureState lasts until
ExecutorEnd(), and triggers are fired before that during
ExecutorFinish()).

2. I didn't think about what execReplication.c needs. Although that
code apparently doesn't know how to fire AS triggers, it does know how
to fire AR triggers (so that RI works?), and in theory those might
have transition tables, so I guess that needs to use
MakeTransitionCaptureState() -- but it seems to lack a place to keep
that around, and I ran out of time thinking about that today.
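One possible shape for the "index into some array in EState" alternative from point 1, sketched in C under the assumption that all capture states for a query are registered in an executor-lifetime array. `IdxEState`, `IdxEvent`, `register_capture_state()` and `lookup_capture_state()` are hypothetical names, not existing PostgreSQL code. The queued event stores a small integer instead of a pointer, and the firing code resolves it back to the capture state when the event runs.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: capture states live in an array owned by an
 * executor-lifetime struct, and queued events refer to them by index.
 * None of these names exist in PostgreSQL.
 */
#define IDX_MAX_CAPTURE_STATES 8

typedef struct IdxCaptureState
{
    int nnew;                 /* stand-in for the node's tuplestores */
} IdxCaptureState;

typedef struct IdxEState
{
    int             ncapture;
    IdxCaptureState capture[IDX_MAX_CAPTURE_STATES];
} IdxEState;

/* What the after-trigger queue would store instead of a pointer. */
typedef struct IdxEvent
{
    int capture_index;        /* -1 means "no transition tables" */
} IdxEvent;

/* Would be called once per ModifyTable node; returns its array index. */
static int
register_capture_state(IdxEState *estate)
{
    if (estate->ncapture >= IDX_MAX_CAPTURE_STATES)
        return -1;
    estate->capture[estate->ncapture].nnew = 0;
    return estate->ncapture++;
}

/* Would be called when a queued event fires; maps the index back. */
static IdxCaptureState *
lookup_capture_state(IdxEState *estate, const IdxEvent *event)
{
    if (event->capture_index < 0 || event->capture_index >= estate->ncapture)
        return NULL;
    return &estate->capture[event->capture_index];
}
```

An index keeps raw pointers out of the queued events, but it rests on the same lifetime argument: the array must outlive the point where queued events are fired.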

Thoughts?

[1] https://www.postgresql.org/message-id/CAEepm%3D1dGNzh98Gt21fn_Ed6k20sVB-NuAARE1EF693itK6%3DLg%40mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

Attachment: transition-tuples-from-wctes-v1.patch (application/octet-stream, 22.4 KB)
