Re: Triggers on foreign tables

From: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
To: Ronan Dunklau <ronan(dot)dunklau(at)dalibo(dot)com>
Cc: PgHacker <pgsql-hackers(at)postgresql(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
Subject: Re: Triggers on foreign tables
Date: 2014-03-03 14:10:30
Message-ID: CADyhKSXci64a3xupcOv35dMwjj5ZKbLqUDy7HJ-13MFYretxAw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I tried to check the latest (v8) patch again, then could not find
problem in your design change from the v7.
As Noah pointed out, it uses per query-depth tuplestore being released
on AfterTriggerEndSubXact.

So, may I mark it as "ready for committer"?

2014-03-03 15:48 GMT+09:00 Ronan Dunklau <ronan(dot)dunklau(at)dalibo(dot)com>:
> Hello.
>
> Did you have time to review the latest version of this patch ? Is there
> anything I can do to get this "ready for commiter" ?
>
> Thank you for all the work performed so far.
>
>
>
>
> Le mardi 4 février 2014 13:16:22 Ronan Dunklau a écrit :
>> Le lundi 3 février 2014 23:28:45 Noah Misch a écrit :
>> > On Sun, Feb 02, 2014 at 11:53:51AM +0100, Ronan Dunklau wrote:
>> > > Le jeudi 30 janvier 2014 14:05:08 Noah Misch a écrit :
>> > > > On Thu, Jan 23, 2014 at 03:17:35PM +0100, Ronan Dunklau wrote:
>> > > > > What do you think about this approach ? Is there something I missed
>> > > > > which
>> > > > > would make it not sustainable ?
>> > > >
>> > > > Seems basically reasonable. I foresee multiple advantages from having
>> > > > one
>> > > > tuplestore per query level as opposed to one for the entire
>> > > > transaction.
>> > > > You would remove the performance trap of backing up the tuplestore by
>> > > > rescanning. It permits reclaiming memory and disk space in
>> > > > AfterTriggerEndQuery() rather than at end of transaction. You could
>> > > > remove
>> > > > ate_ptr1 and ate_ptr2 from AfterTriggerEventDataFDW and just store the
>> > > > flags word: depending on AFTER_TRIGGER_2CTIDS, grab either the next
>> > > > one
>> > > > or
>> > > > the next two tuples from the tuplestore. Using work_mem per
>> > > > AfterTriggerBeginQuery() instead of per transaction is no problem.
>> > > > What
>> > > > do
>> > > > you think of that design change?
>> > >
>> > > I agree that this design is better, but I have some objections.
>> > >
>> > > We can remove ate_ptr2 and rely on the AFTER_TRIGGER_2CTIDS flag, but
>> > > the
>> > > rescanning and ate_ptr1 (renamed ate_tupleindex in the attached patch)
>> > > can't go away.
>> > >
>> > > Consider for example the case of a foreign table with more than one
>> > > AFTER
>> > > UPDATE triggers. Unless we store the tuples once for each trigger, we
>> > > will
>> > > have to rescan the tuplestore.
>> >
>> > Will we? Within a given query level, when do (non-deferred) triggers
>> > execute in an order other than the enqueue order?
>>
>> Let me explain what I had in mind.
>>
>> Looking at the code in AfterTriggerSaveEvent:
>>
>> - we build a "template" AfterTriggerEvent, and store the tuple(s)
>> - for each suitable after trigger that matches the trigger type, as well as
>> the WHEN condition if any, a copy of the previously built AfterTriggerEvent
>> is queued
>>
>> Later, those events are fired in order.
>>
>> This means that more than one event can be fired for one tuple.
>>
>> Take this example:
>>
>> CREATE TRIGGER trig_row_after1
>> AFTER UPDATE ON rem2
>> FOR EACH ROW
>> WHEN (NEW.f1 % 5 < 3)
>> EXECUTE PROCEDURE trigger_func('TRIG1');
>>
>> CREATE TRIGGER trig_row_after2
>> AFTER UPDATE ON rem2
>> FOR EACH ROW
>> WHEN (NEW.f1 % 5 < 4)
>> EXECUTE PROCEDURE trigger_func('TRIG2');
>>
>> UPDATE rem2 set f2 = 'something';
>>
>> Assuming 5 rows with f1 as a serial, the fired AfterTriggerEvent's
>> ate_tupleindex will be, in that order. Ass
>>
>> 0-0-2-2-4-8-8
>>
>> So, at least a backward seek is required for trig_row_after2 to be able to
>> retrieve a tuple that was already consumed when firing trig_row_after1.
>>
>> On a side note, this made me realize that it is better to avoid storing a
>> tuple entirely if there is no enabled trigger (the f1 = 4 case above). The
>> attached patch does that, so the previous sequence becomes:
>>
>> 0-0-2-2-4-6-6
>>
>> It also prevents from initalizing a tuplestore at all if its not needed.
>>
>> > > To mitigate the effects of this behaviour, I added the option to perform
>> > > a
>> > > reverse_seek when the looked-up tuple is nearer from the current index
>> > > than
>> > > from the start.
>> >
>> > If there's still a need to seek within the tuplestore, that should get rid
>> > of the O(n^2) effect. I'm hoping that per-query-level tuplestores will
>> > eliminate the need to seek entirely.
>>
>> I think the only case when seeking is still needed is when there are more
>> than one after trigger that need to be fired, since the abovementioned
>> change prevents from seeking to skip tuples.
>>
>> > > > If you do pursue that change, make sure the code still does the right
>> > > > thing
>> > > > when it drops queued entries during subxact abort.
>> > >
>> > > I don't really understand what should be done at that stage. Since
>> > > triggers on foreign tables are not allowed to be deferred, everything
>> > > should be cleaned up at the end of each query, right ? So, there
>> > > shouldn't be any queued entries.
>> >
>> > I suspect that's right. If you haven't looked over
>> > AfterTriggerEndSubXact(), please do so and ensure all its actions still
>> > make sense in the context of this new kind of trigger storage.
>>
>> You're right, I missed something here. When aborting a subxact, the
>> tuplestores for queries below the subxact query depth should be cleaned, if
>> any, because AfterTriggerEndQuery has not been called for the failing query.
>>
>> The attached patch fixes that.
>>
>> > > > > The attached patch checks this, and add documentation for this
>> > > > > limitation.
>> > > > > I'm not really sure about how to phrase that correctly in the error
>> > > > > message
>> > > > > and the documentation. One can store at most INT_MAX foreign tuples,
>> > > > > which
>> > > > > means that at most INT_MAX insert or delete or "half-updates" can
>> > > > > occur.
>> > > > > By
>> > > > > half-updates, I mean that for update two tuples are stored whereas
>> > > > > in
>> > > > > contrast to only one for insert and delete trigger.
>> > > > >
>> > > > > Besides, I don't know where this disclaimer should be in the
>> > > > > documentation.
>> > > > > Any advice here ?
>> > > >
>> > > > I wouldn't mention that limitation.
>> > >
>> > > Maybe it shouldn't be so prominent, but I still think a note somewhere
>> > > couldn't hurt.
>> >
>> > Perhaps. There's not much documentation of such implementation upper
>> > limits, and there's no usage of "INT_MAX" outside of parts that discuss
>> > writing C code. I'm not much of a visionary when it comes to the
>> > documentation; I try to document new things with an amount of detail
>> > similar to old features.
>>
>> Ok, I removed the paragraph documenting the limitation.
>>
>> > > Should the use of work_mem be documented somewhere, too ?
>> >
>> > I wouldn't. Most uses of work_mem are undocumented, even relatively major
>> > ones like count(DISTINCT ...) and CTEs. So, while I'd generally favor a
>> > patch documenting all/most of the things that use work_mem, it would be
>> > odd
>> > to document one new consumer apart from the others.
>>
>> Ok.
>>
>> > > > This is the performance trap I mentioned above; it brings potential
>> > > > O(n^2)
>> > > > complexity to certain AFTER trigger execution scenarios.
>> > >
>> > > What scenarios do you have in mind ? I "fixed" the problem when there
>> > > are
>> > > multiple triggers on a foreign table, do you have any other one ?
>> >
>> > I'm not aware of such a performance trap in your latest design.
>>
>> Good !
>>
>> > Thanks,
>> > nm
>
> --
> Ronan Dunklau
> http://dalibo.com - http://dalibo.org

--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2014-03-03 14:27:05 Re: Performance Improvement by reducing WAL for Update Operation
Previous Message Andres Freund 2014-03-03 13:33:38 Re: heapgetpage() and ->takenDuringRecovery