| From: | Simon Riggs <simon(at)2ndquadrant(dot)com> | 
|---|---|
| To: | Robert Haas <robertmhaas(at)gmail(dot)com> | 
| Cc: | Craig Ringer <craig(at)2ndquadrant(dot)com>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: logical decoding of two-phase transactions | 
| Date: | 2017-03-20 12:55:09 | 
| Message-ID: | CANP8+j+LoD8JiDL-jNtUGV87M3am6Vwqz0=ds9b5FkyqdXu57w@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
On 17 March 2017 at 23:59, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Mar 16, 2017 at 10:34 PM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>> On 17 March 2017 at 08:10, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru> wrote:
>>> While working on this i’ve spotted quite a nasty corner case with aborted prepared
>>> transaction. I have some not that great ideas how to fix it, but maybe i blurred my
>>> view and missed something. So want to ask here at first.
>>>
>>> Suppose we created a table, then in 2pc tx we are altering it and after that aborting tx.
>>> So pg_class will have something like this:
>>>
>>> xmin | xmax | relname
>>> 100  | 200    | mytable
>>> 200  | 0        | mytable
>>>
>>> After previous abort, tuple (100,200,mytable) becomes visible and if we will alter table
>>> again then xmax of first tuple will be set current xid, resulting in following table:
>>>
>>> xmin | xmax | relname
>>> 100  | 300    | mytable
>>> 200  | 0        | mytable
>>> 300  | 0        | mytable
>>>
>>> In that moment we’ve lost information that first tuple was deleted by our prepared tx.
>>
>> Right. And while the prepared xact has aborted, we don't control when
>> it aborts and when those overwrites can start happening. We can and
>> should check if a 2pc xact is aborted before we start decoding it so
>> we can skip decoding it if it's already aborted, but it could be
>> aborted *while* we're decoding it, then have data needed for its
>> snapshot clobbered.
>>
>> This hasn't mattered in the past because prepared xacts (and
>> especially aborted 2pc xacts) have never needed snapshots, we've never
>> needed to do something from the perspective of a prepared xact.
>>
>> I think we'll probably need to lock the 2PC xact so it cannot be
>> aborted or committed while we're decoding it, until we finish decoding
>> it. So we lock it, then check if it's already aborted/already
>> committed/in progress. If it's aborted, treat it like any normal
>> aborted xact. If it's committed, treat it like any normal committed
>> xact. If it's in progress, keep the lock and decode it.
>
> But that lock could need to be held for an unbounded period of time -
> as long as decoding takes to complete - which seems pretty
> undesirable.
This didn't seem to be too much of a problem when I read it.
Sure, the issue noted by Stas exists, but it requires
Alter-Abort-Alter for it to be a problem. Meaning that normal non-DDL
transactions do not have problems. Neither would a real-time system
that uses the decoded data to decide whether to commit or abort the
transaction; in that case there would never be an abort until after
decoding.
So I suggest we have a pre-prepare callback to ensure that the plugin
can decide whether to decode or not. We can pass information to the
plugin such as whether we have issued DDL in that xact or not. The
plugin can then decide how it wishes to handle it, so if somebody
doesn't like the idea of a lock then don't use one. The plugin is
already responsible for many things, so this is nothing new.
-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Stas Kelvich | 2017-03-20 12:57:07 | Re: logical decoding of two-phase transactions | 
| Previous Message | Stephen Frost | 2017-03-20 12:33:02 | Re: [COMMITTERS] pgsql: Improve pg_dump regression tests and code coverage |