Re: logical decoding of two-phase transactions

From: Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>
To: Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical decoding of two-phase transactions
Date: 2017-03-17 00:10:27
Message-ID: EEBD82AA-61EE-46F4-845E-05B94168E8F2@postgrespro.ru
Lists: pgsql-hackers


>> On 2 Mar 2017, at 11:00, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>>
>> BTW, I've been reviewing the patch in more detail. Other than a bunch
>> of copy-and-paste that I'm cleaning up, the main issue I've found is
>> that in DecodePrepare, you call:
>>
>> SnapBuildCommitTxn(ctx->snapshot_builder, buf->origptr, xid,
>> parsed->nsubxacts, parsed->subxacts);
>>
>> but I am not convinced it is correct to call it at PREPARE TRANSACTION
>> time, only at COMMIT PREPARED time. We want to see the 2pc prepared
>> xact's state when decoding it, but there might be later commits that
>> cannot yet see that state and shouldn't have it visible in their
>> snapshots.
>
> Agreed, that is a problem. That allows decoding this PREPARE, but after that
> it is better to mark this transaction as running in the snapshot, or to perform
> decoding of the prepare with some kind of copied-and-edited snapshot. I’ll have a look at this.
>

While working on this I’ve spotted quite a nasty corner case with an aborted prepared
transaction. I have a few not-so-great ideas about how to fix it, but maybe my view is
blurred and I’ve missed something, so I want to ask here first.

Suppose we create a table, then alter it in a 2pc transaction, and after that abort that
transaction. pg_class will then contain something like this:

 xmin | xmax | relname
  100 |  200 | mytable
  200 |    0 | mytable
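
For reference, that state can be reproduced with something along these lines (the gid and
the added column are purely illustrative, the xids are just the ones from the example above,
and max_prepared_transactions has to be non-zero):

-- xid 100: create the table
CREATE TABLE mytable (id int);

-- xid 200: a catalog change inside a 2pc tx, which then gets aborted
BEGIN;
ALTER TABLE mytable ADD COLUMN note text;
PREPARE TRANSACTION 'p1';
ROLLBACK PREPARED 'p1';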

After that abort, the tuple (100, 200, mytable) becomes visible again, and if we alter the
table once more, the xmax of the first tuple will be set to the current xid, resulting in the
following:

 xmin | xmax | relname
  100 |  300 | mytable
  200 |    0 | mytable
  300 |    0 | mytable
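
Continuing the same illustrative session:

-- xid 300: sees (100, 200, mytable) as live, since xid 200 aborted,
-- and updates it, stamping its xmax with 300
ALTER TABLE mytable ADD COLUMN note2 text;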

At that moment we’ve lost the information that the first tuple was deleted by our prepared tx.
From the POV of the historic snapshot that will be constructed to decode the prepare, the first
tuple is visible, although it is actually the second tuple that should be used. Moreover, such a
snapshot could see both tuples, violating oid uniqueness, but the heap scan stops after finding
the first one.

I see two possible workarounds here:

* First try to scan the catalog filtering out tuples whose xmax is bigger than snapshot->xmax,
since such a tuple was possibly deleted by our tx. Then, if nothing is found, scan in the usual way.

* Do not decode such a transaction at all. If by the time we decode the prepare record we
already know that it is aborted, then decoding it doesn’t make much sense.
IMO the intended usage of logical 2pc decoding is to decide about commit/abort based
on answers from logical subscribers/replicas, roughly as sketched below. So there will be a
barrier between prepare and commit/abort, and such situations shouldn’t happen.
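
By “barrier” I mean roughly the following flow (the gid and the acknowledgement step are of
course schematic, not an actual API):

-- on the origin node
BEGIN;
-- ... some writes ...
PREPARE TRANSACTION 'tx1';

-- the PREPARE is decoded and shipped to the logical subscribers;
-- the origin waits for their answers before deciding

-- only once all of them have acknowledged it:
COMMIT PREPARED 'tx1';    -- or ROLLBACK PREPARED 'tx1' on a negative answer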

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
