Re: logical decoding of two-phase transactions

From: Sokolov Yura <y(dot)sokolov(at)postgrespro(dot)ru>
To: Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>
Cc: Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: logical decoding of two-phase transactions
Date: 2017-10-26 19:01:40
Message-ID: cd6a59942f42c37adc19eae23dfe701c@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017-09-27 14:46, Stas Kelvich wrote:
>> On 7 Sep 2017, at 18:58, Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>
>> wrote:
>>
>> Hi,
>>
>> FYI all, wanted to mention that I am working on an updated version of
>> the latest patch that I plan to submit to a later CF.
>>
>
> Cool!
>
> So what kind of architecture do you have in mind? Same way as is it
> was implemented before?
> As far as I remember there were two main issues:
>
> * Decodong of aborted prepared transaction.
>
> If such transaction modified catalog then we can’t read reliable info
> with our historic snapshot,
> since clog already have aborted bit for our tx it will brake
> visibility logic. There are some way to
> deal with that — by doing catalog seq scan two times and counting
> number of tuples (details
> upthread) or by hijacking clog values in historic visibility function.
> But ISTM it is better not solve this
> issue at all =) In most cases intended usage of decoding of 2PC
> transaction is to do some form
> of distributed commit, so naturally decoding will happens only with
> in-progress transactions and
> we commit/abort will happen only after it is decoded, sent and
> response is received. So we can
> just have atomic flag that prevents commit/abort of tx currently being
> decoded. And we can filter
> interesting prepared transactions based on GID, to prevent holding
> this lock for ordinary 2pc.
>
> * Possible deadlocks that Andres was talking about.
>
> I spend some time trying to find that, but didn’t find any. If locking
> pg_class in prepared tx is the only
> example then (imho) it is better to just forbid to prepare such
> transactions. Otherwise if some realistic
> examples that can block decoding are actually exist, then we probably
> need to reconsider the way
> tx being decoded. Anyway this part probably need Andres blessing.

Just rebased patch logical_twophase_v6 to master.

Fixed small issues:
- XactLogAbortRecord wrote DBINFO twice, but it was decoded in
ParseAbortRecord only once. Second DBINFO were parsed as ORIGIN.
Fixed by removing second write of DBINFO.
- SnapBuildPrepareTxnFinish tried to remove xid from `running` instead
of `committed`. And it removed only xid, without subxids.
- test_decoding skipped returning "COMMIT PREPARED" and "ABORT
PREPARED",

Big issue were with decoding ddl-including two-phase transactions:
- prepared.out were misleading. We could not reproduce decoding body of
"test_prepared#3" with logical_twophase_v6.diff. It was skipped if
`pg_logical_slot_get_changes` were called without
`twophase-decode-with-catalog-changes` set, and only "COMMIT PREPARED
test_prepared#3" were decoded.
The reason is "PREPARE TRANSACTION" is passed to `pg_filter_prepare`
twice:
- first on "PREPARE" itself,
- second - on "COMMIT PREPARED".
In v6, `pg_filter_prepare` without `with-catalog-changes` first time
answered "true" (ie it should not be decoded), and second time (when
transaction became committed) it answered "false" (ie it should be
decoded). But second time in DecodePrepare
`ctx->snapshot_builder->start_decoding_at`
is already in a future compared to `buf->origptr` (because it is
on "COMMIT PREPARED" lsn). Therefore DecodePrepare just called
ReorderBufferForget.
If `pg_filter_prepare` is called with `with-catalog-changes`, then
it returns "false" both times, thus DeocdePrepare decodes transaction
in first time, and calls `ReorderBufferForget` in second time.

I didn't found a way to fix it gracefully. I just change
`pg_filter_prepare`
to return same answer both times: "false" if called
`with-catalog-changes`
(ie need to call DecodePrepare), and "true" otherwise. With this
change, catalog changing two-phase transaction is decoded as simple
one-phase transaction, if `pg_logical_slot_get_changes` is called
without `with-catalog-changes`.

--
With regards,
Sokolov Yura
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company

Attachment Content-Type Size
logical_twophase_v7.diff.gz application/x-gzip 14.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2017-10-26 19:19:18 Re: Removing [Merge]Append nodes which contain a single subpath
Previous Message Michael Paquier 2017-10-26 18:13:55 Re: [BUGS] Bug in Physical Replication Slots (at least 9.5)?