Re: logical decoding of two-phase transactions

From: Sokolov Yura <y(dot)sokolov(at)postgrespro(dot)ru>
To: Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>
Cc: Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: logical decoding of two-phase transactions
Date: 2017-10-27 19:53:08
Message-ID: 0e46521cbbca66c3298d1a9d672c9af0@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On 2017-10-26 22:01, Sokolov Yura wrote:
> On 2017-09-27 14:46, Stas Kelvich wrote:
>>> On 7 Sep 2017, at 18:58, Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>
>>> wrote:
>>>
>>> Hi,
>>>
>>> FYI all, wanted to mention that I am working on an updated version of
>>> the latest patch that I plan to submit to a later CF.
>>>
>>
>> Cool!
>>
>> So what kind of architecture do you have in mind? Same way as is it
>> was implemented before?
>> As far as I remember there were two main issues:
>>
>> * Decodong of aborted prepared transaction.
>>
>> If such transaction modified catalog then we can’t read reliable info
>> with our historic snapshot,
>> since clog already have aborted bit for our tx it will brake
>> visibility logic. There are some way to
>> deal with that — by doing catalog seq scan two times and counting
>> number of tuples (details
>> upthread) or by hijacking clog values in historic visibility function.
>> But ISTM it is better not solve this
>> issue at all =) In most cases intended usage of decoding of 2PC
>> transaction is to do some form
>> of distributed commit, so naturally decoding will happens only with
>> in-progress transactions and
>> we commit/abort will happen only after it is decoded, sent and
>> response is received. So we can
>> just have atomic flag that prevents commit/abort of tx currently being
>> decoded. And we can filter
>> interesting prepared transactions based on GID, to prevent holding
>> this lock for ordinary 2pc.
>>
>> * Possible deadlocks that Andres was talking about.
>>
>> I spend some time trying to find that, but didn’t find any. If locking
>> pg_class in prepared tx is the only
>> example then (imho) it is better to just forbid to prepare such
>> transactions. Otherwise if some realistic
>> examples that can block decoding are actually exist, then we probably
>> need to reconsider the way
>> tx being decoded. Anyway this part probably need Andres blessing.
>
> Just rebased patch logical_twophase_v6 to master.
>
> Fixed small issues:
> - XactLogAbortRecord wrote DBINFO twice, but it was decoded in
> ParseAbortRecord only once. Second DBINFO were parsed as ORIGIN.
> Fixed by removing second write of DBINFO.
> - SnapBuildPrepareTxnFinish tried to remove xid from `running` instead
> of `committed`. And it removed only xid, without subxids.
> - test_decoding skipped returning "COMMIT PREPARED" and "ABORT
> PREPARED",
>
> Big issue were with decoding ddl-including two-phase transactions:
> - prepared.out were misleading. We could not reproduce decoding body of
> "test_prepared#3" with logical_twophase_v6.diff. It was skipped if
> `pg_logical_slot_get_changes` were called without
> `twophase-decode-with-catalog-changes` set, and only "COMMIT PREPARED
> test_prepared#3" were decoded.
> The reason is "PREPARE TRANSACTION" is passed to `pg_filter_prepare`
> twice:
> - first on "PREPARE" itself,
> - second - on "COMMIT PREPARED".
> In v6, `pg_filter_prepare` without `with-catalog-changes` first time
> answered "true" (ie it should not be decoded), and second time (when
> transaction became committed) it answered "false" (ie it should be
> decoded). But second time in DecodePrepare
> `ctx->snapshot_builder->start_decoding_at`
> is already in a future compared to `buf->origptr` (because it is
> on "COMMIT PREPARED" lsn). Therefore DecodePrepare just called
> ReorderBufferForget.
> If `pg_filter_prepare` is called with `with-catalog-changes`, then
> it returns "false" both times, thus DeocdePrepare decodes transaction
> in first time, and calls `ReorderBufferForget` in second time.
>
> I didn't found a way to fix it gracefully. I just change
> `pg_filter_prepare`
> to return same answer both times: "false" if called
> `with-catalog-changes`
> (ie need to call DecodePrepare), and "true" otherwise. With this
> change, catalog changing two-phase transaction is decoded as simple
> one-phase transaction, if `pg_logical_slot_get_changes` is called
> without `with-catalog-changes`.

Small improvement compared to v7:
- twophase_gid is written with alignment padding in the
XactLogCommitRecord
and XactLogAbortRecord.

--
Sokolov Yura
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company

Attachment Content-Type Size
logical_twophase_v8.diff.gz application/x-gzip 14.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2017-10-27 20:06:58 Re: WIP: BRIN bloom indexes
Previous Message Sokolov Yura 2017-10-27 19:27:43 Re: WIP: BRIN bloom indexes