Re: repeated decoding of prepared transactions

From: Andres Freund <andres(at)anarazel(dot)de>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: repeated decoding of prepared transactions
Date: 2021-02-22 04:09:21
Message-ID: 20210222040921.c5ghlj4r4mjvcx7d@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2021-02-22 08:22:35 +0530, Amit Kapila wrote:
> On Mon, Feb 22, 2021 at 3:56 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > On 2021-02-21 11:32:29 +0530, Amit Kapila wrote:
> > > Here, I am assuming you are asking to disable 2PC both via
> > > apply-worker and tablesync worker till the initial sync (aka all
> > > tables are in SUBREL_STATE_READY state) phase is complete. If we do
> > > that and what if commit prepared happened after the initial sync phase
> > > but prepare happened before that?
> >
> > Isn't that pretty easy to detect? You compare the LSN of the tx prepare
> > with the LSN of achieving consistency?
> >
>
> I think by LSN of achieving consistency, you mean start_decoding_at
> LSN.

Kinda, but not in the way you suggest. I mean the LSN at which the slot
reached SNAPBUILD_CONSISTENT, which is also the point in the WAL stream
for which we exported the initial snapshot.

My understanding of why you need to have special handling of 2pc PREPARE
is that the initial snapshot will not contain the contents of the
prepared transaction, therefore you need to send it out at some point
(or be incorrect).

Your solution to this is:
    /*
     * It is possible that this transaction is not decoded at prepare time
     * either because by that time we didn't have a consistent snapshot or it
     * was decoded earlier but we have restarted. We can't distinguish between
     * those two cases so we send the prepare in both the cases and let
     * downstream decide whether to process or skip it. We don't need to
     * decode the xact for aborts if it is not done already.
     */
    if (!rbtxn_prepared(txn) && is_commit)

but IMO this violates a pretty fundamental tenet of how logical
decoding is supposed to work, i.e. that data that the client
acknowledges as having received (via lsn passed to START_REPLICATION)
shouldn't be sent out again.

What I am proposing is to instead track the point at which the slot
gained consistency - a simple LSN. That way you can change the above
logic to instead be

    if (txn->final_lsn > snapshot_was_exported_at_lsn)
        ReorderBufferReplay();
    else
        ...

That will easily work across restarts, won't lead to sending data twice,
etc.
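
To make the comparison concrete, here is a rough standalone sketch of
just the check, not actual reorderbuffer code. SketchTxn, SketchSlot and
sketch_should_replay() are made-up names for illustration, XLogRecPtr is
redefined locally as a plain 64-bit integer, and where the consistency
LSN would really be stored is an assumption. What happens in the else
branch is left open, as above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for PostgreSQL's XLogRecPtr, a 64-bit WAL position. */
typedef uint64_t XLogRecPtr;

/* Hypothetical, stripped-down transaction: only the field the check needs. */
typedef struct SketchTxn
{
    XLogRecPtr  final_lsn;
} SketchTxn;

/*
 * Hypothetical per-slot state (an assumption of this sketch): the LSN at
 * which the slot reached consistency and the initial snapshot was exported.
 */
typedef struct SketchSlot
{
    XLogRecPtr  snapshot_was_exported_at_lsn;
} SketchSlot;

/*
 * Returns true for the branch that would call ReorderBufferReplay() in the
 * pseudocode, i.e. when the transaction's final_lsn lies strictly after the
 * slot's consistency point. The result depends only on the two LSNs, not on
 * any in-memory flag, so it is the same before and after a restart.
 */
static bool
sketch_should_replay(const SketchSlot *slot, const SketchTxn *txn)
{
    return txn->final_lsn > slot->snapshot_was_exported_at_lsn;
}
```

With a slot whose consistency point is, say, 1000, a transaction with
final_lsn 1100 takes the replay branch, while one with final_lsn 900 or
exactly 1000 takes the else branch.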

Greetings,

Andres Freund
