Re: repeated decoding of prepared transactions

From: Markus Wanner <markus(dot)wanner(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Ajin Cherian <itsajin(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, simon(dot)riggs(at)enterprisedb(dot)com, Andres Freund <andres(at)anarazel(dot)de>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>
Subject: Re: repeated decoding of prepared transactions
Date: 2021-02-20 10:55:19
Message-ID: 21251661-f342-a2e1-05bc-77945d476562@enterprisedb.com
Lists: pgsql-hackers

On 20.02.21 04:38, Amit Kapila wrote:
> I see a problem with this assumption. During the initial
> synchronization, this transaction won't be visible to snapshot and we
> won't copy it. Then later if we won't decode and send it then the
> replica will be out of sync. Such a problem won't happen with Ajin's
> patch.

You are assuming that the initial snapshot is a) logical and b) dumb.

A physical snapshot very well "sees" prepared transactions and will
restore them to their prepared state. But even in the logical case, I
think it's beneficial to keep the decoder simpler and instead require
some support for two-phase commit in the initial synchronization logic.
For example using the following approach (you will recognize
similarities to what snapbuild does):

1.) create the slot
2.) start to retrieve changes and queue them
3.) wait for the prepared transactions that were pending at the
point in time of step 1 to complete
4.) take a snapshot (by visibility, w/o requiring to "see" prepared
transactions)
5.) apply the snapshot
6.) replay the queue, filtering commits already visible in the
snapshot
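
To make the ordering of the six steps concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: `ToySource`, `initial_sync`, `let_workload_run` and `after_snapshot` are hypothetical names invented for illustration, not PostgreSQL APIs, and the "queue" is simply the tail of an in-memory commit log.

```python
class ToySource:
    """In-memory stand-in for the upstream node (hypothetical, not a PostgreSQL API)."""

    def __init__(self):
        self.commit_log = []   # (xid, rows) in commit order
        self.prepared = {}     # xid -> rows of transactions prepared but not yet finished

    def commit(self, xid, rows):
        self.commit_log.append((xid, rows))

    def prepare(self, xid, rows):
        self.prepared[xid] = rows

    def commit_prepared(self, xid):
        self.commit(xid, self.prepared.pop(xid))

    def rollback_prepared(self, xid):
        self.prepared.pop(xid)

    def snapshot(self):
        """Return (visible rows, xids of the commits that produced them)."""
        rows = {}
        for _, changes in self.commit_log:
            rows.update(changes)
        return rows, {xid for xid, _ in self.commit_log}


def initial_sync(source, let_workload_run, after_snapshot=lambda: None):
    # 1.) create the slot: remember the stream position and the pending prepares
    slot_start = len(source.commit_log)
    pending = set(source.prepared)
    # 2.) changes after slot_start are "queued" (here: simply left in commit_log)
    # 3.) wait for the prepared transactions pending at step 1 to complete
    while pending & set(source.prepared):
        let_workload_run()
    # 4.) take a snapshot by plain visibility
    rows, snapshot_xids = source.snapshot()
    after_snapshot()  # commits landing here exist only in the queue
    # 5.) apply the snapshot
    replica = dict(rows)
    # 6.) replay the queue, filtering commits already visible in the snapshot
    for xid, changes in source.commit_log[slot_start:]:
        if xid not in snapshot_xids:
            replica.update(changes)
    return replica
```

In this model, a transaction prepared before step 1 is never decoded as a PREPARE by the consumer. It either reaches the replica through the snapshot (because step 3 waited for it) or it was rolled back and needs no replication at all.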

Just as with the solution proposed by Ajin and you, this has the danger
of showing transactions as committed without the effects of the PREPAREs
being "visible" (after step 5 but before 6).

However, this approach of solving the problem outside of the walsender
has two advantages:

* The delay in step 3 can be made visible and dealt with. As there's
no upper boundary to that delay, it makes sense to e.g. inform the
user after 10 minutes and provide a list of two-phase transactions
still in progress.

* It becomes possible to avoid inconsistencies during the
reconciliation window in between steps 5 and 6 by disallowing
concurrent (user) transactions to run until after completion of
step 6.
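
As an illustration of the first point, the wait in step 3 could be wrapped in a loop that reports the stragglers once a threshold has passed. This is a sketch only: `wait_for_prepared` and its `still_pending` callback are hypothetical, and how the set of pending two-phase transactions is obtained is left to the caller.

```python
import time


def wait_for_prepared(still_pending, warn_after=600.0, poll=1.0,
                      clock=time.monotonic, sleep=time.sleep, report=print):
    """Block until still_pending() returns an empty set.

    Once warn_after seconds have elapsed, report (a single time) the
    identifiers of the prepared transactions still being waited for.
    Returns the total time waited, in seconds.
    """
    start = clock()
    warned = False
    while True:
        pending = still_pending()
        if not pending:
            return clock() - start
        if not warned and clock() - start >= warn_after:
            report(f"still waiting for prepared transactions: {sorted(pending)}")
            warned = True
        sleep(poll)
```

The clock, sleep and report functions are parameters so that the policy (warn once, warn repeatedly, abort) stays with the tool driving the synchronization and not with the decoder.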

The current implementation, by contrast, hides this in the walsender,
without any way to determine how long a PREPARE has been delayed or when
consistency has been reached (short, of course, of using the very same
initial snapshotting approach outlined above, for which the reordering
logic in the walsender does more harm than good).

Essentially, I think I'm saying that while I agree that some kind of
snapshot synchronization logic is needed, it should live in a different
place.

Regards

Markus
