Re: repeated decoding of prepared transactions

From: Markus Wanner <markus(dot)wanner(at)enterprisedb(dot)com>
To: Ajin Cherian <itsajin(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, robertmhaas(at)gmail(dot)com, simon(dot)riggs(at)enterprisedb(dot)com, andres(at)anarazel(dot)de, petr(dot)jelinek(at)enterprisedb(dot)com
Subject: Re: repeated decoding of prepared transactions
Date: 2021-02-19 14:53:32
Message-ID: 415799ff-89bb-a78e-2f79-7f29834d0460@enterprisedb.com
Lists: pgsql-hackers

Ajin, Amit,

thank you both a lot for thinking this through and even providing a patch.

The changes in expectation for twophase.out match exactly what I
prepared. And the possibility of switching the two-phase option via
pg_logical_slot_get_changes is indeed something I had not yet
considered, either.

On 19.02.21 03:50, Ajin Cherian wrote:
> For this, I am planning to change the semantics such that
> two-phase-commit can only be specified while creating the slot using
> pg_create_logical_replication_slot()
> and not in pg_logical_slot_get_changes, thus preventing
> two-phase-commit flag from being toggled between restarts of the
> decoder. Let me know if anybody objects to this
> change, else I will update that in the next patch.

This sounds like a good plan to me, yes.

However, more generally speaking, I suspect you are overthinking this.
All of the complexity arises because of the assumption that an output
plugin receiving and confirming a PREPARE may not be able to persist
that first phase of transaction application. Instead, you are trying to
somehow resurrect the transactional changes and the prepare at COMMIT
PREPARED time and decode it in a deferred way.

By contrast, I'm arguing that a PREPARE is an atomic operation just like a
transaction's COMMIT. The decoder should always feed these in the order
of appearance in the WAL. For example, if you have PREPARE A, COMMIT B,
COMMIT PREPARED A in the WAL, the decoder should always output these
events in exactly that order, and never as COMMIT B, PREPARE A, COMMIT
PREPARED A. (The expectation for twophase_snapshot currently violates
this rule: the COMMIT for `s1insert` appears after the PREPARE of `s2p`
in the WAL, but gets decoded before it.)
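To make the ordering rule concrete, here is a toy model in C. It is only a sketch under simplifying assumptions: `Lsn`, `Event`, `EventKind` and decode_in_wal_order() are hypothetical names, not PostgreSQL's reorder buffer or WAL record types. The point it shows is that a PREPARE is forwarded when its record is reached, exactly like a COMMIT, so the output order is the WAL order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not PostgreSQL's real WAL record types. */
typedef uint64_t Lsn;

typedef enum { EV_PREPARE, EV_COMMIT, EV_COMMIT_PREPARED } EventKind;

typedef struct {
    Lsn       lsn;   /* position of the record in the WAL */
    EventKind kind;
    char      xact;  /* transaction label, e.g. 'A' or 'B' */
} Event;

/*
 * Forward decoded events in exactly the order they appear in the WAL.
 * A PREPARE is treated like a COMMIT: emitted when its record is reached,
 * never deferred to COMMIT PREPARED time. Returns the number of events
 * written to out (at most cap).
 */
static size_t decode_in_wal_order(const Event *wal, size_t n, Event *out, size_t cap)
{
    size_t emitted = 0;
    for (size_t i = 0; i < n && emitted < cap; i++) {
        /* WAL positions only grow, so output order equals WAL order. */
        assert(i == 0 || wal[i - 1].lsn < wal[i].lsn);
        out[emitted++] = wal[i];
    }
    return emitted;
}
```

Feeding it PREPARE A (LSN 100), COMMIT B (200) and COMMIT PREPARED A (300) yields the events in that same order; a decoder that defers PREPARE A until COMMIT PREPARED time would instead produce COMMIT B first.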

The patch I'm attaching corrects this expectation in twophase_snapshot,
adds an explanatory diagram, and eliminates any danger of sending
PREPAREs at COMMIT PREPARED time, thereby preserving the ordering of
PREPAREs vs. COMMITs.

Given that the output plugin supports two-phase commit, I argue there
must be a good reason for it to set the start_decoding_at LSN to a point
in time after a PREPARE. To me that means the output plugin (or its
downstream replica) has processed the PREPARE (and the downstream
replica did whatever it needed to do on its side in order to make the
transaction ready to be committed in a second phase).

(In the weird case of an output plugin that wants to enable two-phase
commit but does not really support it downstream, it's still possible
for it to hold back LSN confirmations for prepared-but-still-in-flight
transactions. However, I'm having a hard time justifying this use case.)
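For illustration only, holding back confirmations could look like the sketch below. It assumes that a PREPARE at or after start_decoding_at is decoded again after a restart, and that the next start_decoding_at equals the last confirmed LSN; confirmable_lsn() and `Lsn` (a stand-in for XLogRecPtr) are hypothetical names, not part of any existing output plugin API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t Lsn; /* hypothetical stand-in for XLogRecPtr */

/*
 * Hypothetical helper for an output plugin that enables two-phase commit
 * without persisting PREPAREs downstream. It never confirms past the
 * oldest PREPARE that is still in flight, so (under the assumptions
 * stated above) that PREPARE lies at or after the next start_decoding_at
 * and gets decoded again on restart.
 */
static Lsn confirmable_lsn(Lsn processed_up_to, const Lsn *inflight_prepares, size_t n)
{
    Lsn limit = processed_up_to;
    for (size_t i = 0; i < n; i++) {
        assert(inflight_prepares[i] != 0); /* 0 stands for an invalid position */
        if (inflight_prepares[i] < limit)
            limit = inflight_prepares[i];
    }
    return limit;
}
```

With in-flight PREPAREs at 200 and 300 and everything processed up to 500, the plugin would confirm only 200.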

With that line of thinking, the point in time (or in WAL) of the COMMIT
PREPARED does not matter at all to reason about the decoding of the
PREPARE operation. Instead, there are only exactly two cases to consider:

a) the PREPARE happened before the start_decoding_at LSN and must not be
decoded. (But the effects of the PREPARE must then be included in the
initial synchronization. If that's not supported, the output plugin
should not enable two-phase commit.)

b) the PREPARE happens after the start_decoding_at LSN and must be
decoded. (It has obviously been neither included in the initial
synchronization nor decoded by a previous instance of the decoder
process.)
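The two cases boil down to a single comparison that never looks at the COMMIT PREPARED position. A minimal sketch, assuming records at or after start_decoding_at are decoded and earlier ones skipped (treating equality as case b) is an assumption here); classify_prepare() and `PrepareAction` are hypothetical names, and `Lsn` stands in for XLogRecPtr.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t Lsn; /* hypothetical stand-in for XLogRecPtr */

typedef enum {
    PREPARE_SKIP,   /* case a): before start_decoding_at, covered by initial sync */
    PREPARE_DECODE  /* case b): at or after start_decoding_at, must be decoded */
} PrepareAction;

/* The COMMIT PREPARED position is deliberately not an input. */
static PrepareAction classify_prepare(Lsn prepare_lsn, Lsn start_decoding_at)
{
    assert(prepare_lsn != 0); /* 0 plays the role of an invalid position */
    if (prepare_lsn < start_decoding_at)
        return PREPARE_SKIP;
    return PREPARE_DECODE;
}
```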

The case where the PREPARE lies before SNAPBUILD_CONSISTENT always falls
under case a), where we must not repeat the PREPARE anyway. And in case
b), where we need a consistent snapshot to decode the PREPARE, existing
provisions already guarantee that this is possible (or how would this be
different from a regular single-phase commit?).
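The argument relies on the consistent point never lying after start_decoding_at. The sketch below encodes that as an assumed invariant and checks the consequence: any PREPARE that has to be decoded lies at or after the consistent point, so a snapshot is available for it. prepare_needs_decoding() and `Lsn` are hypothetical names; this is not how the snapshot builder is actually structured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn; /* hypothetical stand-in for XLogRecPtr */

/* Returns true when the PREPARE at prepare_lsn has to be decoded. */
static bool prepare_needs_decoding(Lsn prepare_lsn, Lsn consistent_lsn, Lsn start_decoding_at)
{
    /* Assumed invariant: decoding never starts before the snapshot is consistent. */
    assert(consistent_lsn <= start_decoding_at);

    bool decode = prepare_lsn >= start_decoding_at;

    /* Consequence: a decoded PREPARE never precedes the consistent point. */
    assert(!decode || prepare_lsn >= consistent_lsn);
    return decode;
}
```

A PREPARE before the consistent point (e.g. at 50 with consistency at 100 and start_decoding_at at 150) is therefore always skipped, which matches case a).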

Please let me know what you think and whether this approach is feasible
for you as well.

Regards

Markus

Attachment: 0001-Preserve-ordering-of-PREPAREs-vs-COMMITs.patch (text/x-patch, 16.0 KB)
