Re: repeated decoding of prepared transactions

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Markus Wanner <markus(dot)wanner(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: Re: repeated decoding of prepared transactions
Date: 2021-02-16 04:13:15
Message-ID: CAA4eK1JLVfB9hiczRyTt6qLmw90qR3-9ZzeZnHi-52nEw5_SYg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 11, 2021 at 4:06 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Feb 8, 2021 at 2:01 PM Markus Wanner
> <markus(dot)wanner(at)enterprisedb(dot)com> wrote:
> >
> Now, coming back to the restart case where the prepared transaction
> can be sent again by the publisher. I understand your and others'
> point that we should not send the prepared transaction if there is a
> restart between prepare and commit, but there are reasons why we have
> done it that way and I am open to your suggestions. I'll once again try
> to explain the exact case to you which is not very apparent. The basic
> idea is that we ship/replay all transactions where commit happens
> after the snapshot has a consistent state (SNAPBUILD_CONSISTENT), see
> atop snapbuild.c for details. Now, for transactions where prepare is
> before snapshot state SNAPBUILD_CONSISTENT and commit prepared is
> after SNAPBUILD_CONSISTENT, we need to send the entire transaction
> including prepare at the commit time. One might think it is quite easy
> to detect that, basically if we skip prepare when the snapshot state
> was not SNAPBUILD_CONSISTENT, then mark a flag in ReorderBufferTxn and
> use the same to detect during commit and accordingly take the decision
> to send prepare but unfortunately it is not that easy. There is always
> a chance that on restart we reuse the snapshot serialized by some
> other Walsender at a location prior to Prepare and if that happens
> then this time the prepare won't be skipped due to snapshot state
> (SNAPBUILD_CONSISTENT) but due to start_decoding_at point (considering
> we have already shipped some of the later commits but not prepare).
> Now, this will actually become the same situation where the restart
> has happened after we have sent the prepare but not commit. This is
> the reason we have to resend the prepare when the subscriber restarts
> between prepare and commit.
>

After further thinking on this problem and some off-list discussions
with Ajin, there appears to be another way to solve the above problem,
one that avoids resending the prepare after a restart if it has
already been processed by the subscriber. The main reason we were not
able to distinguish between the two cases, (a) the prepare happened
before the SNAPBUILD_CONSISTENT state but the commit prepared happened
after we reached it, and (b) the prepare was already decoded and
successfully processed by the subscriber and we have restarted
decoding, is that after the restart we can reuse a snapshot serialized
by some concurrent walsender at an LSN prior to the Prepare. Now, if
we ensure that we don't use serialized snapshots for decoding via
slots where the two_phase decoding option is enabled, then we won't
have that problem. The drawback is that in some cases it can take a
bit more time to build the initial snapshot, but maybe that is better
than the current solution.
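
To make the two cases a bit more concrete, here is a rough toy model in
C. This is not PostgreSQL code: the Lsn type, the enum, and all three
function names are made up for illustration, and LSNs are plain
integers. It also assumes that a snapshot built from scratch after a
restart becomes consistent at the same point as before. It only shows
why skipping the restore of serialized snapshots for two_phase slots
keeps case (a) from looking like case (b).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an LSN; not PostgreSQL's XLogRecPtr. */
typedef uint64_t Lsn;

/* Why a PREPARE record was or was not sent when it was decoded. */
typedef enum
{
    PREPARE_SENT_NOW,
    PREPARE_SKIPPED_NOT_CONSISTENT, /* case (a): snapshot not yet consistent */
    PREPARE_SKIPPED_ALREADY_SENT    /* case (b): before start_decoding_at */
} PrepareAction;

/*
 * Proposed rule (hypothetical helper): a slot with two_phase enabled never
 * restores a serialized snapshot, so it reaches consistency only at the
 * point where it builds the snapshot itself.
 */
static Lsn
effective_consistent_lsn(bool two_phase, bool have_serialized,
                         Lsn serialized_lsn, Lsn built_consistent_lsn)
{
    if (have_serialized && !two_phase)
        return serialized_lsn;
    return built_consistent_lsn;
}

/* Classify a PREPARE at prepare_lsn; the consistency check comes first. */
static PrepareAction
classify_prepare(Lsn prepare_lsn, Lsn consistent_lsn, Lsn start_decoding_at)
{
    if (prepare_lsn < consistent_lsn)
        return PREPARE_SKIPPED_NOT_CONSISTENT;
    if (prepare_lsn < start_decoding_at)
        return PREPARE_SKIPPED_ALREADY_SENT;
    return PREPARE_SENT_NOW;
}

/* At COMMIT PREPARED, send the prepare too only in case (a). */
static bool
resend_prepare_at_commit(PrepareAction action)
{
    return action == PREPARE_SKIPPED_NOT_CONSISTENT;
}
```

For example, take a prepare at 100, a snapshot serialized by another
walsender at 80, our own consistent point at 150, and start_decoding_at
at 180 because some later commits were already shipped. If the
serialized snapshot is restored, the prepare is classified as already
sent even though it never was. With the restore disabled for two_phase,
it is classified as case (a) and goes out with the commit prepared.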

Any suggestions?

--
With Regards,
Amit Kapila.
