Re: logical replication empty transactions

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Ajin Cherian <itsajin(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Rahila Syed <rahila(dot)syed(at)2ndquadrant(dot)com>, Euler Taveira <euler(at)timbira(dot)com(dot)br>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: logical replication empty transactions
Date: 2021-08-16 06:43:51
Message-ID: CAHut+PukQ8yyfSmDCNisVHxSjA632SmzSP_krQg-NJBdZ7gJaQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Aug 13, 2021 at 9:01 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Mon, Aug 2, 2021 at 7:20 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Fri, Jul 23, 2021 at 3:39 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> > >
> >
> > Let's first split the patch for prepared and non-prepared cases as
> > that will help to focus on each of them separately. BTW, why haven't
> > you considered implementing point 1b as explained by Andres in his
> > email [1]? I think we can send a keepalive message in case of
> > synchronous replication when we skip an empty transaction, otherwise,
> > it might delay in responding to transactions synchronous_commit mode.
> > I think in the tests done in the thread, it might not have been shown
> > because we are already sending keepalives too frequently. But what if
> > someone disables wal_sender_timeout or kept it to a very large value?
> > See WalSndKeepaliveIfNecessary. The other thing you might want to look
> > at is if the reason for frequent keepalives is the same as described
> > in the email [2].
> >
>
> I have tried to address the comment here by modifying the
> ctx->update_progress callback function (WalSndUpdateProgress) provided
> for plugins. I have added an option
> by which the callback can specify if it wants to send keep_alives. And
> when the callback is called with that option set, walsender updates a
> flag force_keep_alive_syncrep.
> The Walsender in the WalSndWaitForWal for loop, checks this flag and
> if synchronous replication is enabled, then sends a keep alive.
> Currently this logic
> is added as an else to the current logic that is already there in
> WalSndWaitForWal, which is probably considered unnecessary and a
> source of the keep alive flood
> that you talked about. So, I can change that according to how that fix
> shapes up there. I have also added an extern function in syncrep.c
> that makes it possible
> for walsender to query if synchronous replication is turned on.
>
> The reason I had to turn on a flag and rely on the WalSndWaitForWal to
> send the keep alive in its next iteration is because I tried doing
> this directly when a
> commit is skipped but it didn't work. The reason for this is that when
> the commit is being decoded the sentptr at the moment is at the commit
> LSN and the keep alive
> will be sent for the commit LSN but the syncrep wait is waiting for
> end_lsn of the transaction which is the next LSN. So, sending a keep
> alive at the moment the
> commit is decoded doesn't seem to solve the problem of the waiting
> synchronous reply.
>
> > Few other miscellaneous comments:
> > 1.
> > static void
> > pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx,
> > ReorderBufferTXN *txn,
> > - XLogRecPtr commit_lsn)
> > + XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn,
> > + TimestampTz prepare_time)
> > {
> > + PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
> > +
> > OutputPluginUpdateProgress(ctx);
> >
> > + /*
> > + * If the BEGIN PREPARE was not yet sent, then it means there were no
> > + * relevant changes encountered, so we can skip the COMMIT PREPARED
> > + * message too.
> > + */
> > + if (txndata)
> > + {
> > + bool skip = !txndata->sent_begin_txn;
> > + pfree(txndata);
> > + txn->output_plugin_private = NULL;
> >
> > How is this supposed to work after the restart when prepared is sent
> > before the restart and we are just sending commit_prepared after
> > restart? Won't this lead to sending commit_prepared even when the
> > corresponding prepare is not sent? Can we think of a better way to
> > deal with this?
> >
>
> I have tried to resolve this by adding logic in worker,c to silently
> ignore spurious commit_prepareds. But this change required checking if
> the prepare exists on the
> subscriber before attempting the commit_prepared but the current API
> that checks this requires prepare time and transaction end_lsn. But
> for this I had to
> change the protocol of commit_prepared, and I understand that this
> would break backward compatibility between subscriber and publisher
> (you have raised this issue as well).
> I am not sure how else to handle this, let me know if you have any
> other ideas. One option could be to have another API to check if the
> prepare exists on the subscriber with
> the prepared 'gid' alone, without checking prepare_time or end_lsn.
> Let me know if this idea works.
>
> I have left out the patch 0002 for prepared transactions until we
> arrive at a decision on how to address the above issue.
>
> Peter,
> I have also addressed the comments you've raised on patch 0001, please
> have a look and confirm.

I have reviewed the v13-0001 patch.

Apply / build / test was all OK

Below are my code review comments.

//////////

Comments for v13-0001
=====================

1. Patch comment

=>

Probably this comment should include some description for the new
"keepalive" logic as well.

------

2. src/backend/replication/syncrep.c - new function

@@ -330,6 +330,18 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
}

/*
+ * Check if Sync Rep is enabled
+ */
+bool
+SyncRepEnabled(void)
+{
+ if (SyncRepRequested() && ((volatile WalSndCtlData *)
WalSndCtl)->sync_standbys_defined)
+ return true;
+ else
+ return false;
+}
+

2a. Function comment =>

Why abbreviations in the comment? Why not say "synchronous
replication" instead of "Sync Rep".

~~

2b. if/else =>

Remove the if/else. e.g.

return SyncRepRequested() && ((volatile WalSndCtlData *)
WalSndCtl)->sync_standbys_defined;

~~

2c. Call the new function =>

There is some existing similar code in SyncRepWaitForLSN(), e.g.

if (!SyncRepRequested() ||
!((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
return;

Now that you have a new function you maybe can call it from here, e.g.

if (!SyncRepEnabled())
return;

------

3. src/backend/replication/walsender.c - whitespace

+ if (send_keep_alive)
+ force_keep_alive_syncrep = true;
+
+

=>

Extra blank line?

------

4. src/backend/replication/walsender.c - call keepalive

if (MyWalSnd->flush < sentPtr &&
MyWalSnd->write < sentPtr &&
!waiting_for_ping_response)
+ {
WalSndKeepalive(false);
+ }
+ else
+ {
+ if (force_keep_alive_syncrep && SyncRepEnabled())
+ WalSndKeepalive(false);
+ }

4a. Move the SynRepEnabled() call =>

I think it is not necessary to call the SynRepEnabled() here. Instead,
it might be better if this is called back when you assign the
force_keep_alive_syncrep flag. So change the WalSndUpdateProgress,
e.g.

BEFORE
if (send_keep_alive)
force_keep_alive_syncrep = true;
AFTER
force_keep_alive_syncrep = send_keep_alive && SyncRepEnabled();

Note: Also, that assignment also deserves a big comment to say what it is doing.

~~

4b. Change the if/else =>

If you make the change for 4a. then perhaps the keepalive if/else is
overkill and could be changed.e.g.

if (force_keep_alive_syncrep ||
MyWalSnd->flush < sentPtr &&
MyWalSnd->write < sentPtr &&
!waiting_for_ping_response)
WalSndKeepalive(false);

------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message houzj.fnst@fujitsu.com 2021-08-16 06:59:36 RE: Skipping logical replication transactions on subscriber side
Previous Message Wu Haotian 2021-08-16 06:35:06 Re: Add option --drop-cascade for pg_dump/restore