RE: logical replication empty transactions

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Ajin Cherian <itsajin(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Rahila Syed <rahila(dot)syed(at)2ndquadrant(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: RE: logical replication empty transactions
Date: 2022-03-25 00:30:30
Message-ID: OS0PR01MB571653811244224062F3CA87941A9@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thursday, March 24, 2022 11:19 AM houzj(dot)fnst(at)fujitsu(dot)com wrote:
> On Tuesday, March 22, 2022 7:50 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> > On Tue, Mar 22, 2022 at 7:25 AM houzj(dot)fnst(at)fujitsu(dot)com
> > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > > On Monday, March 21, 2022 6:01 PM Amit Kapila
> > > > <amit(dot)kapila16(at)gmail(dot)com>
> > > > wrote:
> > >
> > > Oh, sorry, I posted the wrong patch, here is the correct one.
> > >
> >
> > The test change looks good to me. I think additionally we can verify
> > that the record is not reflected in the subscriber table. Apart from
> > that, I had made minor changes mostly in the comments in the attached
> > patch. If those look okay to you, please include those in the next version.
>
> Thanks, the changes look good to me, I merged the diff patch.
>
> Attach the new version patch which include the following changes:
>
> - Fix a typo
> - Change the requestreply flag of the newly added WalSndKeepalive to false,
> because the subscriber can judge whether it's necessary to post a reply
> based
> on the received LSN.
> - Add a testcase to make sure there is no data in subscriber side when the
> transaction is skipped.
> - Change the name of flag skipped_empty_xact to skipped_xact which seems
> more
> understandable.
> - Merge Amit's suggested changes.
>

I did some more review for the newly added keepalive message and confirmed that
it's necessary to send this in sync mode.

+ if (skipped_xact &&
+ SyncRepRequested() &&
+ ((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
+ WalSndKeepalive(false, ctx->write_location);

Because in sync replication, the publisher need to get the reply from
subscirber to release the waiter. After applying the patch, we don't send empty
transaction to subscriber, so we won't get a reply without this keepalive
message. Although the walsender usually invoke WalSndWaitForWal() which will
also send a keepalive message to subscriber, and we could get a reply and
release the wait. But WalSndWaitForWal() is not always invoked for each record.
When reading the page, we won't invoke WalSndWaitForWal() if we already have
the record in our buffer[1].

[1] ReadPageInternal(
...
/* check whether we have all the requested data already */
if (targetSegNo == state->seg.ws_segno &&
targetPageOff == state->segoff && reqLen <= state->readLen)
return state->readLen;
...

Based on above, if we don't have the newly added keepalive message in the
patch, the transaction could wait for a bit more time to finish.

For example, I did some experiments to confirm:
1. Set LOG_SNAPSHOT_INTERVAL_MS and checkpoint_timeout to a bigger value to
make sure it doesn't generate extra WAL which could affect the test.
2. Use debugger to attach the walsender and let it stop in the WalSndWaitForWal()
3. Start two clients and modify un-published table
postgres1 # INSERT INTO not_rep VALUES(1);
---- waiting
postgres2 # INSERT INTO not_rep VALUES(1);
---- waiting
4. Release the walsender, and we can see it won't send a keepalive to
subscriber until it has handled all the above two transactions, which means
the two transaction will wait until all of them has been decoded. This
behavior doesn't looks good and is inconsistent with the current
behavior(the transaction will finish after decoding it or after sending it
to sub if necessary).

So, I think the newly add keepalive message makes sense.

Best regards,
Hou zj

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-03-25 00:37:25 Re: Corruption during WAL replay
Previous Message David G. Johnston 2022-03-25 00:25:45 Re: pg_dump new feature: exporting functions only. Bad or good idea ?