RE: Synchronizing slots from primary to standby

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: RE: Synchronizing slots from primary to standby
Date: 2024-03-03 07:56:32
Message-ID: OS0PR01MB5716CD1AB0090155B2ED5C3F945C2@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Saturday, March 2, 2024 6:55 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Sat, Mar 2, 2024 at 9:21 AM Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com>
> wrote:
> >
> > Apart from the comments, the code in WalSndWaitForWal was refactored a
> > bit to make it neater. Thanks Shveta for helping writing the code and doc.
> >
>
> A few more comments:

Thanks for the comments.

> ==================
> 1.
> +# Wait until the primary server logs a warning indicating that it is
> +waiting # for the sb1_slot to catch up.
> +$primary->wait_for_log(
> + qr/replication slot \"sb1_slot\" specified in parameter
> standby_slot_names does not have active_pid/,
> + $offset);
>
> Shouldn't we wait for such a LOG even in the first test as well which involves two
> standbys and two logical subscribers?

Yes, we should. Added.

>
> 2.
> +##################################################
> +# Test that logical replication will wait for the user-created inactive
> +# physical slot to catch up until we remove the slot from standby_slot_names.
> +##################################################
>
>
> I don't see anything different tested in this test from what we already tested in
> the first test involving two standbys and two logical subscribers. Can you
> please clarify if I am missing something?

I think the intention was to test that the wait loop is ended due to GUC config
reload, while the first test is for the case when the loop is ended due to
restart_lsn movement. But it seems we tested the config reload with xx_get_changes() as
well, so I can remove it if you agree.

>
> 3.
> Note that after receiving the shutdown signal, an ERROR
> + * is reported if any slots are dropped, invalidated, or inactive. This
> + * measure is taken to prevent the walsender from waiting indefinitely.
> + */
> + if (NeedToWaitForStandby(target_lsn, flushed_lsn, wait_event))
>
> Isn't this part of the comment should be moved inside
> NeedToWaitForStandby()?

Moved.

>
> 4.
> + /*
> + * Update our idea of the currently flushed position only if we are
> + * not waiting for standbys to catch up, otherwise the standby would
> + * have to catch up to a newer WAL location in each cycle.
> + */
> + if (wait_event != WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION)
> + {
>
> This functionality (in function WalSndWaitForWal()) seems to ensure that we
> first wait for the required WAL to be flushed and then wait for standbys. If true,
> we should cover that point in the comments here or somewhere in the function
> WalSndWaitForWal().
>
> Apart from this, I have made a few modifications in the comments.

Thanks. I have reviewed and merged them.

Here is the V104 patch which addressed above and Peter's comments.

Best Regards,
Hou zj

Attachment Content-Type Size
v104-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch application/octet-stream 47.0 KB
v104-0002-Document-the-steps-to-check-if-the-standby-is-r.patch application/octet-stream 7.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alena Rybakina 2024-03-03 09:26:19 Re: POC, WIP: OR-clause support for indexes
Previous Message Zhijie Hou (Fujitsu) 2024-03-03 07:51:38 RE: Synchronizing slots from primary to standby