Re: Synchronizing slots from primary to standby

From: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-10-25 15:19:54
Message-ID: f3228cfb-7bf3-4bd8-8f37-c55fc4054759@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 10/9/23 12:30 PM, shveta malik wrote:
> PFA v22 patch-set. It has below changes:
>
> patch 001:
> 1) Now physical walsender wakes up logical walsender(s) by using a new
> CV as suggested in [1]

Thanks!

I think that works fine as long as the standby is up and running and catching up.

The problem I see with the current WalSndWaitForStandbyConfirmation() implementation
is that if the standby is not running then:

+ for (;;)
+ {
+ ListCell *l;
+ long sleeptime = -1;

will loop until we reach the "terminating walsender process due to replication timeout" if we
explicitly want to end with SIGINT or friends.

For example a scenario like:

- standby down
- pg_recvlogical running

then CTRL-C on pg_recvlogical would not "respond" immediately but when we reach the replication timeout.

So it seems that we should use something like WalSndWait() instead of ConditionVariableTimedSleep() here:

+ /*
+ * Sleep until other physical walsenders awaken us or until a timeout
+ * occurs.
+ */
+ sleeptime = WalSndComputeSleeptime(GetCurrentTimestamp());
+
+ ConditionVariableTimedSleep(&WalSndCtl->wal_confirm_rcv_cv, sleeptime,
+ WAIT_EVENT_WAL_SENDER_WAIT_FOR_STANDBY_CONFIRMATION);

In that case I think that WalSndWait() should take care of the new CV WalSndCtl->wal_confirm_rcv_cv too.
The wait on the socket should allow us to stop waiting when, for example, CTRL-C on pg_recvlogical is triggered.

Then we would need to deal with this scenario: Standby down or not catching up and exited WalSndWait() due to the socket
to break the loop or shutdown the walsender.

Thoughts?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2023-10-25 15:24:25 Re: trying again to get incremental backup
Previous Message Daniele Varrazzo 2023-10-25 15:03:18 libpq async connection and multiple hosts