Re: Synchronizing slots from primary to standby

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-10-26 08:40:38
Message-ID: CAA4eK1+kCww8VAAmDuUkJVtHCK6-n3Z2JDYMd6Mkwp6mC2ts9w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 25, 2023 at 8:49 PM Drouvot, Bertrand
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> On 10/9/23 12:30 PM, shveta malik wrote:
> > PFA v22 patch-set. It has below changes:
> >
> > patch 001:
> > 1) Now physical walsender wakes up logical walsender(s) by using a new
> > CV as suggested in [1]
>
> Thanks!
>
> I think that works fine as long as the standby is up and running and catching up.
>
> The problem I see with the current WalSndWaitForStandbyConfirmation() implementation
> is that if the standby is not running then:
>
> + for (;;)
> + {
> + ListCell *l;
> + long sleeptime = -1;
>
> will loop until we reach the "terminating walsender process due to replication timeout" if we
> explicitly want to end with SIGINT or friends.
>
> For example a scenario like:
>
> - standby down
> - pg_recvlogical running
>
> then CTRL-C on pg_recvlogical would not "respond" immediately but when we reach the replication timeout.
>
> So it seems that we should use something like WalSndWait() instead of ConditionVariableTimedSleep() here:
>
> + /*
> + * Sleep until other physical walsenders awaken us or until a timeout
> + * occurs.
> + */
> + sleeptime = WalSndComputeSleeptime(GetCurrentTimestamp());
> +
> + ConditionVariableTimedSleep(&WalSndCtl->wal_confirm_rcv_cv, sleeptime,
> + WAIT_EVENT_WAL_SENDER_WAIT_FOR_STANDBY_CONFIRMATION);
>
> In that case I think that WalSndWait() should take care of the new CV WalSndCtl->wal_confirm_rcv_cv too.
> The wait on the socket should allow us to stop waiting when, for example, CTRL-C on pg_recvlogical is triggered.
>
> Then we would need to deal with this scenario: Standby down or not catching up and exited WalSndWait() due to the socket
> to break the loop or shutdown the walsender.
>
> Thoughts?
>

Good point, I think we should enhance the WalSndWait() logic to
address this case. Additionally, I think we should ensure that
WalSndWaitForWal() shouldn't wait twice once for wal_flush and a
second time for wal to be replayed by physical standby. It should be
okay to just wait for Wal to be replayed by physical standby when
applicable, otherwise, just wait for Wal to flush as we are doing now.
Does that make sense?

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrei Lepikhov 2023-10-26 08:49:02 Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements
Previous Message Zhijie Hou (Fujitsu) 2023-10-26 08:31:48 RE: Open a streamed block for transactional messages during decoding