Re: Minimal logical decoding on standbys

From: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, fabriziomello(at)gmail(dot)com, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Rahila Syed <rahila(dot)syed(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minimal logical decoding on standbys
Date: 2023-03-01 10:51:25
Message-ID: 061eeb26-156e-6011-02b6-b4b1422e2aa6@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 3/1/23 1:48 AM, Jeff Davis wrote:
> On Mon, 2023-02-27 at 09:40 +0100, Drouvot, Bertrand wrote:
>> Please find attached V51 tiny rebase due to a6cd1fc692 (for 0001) and
>> 8a8661828a (for 0005).
>
> [ Jumping into this thread late, so I apologize if these comments have
> already been covered. ]
>

Thanks for looking at it!

> Regarding v51-0004:
>
> * Why is the CV sleep not being canceled?

I think that's an oversight, I'll look at it.

> * Comments on WalSndWaitForWal need to be updated to explain the
> difference between the flush (primary) and the replay (standby) cases.
>

Yeah, will do.

> Overall, it seems like what you really want for the sleep/wakeup logic
> in WalSndWaitForLSN

Typo for WalSndWaitForWal()?

> is something like this:
>
> condVar = RecoveryInProgress() ? replayCV : flushCV;
> waitEvent = RecoveryInProgress() ?
> WAIT_EVENT_WAL_SENDER_WAIT_REPLAY :
> WAIT_EVENT_WAL_SENDER_WAIT_FLUSH;
>
> ConditionVariablePrepareToSleep(condVar);
> for(;;)
> {
> ...
> sleeptime = WalSndComputeSleepTime(GetCurrentTimestamp());
> socketEvents = WL_SOCKET_READABLE;
> if (pq_is_send_pending())
> socketEvents = WL_SOCKET_WRITABLE;
> ConditionVariableTimedSleepOrEvents(
> condVar, sleeptime, socketEvents, waitEvent);
> }
> ConditionVariableCancelSleep();
>
>
> But the problem is that ConditionVariableTimedSleepOrEvents() doesn't
> exist, and I think that's what Andres was suggesting here[1].
> WalSndWait() only waits for a timeout or a socket event, but not a CV;
> ConditionVariableTimedSleep() only waits for a timeout or a CV, but not
> a socket event.
>
> I'm also missing how WalSndWait() works currently. It calls
> ModifyWaitEvent() with NULL for the latch, so how does WalSndWakeup()
> wake it up?

I think it works because the latch is already assigned to the FeBeWaitSet
in pq_init()->AddWaitEventToSet() (for latch_pos).

>
> Assuming I'm wrong, and WalSndWait() does use the latch, then I guess
> it could be extended by having two different latches in the WalSnd
> structure, and waking them up separately and waiting on the right one.

I'm not sure this is needed in this particular case, because:

Why not "simply" call ConditionVariablePrepareToSleep() without any call to ConditionVariableTimedSleep() later?

In that case the walsender will be put in the wait queue (thanks to ConditionVariablePrepareToSleep())
and will be waked up by the event on the socket, the timeout or the CV broadcast (since IIUC they all rely on the same latch).

So, something like:

condVar = RecoveryInProgress() ? replayCV : flushCV;
ConditionVariablePrepareToSleep(condVar);
for(;;)
{
...
sleeptime = WalSndComputeSleepTime(GetCurrentTimestamp());
socketEvents = WL_SOCKET_READABLE;
if (pq_is_send_pending())
socketEvents = WL_SOCKET_WRITABLE;
WalSndWait(wakeEvents, sleeptime, WAIT_EVENT_WAL_SENDER_WAIT_WAL); <-- Note: the code within the loop does not change at all

}
ConditionVariableCancelSleep();

If the walsender is waked up by the CV broadcast, then it means the flush/replay occurred and then we should exit the loop
right after due to:

"
/* check whether we're done */
if (loc <= RecentFlushPtr)
break;

"

meaning that in this particular case there is only one wake up due to the CV broadcast before exiting the loop.

That looks weird to use ConditionVariablePrepareToSleep() without actually using ConditionVariableTimedSleep()
but it looks to me that it would achieve the same goal: having the walsender being waked up
by the event on the socket, the timeout or the CV broadcast.

In that case we would be missing the WAIT_EVENT_WAL_SENDER_WAIT_REPLAY and/or the WAIT_EVENT_WAL_SENDER_WAIT_FLUSH
wait events thought (and we'd just provide the WAIT_EVENT_WAL_SENDER_WAIT_WAL one) but I'm not sure that's a big issue.

What do you think?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Luzanov 2023-03-01 10:55:13 Re: psql: Add role's membership options to the \du+ command
Previous Message Daniel Verite 2023-03-01 10:41:13 Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs