Re: Move walreceiver state assignment (to WALRCV_STREAMING) in WalReceiverMain()

From: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Euler Taveira <euler(at)eulerto(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Move walreceiver state assignment (to WALRCV_STREAMING) in WalReceiverMain()
Date: 2023-12-18 10:18:10
Message-ID: c76c0a65-f754-4614-b616-1d48f9195745@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 12/12/23 8:54 PM, Euler Taveira wrote:
> On Tue, Dec 12, 2023, at 12:58 PM, Drouvot, Bertrand wrote:
>> Currently walrcv->walRcvState is set to WALRCV_STREAMING at the
>> beginning of WalReceiverMain().
>>
>> But it seems that after this assignment things could be wrong before the
>> walreicever actually starts streaming (like not being able to connect
>> to the primary).
>>
>> It looks to me that WALRCV_STREAMING should be set once walrcv_startstreaming()
>> returns true: this is the proposal of this patch.
>
> Per the state name (streaming), it seems it should be set later as you
> proposed,

Thanks for looking at it!

> however, I'm concerned about the previous state (WALRCV_STARTING). If
> I'm reading the code correctly, WALRCV_STARTING is assigned at
> RequestXLogStreaming():
>
>                         SetInstallXLogFileSegmentActive();
>                         RequestXLogStreaming(tli, ptr, PrimaryConnInfo,
>                                              PrimarySlotName,
>                                              wal_receiver_create_temp_slot);
>                         flushedUpto = 0;
>                     }
>
>                     /*
>                      * Check if WAL receiver is active or wait to start up.
>                      */
>                     if (!WalRcvStreaming())
>                     {
>                         lastSourceFailed = true;
>                         break;
>                     }
>
> After a few lines the function WalRcvStreaming() has:
>
>     SpinLockRelease(&walrcv->mutex);
>
>     /*
>      * If it has taken too long for walreceiver to start up, give up. Setting
>      * the state to STOPPED ensures that if walreceiver later does start up
>      * after all, it will see that it's not supposed to be running and die
>      * without doing anything.
>      */
>     if (state == WALRCV_STARTING)
>     {
>         pg_time_t   now = (pg_time_t) time(NULL);
>
>         if ((now - startTime) > WALRCV_STARTUP_TIMEOUT)
>         {
>             bool        stopped = false;
>
>             SpinLockAcquire(&walrcv->mutex);
>             if (walrcv->walRcvState == WALRCV_STARTING)
>             {
>                 state = walrcv->walRcvState = WALRCV_STOPPED;
>                 stopped = true;
>             }
>             SpinLockRelease(&walrcv->mutex);
>
>             if (stopped)
>                 ConditionVariableBroadcast(&walrcv->walRcvStoppedCV);
>         }
>     }
>
> Couldn't it give up before starting if you apply your patch? My main concern is
> due to a slow system, the walrcv_connect() took to long in WalReceiverMain()
> and the code above kills the walreceiver while in the process to start it.

Yeah, so it looks to me that the sequence of events is:

1) The startup process sets walrcv->walRcvState = WALRCV_STARTING (in RequestXLogStreaming())
2) The startup process sets the walrcv->startTime (in RequestXLogStreaming())
3) The startup process asks then the postmaster to starts the walreceiver
4) Then The startup process checks if WalRcvStreaming() is true

Note that 3) is not waiting for the walreceiver to actually start: it "just" sets
a flag and kill (SIGUSR1) the postmaster (in SendPostmasterSignal()).

It means that if the time between 1 and 4 is <= WALRCV_STARTUP_TIMEOUT (10 seconds)
then WalRcvStreaming() returns true (even if the walreceiver is not streaming yet).

So it looks to me that even if the walreceiver does take time to start streaming,
as long as the time between 1 and 4 is <= 10 seconds we are fine.

And I think it's fine because WalRcvStreaming() does not actually "only" check that the
walreceiver is streaming but as its comment states:

"
/*
* Is walreceiver running and streaming (or at least attempting to connect,
* or starting up)?
*/
"

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Drouvot, Bertrand 2023-12-18 10:36:25 Re: Move walreceiver state assignment (to WALRCV_STREAMING) in WalReceiverMain()
Previous Message John Naylor 2023-12-18 09:55:18 Re: Simplify newNode()