Re: Move walreceiver state assignment (to WALRCV_STREAMING) in WalReceiverMain()

From: "Euler Taveira" <euler(at)eulerto(dot)com>
To: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Move walreceiver state assignment (to WALRCV_STREAMING) in WalReceiverMain()
Date: 2023-12-12 19:54:32
Message-ID: 3abb8c54-11de-4220-ad65-c2785df01124@app.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 12, 2023, at 12:58 PM, Drouvot, Bertrand wrote:
> Currently walrcv->walRcvState is set to WALRCV_STREAMING at the
> beginning of WalReceiverMain().
>
> But it seems that after this assignment things could be wrong before the
> walreicever actually starts streaming (like not being able to connect
> to the primary).
>
> It looks to me that WALRCV_STREAMING should be set once walrcv_startstreaming()
> returns true: this is the proposal of this patch.

Per the state name (streaming), it seems it should be set later as you
proposed, however, I'm concerned about the previous state (WALRCV_STARTING). If
I'm reading the code correctly, WALRCV_STARTING is assigned at
RequestXLogStreaming():

SetInstallXLogFileSegmentActive();
RequestXLogStreaming(tli, ptr, PrimaryConnInfo,
PrimarySlotName,
wal_receiver_create_temp_slot);
flushedUpto = 0;
}

/*
* Check if WAL receiver is active or wait to start up.
*/
if (!WalRcvStreaming())
{
lastSourceFailed = true;
break;
}

After a few lines the function WalRcvStreaming() has:

SpinLockRelease(&walrcv->mutex);

/*
* If it has taken too long for walreceiver to start up, give up. Setting
* the state to STOPPED ensures that if walreceiver later does start up
* after all, it will see that it's not supposed to be running and die
* without doing anything.
*/
if (state == WALRCV_STARTING)
{
pg_time_t now = (pg_time_t) time(NULL);

if ((now - startTime) > WALRCV_STARTUP_TIMEOUT)
{
bool stopped = false;

SpinLockAcquire(&walrcv->mutex);
if (walrcv->walRcvState == WALRCV_STARTING)
{
state = walrcv->walRcvState = WALRCV_STOPPED;
stopped = true;
}
SpinLockRelease(&walrcv->mutex);

if (stopped)
ConditionVariableBroadcast(&walrcv->walRcvStoppedCV);
}
}

Couldn't it give up before starting if you apply your patch? My main concern is
due to a slow system, the walrcv_connect() took to long in WalReceiverMain()
and the code above kills the walreceiver while in the process to start it.
Since you cannot control the hardcoded WALRCV_STARTUP_TIMEOUT value, you might
have issues during overload periods.

--
Euler Taveira
EDB https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2023-12-12 20:11:31 Re: Report planning memory in EXPLAIN ANALYZE
Previous Message Michael Paquier 2023-12-12 16:06:51 pgsql: Prevent tuples to be marked as dead in subtransactions on standb