Re: Exit walsender before confirming remote flush in logical replication

From: Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Andrey Silitskiy <a(dot)silitskiy(at)postgrespro(dot)ru>
Cc: Greg Sabino Mullane <htamfids(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Ronan Dunklau <ronan(at)dunklau(dot)fr>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "sawada(dot)mshk(at)gmail(dot)com" <sawada(dot)mshk(at)gmail(dot)com>, "michael(at)paquier(dot)xyz" <michael(at)paquier(dot)xyz>, "peter(dot)eisentraut(at)enterprisedb(dot)com" <peter(dot)eisentraut(at)enterprisedb(dot)com>, "dilipbalaut(at)gmail(dot)com" <dilipbalaut(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject: Re: Exit walsender before confirming remote flush in logical replication
Date: 2026-03-25 15:16:13
Message-ID: d31063db-ba90-4ce6-b6a4-cb9d92da7096@postgrespro.ru
Lists: pgsql-hackers

Hi Fujii-san,

Thank you for the testing.

On 3/25/26 15:39, Fujii Masao wrote:
> I tested wal_sender_shutdown_timeout under several configurations and
> encountered a case where the primary shutdown got stuck, even with the patch
> and wal_sender_shutdown_timeout = 1. I'm not sure yet whether this is a bug in
> the patch or an issue with my test setup, but anyway I'd like to share
> the reproduction steps for reference.

It seems that the problem lies in how the sleep time is calculated in the
WalSndComputeSleeptime function. If wal_sender_timeout is set to one hour and
WalSndWait is called with sleeptime = 1h, then shutdown_request_timestamp will
only be updated after one hour, at the next call of WalSndCheckShutdownTimeout,
which happens immediately after the wait completes.

Maybe we could use the minimum of the timeouts in WalSndComputeSleeptime, or use
the timeout mechanism (timeout.c), but then WalSndWait would need to wake up on
the latch.
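
To illustrate the first option (taking the minimum of the timeouts), here is a
rough standalone sketch. It is only a simplified model, not code from the patch
or from walsender.c: the names compute_sleeptime_ms and ms_until_deadline, the
millisecond units, and the 10 s fallback are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical standalone model, not PostgreSQL source.
 * All times are plain millisecond counters.
 */

/* Milliseconds left until (start_ms + timeout_ms), clamped at zero. */
static int64_t
ms_until_deadline(int64_t now_ms, int64_t start_ms, int64_t timeout_ms)
{
    int64_t left = start_ms + timeout_ms - now_ms;

    return left > 0 ? left : 0;
}

/*
 * Pick the sleep time as the minimum over all enabled deadlines, so that a
 * long sender timeout cannot delay the shutdown timeout check.
 * A timeout <= 0 means "disabled".
 */
static int64_t
compute_sleeptime_ms(int64_t now_ms,
                     int64_t last_reply_ms, int64_t sender_timeout_ms,
                     bool shutdown_requested,
                     int64_t shutdown_request_ms, int64_t shutdown_timeout_ms)
{
    int64_t sleeptime = 10000;  /* assumed fallback when no deadline applies */

    assert(now_ms >= last_reply_ms);

    if (sender_timeout_ms > 0)
        sleeptime = ms_until_deadline(now_ms, last_reply_ms, sender_timeout_ms);

    if (shutdown_requested && shutdown_timeout_ms > 0)
    {
        int64_t left = ms_until_deadline(now_ms, shutdown_request_ms,
                                         shutdown_timeout_ms);

        /* The shutdown deadline wins whenever it is the nearer one. */
        if (left < sleeptime)
            sleeptime = left;
    }

    return sleeptime;
}
```

In this model, a one-hour sender timeout combined with a pending shutdown
request and a one-second shutdown timeout gives a sleep of at most one second
rather than one hour. The timeout.c approach would avoid recomputing the
minimum on each loop, but it depends on the wait being interrupted via the
latch.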

With best regards,
Vitaly
