Re: walsender bug: stuck during shutdown

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Chloe Dives <Chloe(dot)Dives(at)cantabcapital(dot)com>, Chris Wilson <chris(dot)wilson(at)cantabcapital(dot)com>
Subject: Re: walsender bug: stuck during shutdown
Date: 2020-11-26 07:53:04
Message-ID: abd3220d-bf25-6118-7060-5e9cf7cdfc74@oss.nttdata.com
Lists: pgsql-hackers

On 2020/11/26 11:45, Alvaro Herrera wrote:
> On 2020-Nov-26, Fujii Masao wrote:
>
>> On second thought, doesn't walsender avoid waiting forever unless
>> wal_sender_timeout is disabled, even in the case under discussion?
>> Or if there is a case where wal_sender_timeout doesn't work as expected,
>> we might need to fix that first.
>
> Hmm, no, it doesn't wait forever in that sense; tracing with the
> debugger shows that the process is looping continuously.

Yes, so the problem here is that walsender goes into a busy loop
in that case. This seems to happen only in the logical replication
walsender. In the physical replication walsender, WaitLatchOrSocket()
in WalSndLoop() seems to work as expected and prevents it from entering
a busy loop even in that case.

    /*
     * If postmaster asked us to stop, don't wait anymore.
     *
     * It's important to do this check after the recomputation of
     * RecentFlushPtr, so we can send all remaining data before shutting
     * down.
     */
    if (got_STOPPING)
        break;

The above code in WalSndWaitForWal() seems to cause this issue, but I
haven't come up with an idea for how to fix it yet.
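To make the suspected mechanism concrete, here is a hypothetical, heavily simplified C model. It is not PostgreSQL code: wait_for_wal(), wait_latch_or_socket(), run_send_loop() and the stopping flag are illustrative names only loosely modeled on WalSndWaitForWal(), WaitLatchOrSocket() and got_STOPPING. It assumes that, during shutdown, the caller keeps asking for more WAL because the client has not yet confirmed everything. Once the stop flag is set, the early return skips the blocking wait, so every round of the caller's loop runs without sleeping, which is what a busy loop looks like.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified model of the suspected busy loop.
 * Not PostgreSQL code: every name here is illustrative.
 */

/* Models got_STOPPING: set once the postmaster asks us to stop. */
static bool stopping = false;

/* Counts how many times the model actually blocked (slept). */
static int blocking_waits = 0;

/*
 * Models WaitLatchOrSocket(): a real server would sleep here until the
 * latch is set, the socket is ready, or a timeout expires. We just count
 * the wait and pretend new WAL was flushed meanwhile.
 */
static void wait_latch_or_socket(long *flush_lsn)
{
    blocking_waits++;
    (*flush_lsn)++;
}

/*
 * Models the wait-for-WAL loop: returns once WAL up to target_lsn is
 * flushed, or immediately (without waiting) once a stop is requested.
 */
static long wait_for_wal(long target_lsn, long *flush_lsn)
{
    for (;;)
    {
        /* "Recompute" the flush position before deciding anything. */
        long recent_flush = *flush_lsn;

        /* Mirrors the quoted check: if asked to stop, don't wait anymore. */
        if (stopping)
            return recent_flush;

        if (target_lsn <= recent_flush)
            return recent_flush;

        wait_latch_or_socket(flush_lsn);
    }
}

/*
 * Models the caller: it keeps requesting the next piece of WAL because
 * (by assumption) the client never confirms that everything was received.
 * The loop is bounded by `iterations` so the demo terminates; it returns
 * how many times the model blocked during those rounds.
 */
static int run_send_loop(int iterations)
{
    long flush_lsn = 0;

    assert(iterations >= 0);
    blocking_waits = 0;

    for (int i = 0; i < iterations; i++)
        (void) wait_for_wal(flush_lsn + 1, &flush_lsn);

    return blocking_waits;
}
```

With stopping false, run_send_loop(1000) blocks 1000 times (once per round); with stopping true it never blocks, even though the caller still iterates 1000 times. In this model, whatever bounds the caller's loop (e.g. a timeout) still ends it eventually, but every round in between is spent spinning rather than sleeping.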

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
