| From: | Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru> |
|---|---|
| To: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Andrey Silitskiy <a(dot)silitskiy(at)postgrespro(dot)ru> |
| Cc: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "sawada(dot)mshk(at)gmail(dot)com" <sawada(dot)mshk(at)gmail(dot)com>, "michael(at)paquier(dot)xyz" <michael(at)paquier(dot)xyz>, "peter(dot)eisentraut(at)enterprisedb(dot)com" <peter(dot)eisentraut(at)enterprisedb(dot)com>, "dilipbalaut(at)gmail(dot)com" <dilipbalaut(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Greg Sabino Mullane <htamfids(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
| Subject: | Re: Exit walsender before confirming remote flush in logical replication |
| Date: | 2026-01-20 17:03:55 |
| Message-ID: | e25567b4-9893-48bf-ac17-0e884f1acef9@postgrespro.ru |
| Lists: | pgsql-hackers |
Dear Hackers,
I think I have reproduced the test failure. The test fails because the walsender
is stuck waiting in WalSndDoneImmediate -> ereport, with the stack shown below.
It appears to be trying to send the message to the replica and flush it, but the
replica is hung.
#0 0x00007a4b37f2a037 in epoll_wait
#1 0x000056855317a2e8 in WaitEventSetWaitBlock
#2 WaitEventSetWait
#3 0x0000568552feea8e in secure_write
#4 0x0000568552ff5666 in internal_flush_buffer
#5 0x0000568552ff5966 in internal_flush
#6 socket_flush ()
#7 socket_flush ()
#8 0x00005685532ff1b3 in send_message_to_frontend (edata=<optimized out>)
#9 EmitErrorReport ()
#10 0x00005685532ff6dd in errfinish
#11 0x000056855312cc9c in WalSndDoneImmediate () at walsender.c:3625
I propose removing the ereport call from WalSndDoneImmediate, so that the
walsender does not block on a hung replica while exiting in immediate mode.
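To illustrate the mechanism behind the stack above, here is a minimal sketch in Python. It is not PostgreSQL code. It only shows the general socket behaviour: once a peer stops reading, the kernel send buffer fills up and further writes cannot make progress. A writer that waits for writability at that point, as the secure_write -> WaitEventSetWait frames suggest, stays stuck for as long as the peer stays hung. The function name fill_until_blocked and the chunk size are assumptions made for this illustration.

```python
import socket


def fill_until_blocked(chunk_size=4096):
    """Write to a socket whose peer never reads.

    Returns (bytes_queued, blocked). `blocked` is True once the kernel
    refuses more data, which is the point where a blocking writer
    would start waiting for the peer.
    """
    sender, receiver = socket.socketpair()
    # Non-blocking mode lets us observe the "would block" condition
    # instead of hanging in this demo.
    sender.setblocking(False)
    sent = 0
    blocked = False
    try:
        while True:
            try:
                sent += sender.send(b"x" * chunk_size)
            except BlockingIOError:
                # The send buffer is full and the peer is not draining it.
                blocked = True
                break
    finally:
        sender.close()
        receiver.close()
    return sent, blocked


if __name__ == "__main__":
    sent, blocked = fill_until_blocked()
    print(f"queued {sent} bytes; further send would block: {blocked}")
```

The number of bytes queued depends on the platform's socket buffer sizes, so only the fact that the write eventually would block is meaningful here.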
With best regards,
Vitaly
On 1/19/26 15:41, Fujii Masao wrote:
> On Sun, Jan 18, 2026 at 1:20 AM Andrey Silitskiy
> <a(dot)silitskiy(at)postgrespro(dot)ru> wrote:
>>
>> On Jan 9, 2026 at 10:04 AM Fujii Masao
>> <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> Why do we need to send a "done" message to the receiver here?
>>> Since delivery isn't guaranteed in immediate mode, it seems of limited
>>> value.
>>
>> It seems to me that it is better to send a message in cases where it is
>> possible, so as not to raise errors on the subscriber during a clean shutdown.
>> And when this is not possible, exit the process without waiting.
>>
>>> For the immediate mode, would it make sense to log that the walsender is
>>> terminating in immediate mode and that WAL replication may be incomplete,
>>> so users can more easily understand what happened?
>>
>> Added to the latest patch.
>
> Thanks for updating the patch!
>
> cfbot is reporting a test failure. Could you please look into it and
> fix the issue?
> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F6234
>
> Regards,
>