Re: Exit walsender before confirming remote flush in logical replication

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Andrey Silitskiy <a(dot)silitskiy(at)postgrespro(dot)ru>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Greg Sabino Mullane <htamfids(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Ronan Dunklau <ronan(at)dunklau(dot)fr>, Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "sawada(dot)mshk(at)gmail(dot)com" <sawada(dot)mshk(at)gmail(dot)com>, "michael(at)paquier(dot)xyz" <michael(at)paquier(dot)xyz>, "peter(dot)eisentraut(at)enterprisedb(dot)com" <peter(dot)eisentraut(at)enterprisedb(dot)com>, "dilipbalaut(at)gmail(dot)com" <dilipbalaut(at)gmail(dot)com>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>
Subject: Re: Exit walsender before confirming remote flush in logical replication
Date: 2026-06-01 20:00:00
Message-ID: 74ff6252-3450-4508-9d16-1cdc891b3cb9@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Kuroda-san,

01.06.2026 06:57, Hayato Kuroda (Fujitsu) wrote:
>> 038_walsnd_shutdown_timeout_subscriber.log doesn't really contain the
>> expected warning:
>> 2026-05-29 21:11:58.777 CEST [2273817][logical replication apply worker][124/2:0] LOG:  logical replication apply worker for subscription "test_sub" has started
>> 2026-05-29 21:12:03.232 CEST [2271870][client backend][4/6:0] LOG:  statement: BEGIN;
>> 2026-05-29 21:12:03.232 CEST [2271870][client backend][4/6:0] LOG:  statement: LOCK TABLE test_tab IN EXCLUSIVE MODE;
> ...
>
> To confirm; IIUC the warning should be contained on the publisher log, not the
> subscriber side. And below log appeared on the publisher;
>
> ```
> 2026-05-29 21:12:03.426 CEST [2275591][walsender][26/1:0] FATAL: canceling authentication due to timeout
> 2026-05-29 21:12:03.432 CEST [2273580][checkpointer][:0] LOG: shutting down
> ```

I'm sorry. I switched to the wrong log file during my investigation.

>
> Is there a possibility that walsender was shut down during the authentication,
> especially in-between BackendInitialize() and end of PerformAuthentication()?
>
>> I think this can be explained by the fact that walrcv->ready_to_display
>> is set before WalReceiverMain's loop reached. I've reproduced this test
>> failure with:
> Verified it could reproduce the failure, but there were no "canceling authentication
> due to timeout" in the publisher log on my env.

I think the original failure can be reproduced with:
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -645,6 +645,7 @@ ClientAuthentication(Port *port)
 #endif
     }

+if (am_walsender) pg_usleep(500 * 1000);
     if ((log_connections & LOG_CONNECTION_AUTHENTICATION) &&
         status == STATUS_OK &&
         !MyClientConnectionInfo.authn_id)

With this modification, I can see:
#   Failed test 'walsender exits due to wal_sender_shutdown_timeout even when both physical and logical replication are
stalled'
#   at t/038_walsnd_shutdown_timeout.pl line 189.
# Looks like you failed 1 test of 4.
t/038_walsnd_shutdown_timeout.pl .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/4 subtests

038_walsnd_shutdown_timeout_publisher.log contains:
2026-06-01 22:32:35.166 EEST [1851879][client backend][0/0:0] FATAL:  terminating connection due to administrator command
2026-06-01 22:32:35.166 EEST [1851879][client backend][0/0:0] DETAIL:  Signal sent by PID 1851819, UID 1000.
2026-06-01 22:32:35.166 EEST [1851878][walsender][26/1:0] FATAL: canceling authentication due to timeout

Best regards,
Alexander

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Si, Evan 2026-06-01 20:05:01 [PATCH] Clarify that ssl_groups is for any key exchange groups
Previous Message Tom Lane 2026-06-01 19:05:32 Re: sandboxing untrusted code