| From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
|---|---|
| To: | Andrey Silitskiy <a(dot)silitskiy(at)postgrespro(dot)ru> |
| Cc: | Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "sawada(dot)mshk(at)gmail(dot)com" <sawada(dot)mshk(at)gmail(dot)com>, "michael(at)paquier(dot)xyz" <michael(at)paquier(dot)xyz>, "peter(dot)eisentraut(at)enterprisedb(dot)com" <peter(dot)eisentraut(at)enterprisedb(dot)com>, "dilipbalaut(at)gmail(dot)com" <dilipbalaut(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Greg Sabino Mullane <htamfids(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
| Subject: | Re: Exit walsender before confirming remote flush in logical replication |
| Date: | 2026-01-26 14:08:39 |
| Message-ID: | CAHGQGwGotoS0VeMDdK6ezkhvdQpWZ5oJvO3QKJKEV6Pc+rZ_9A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Thu, Jan 22, 2026 at 12:11 AM Andrey Silitskiy
<a(dot)silitskiy(at)postgrespro(dot)ru> wrote:
>
> Dear Vitaliy,
> Thanks for reproducing the error.
>
> The problem occurred during a test case with a full output buffer after
> adding ereport about the termination of walsender process in immediate mode.
> ereport sends message by default if replica is interested in this logging
> level(WARNING level fit this description). And since the output buffer was
> already full, the message could not be sent and the process could not exit.
>
> Changed the logging level to LOG_SERVER_ONLY to exclude sending a message
> to the replica during immediate shutdown.
You changed the log level used by ereport() in WalSndDoneImmediate() to
LOG_SERVER_ONLY, but proc_exit() can still emit messages at other levels.
That could result in attempts to send additional messages to the standby.
To avoid this, should whereToSendOutput be reset, as is done
in WalSndShutdown()?
> In my environment, this error was not reproduced due to the fact that
> buffers did not have time to fill up because pg_ctl stop was called
> immediately after data was inserted and the buffers did not have time to
> fill up and ereport did not hang.
>
> I added a wait for sent_lsn from pg_stat_replication to stop advancing
> in the test before server shutdown to give walsender time to fill output
> buffers.
Thanks for addressing the issue and updating the patch!
There seems also an alternative way to implement immediate shutdown of
walsenders: if wal_sender_shutdown_mode is set to immediate, the postmaster
could send SIGTERM to walsenders, just as it does to backends. In that case,
walsenders would exit without waiting for WAL replication, and the shutdown
sequence could proceed normally. With this approach, WalSndDoneImmediate()
would no longer be needed. I'm still unsure which approach is better....
One issue with the SIGTERM-based approach is that walsenders call
ereport(FATAL) on exit and may try to send the error message to the standby.
If the send buffer is full, the walsender could block in ereport(). This would
need to be addressed if we go that route.
BTW, this issue can also occur when terminating a walsender via
pg_terminate_backend() (for maintenance, etc.). If that is considered
problematic on its own, it might be better handled as a separate patch first.
Regarding the GUC name, wal_sender_shutdown seems simple and sufficient to me.
This isn't a blocker, so I'm fine with the current name for now and
revisiting it later if needed.
On the documentation, there's at least one reference to shutdown behavior
in high-availability.sgml. Since this patch changes walsender shutdown behavior,
that section should be updated as well.
+ In <literal>immediate</literal> mode, the walsender will exit
without waiting
+ for data replication to the receiver. This may break data
consistency between
+ sender and receiver after shutdown, which can be especially
important in
+ case of physical replication and switch-over.
It seems clearer to clarify the benefit of wait_flush in relation to
switchovers, rather than focusing only risks. For example:
----------------------
Specifies how a walsender process terminates after receiving a shutdown
request. Valid values are wait_flush and immediate. The default is wait_flush.
This setting can be configured per walsender.
In wait_flush mode, the walsender waits until all WAL data has been flushed on
the receiver before exiting. This helps keep the sender and receiver in sync
after shutdown, which is especially important for physical replication
switchovers. However, it can delay server shutdown.
In immediate mode, the walsender exits without waiting for WAL data to be
replicated to the receiver. This can reduce shutdown time when flushing
WAL data to the receiver would take a long time, for example on high-latency
networks or when the subscriber's apply worker is blocked waiting for locks
in logical replication.
----------------------
There's no strict rule for the ordering of parameter descriptions, but it seems
more intuitive to place wal_sender_shutdown_mode immediately after
wal_sender_timeout, rather than after track_commit_timestamp,
since the former two are more closely related. If we do that,
postgresql.conf.sample should be updated accordingly.
+#wal_sender_shutdown_mode = wait_flush # walsender termination mode after
+ # receival of shutdown request
In postgresql.conf.sample, enum GUCs typically list their valid values in
the comment. It would be good to do the same for wal_sender_shutdown_mode.
At the top of walsender.c, there are comments describing walsender behavior,
including shutdown handling. Since this patch changes shutdown behavior,
those comments should be updated.
+typedef enum
+{
+ WALSND_SHUTDOWN_MODE_WAIT_FLUSH = 0,
+ WALSND_SHUTDOWN_MODE_IMMEDIATE
+} WalSndShutdownMode;
WalSndShutdownMode should be added into src/tools/pgindent/typedefs.list
for pgindent.
+ * NB: This should only be called when immediate shutdown of walsender
+ * was requested and shutdown signal has been received from postmaster.
The comment on WalSndDoneImmediate() says it is only called after a shutdown
signal from the postmaster, but it can also be called when the checkpointer
signals walsenders to move to the stopping state (got_STOPPING = true),
which may happen earlier. So this comment seems to need to be updated.
Or, if this comment is true, WalSndDoneImmediate() should not be called under
got_STOPPING=true.
+ proc_exit(0);
+ abort();
+}
abort() call at the end of WalSndDoneImmediate() does not seem necessary.
Please see the discussion [1].
Regards,
[1] https://postgr.es/m/CAHGQGwHPX1yoixq+YB5rF4zL90TMmSEa3FpHURtqW3Jc5+=oSA@mail.gmail.com
Regards,
--
Fujii Masao
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Bertrand Drouvot | 2026-01-26 14:41:20 | Re: Safer hash table initialization macro |
| Previous Message | Alexander Korotkov | 2026-01-26 13:33:50 | Re: Use correct collation in pg_trgm |