Re: Sending unflushed WAL in physical replication

From:	Rahila Syed <rahilasyed90(at)gmail(dot)com>
To:	SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Subject:	Re: Sending unflushed WAL in physical replication
Date:	2025-09-30 04:41:18
Message-ID:	CAH2L28tCKaT1177RHB+GVOSimxr+v-U=E7R4CDW-t---X-OkgA@mail.gmail.com
Lists:	pgsql-hackers

Hi,

> At the high level idea LGTM.
>
>

Thank you for looking into it.

>> Observations from the benchmark:
>> 1. The patch improves TPS by ~13% in the sync replication setup. In
>> repeated runs, I see that the TPS increase is anywhere between 5%
>> and 13%.
>> 2. WAL sender reads significantly less WAL from disk, indicating more
>> efficient use of WAL buffers and reduced disk I/O.
>>
>
> Can you please measure the transaction commit latency improvement as well?
> Commit latency = Primary_Disk_Flush_time + Standby_disk_flush_time +
> network_roundtrip_time
>
>

The pgbench average latency should capture this, since it measures the
time from the start to the end of a transaction. In synchronous
replication, each transaction waits for write confirmation from the
standby before committing, and that additional wait time is included in
the latency measurement. I will post the latency numbers with the next
benchmark results.
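
To compare the latency between a baseline run and a patched run, something
like the following could be used. This is only a rough sketch: it assumes
the pgbench summary contains a "latency average = N ms" line, and the
function names are made up for illustration, not part of any existing
tooling.

```python
import re

# Matches the summary line printed by pgbench, e.g. "latency average = 1.234 ms".
# A ':' separator is also accepted in case the output format differs.
_LATENCY_RE = re.compile(r"latency average\s*[=:]\s*([0-9.]+)\s*ms")


def average_latency_ms(pgbench_output: str) -> float:
    """Return the average latency (in ms) reported in a pgbench summary."""
    match = _LATENCY_RE.search(pgbench_output)
    if match is None:
        raise ValueError("no 'latency average' line found in pgbench output")
    return float(match.group(1))


def latency_change_percent(baseline_ms: float, patched_ms: float) -> float:
    """Relative change in latency; a negative value means the patch is faster."""
    return (patched_ms - baseline_ms) / baseline_ms * 100.0
```

Feeding the captured output of both runs through average_latency_ms() and
then latency_change_percent() gives a single number to report next to the
TPS change.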

> What happens in crash recovery scenarios? For example, when a standby
> crashes and restarts, it replays until the end of WAL. In this case, it
> may end up replaying WAL that was never flushed on the primary (if the
> primary does a crash recovery).
> Shouldn't the archiver on the standby avoid uploading WAL before it is
> flushed on the primary?
> The same applies to pg_receivewal.
>

The current solution isn’t sufficient for situations where we rely
solely on the WAL files to identify what needs to be replayed. In these
cases, we need to either write the unflushed WAL data to a buffer and
then to temporary files until the primary flush occurs, or store the
flush pointer so that the recovery process knows up to which point it
should replay the WAL.
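
To make the first of these options concrete, here is a toy model, not code
from the patch. It assumes LSNs are plain byte offsets starting at 0, that
the primary reports its flush position separately from the WAL data, and
that a bytearray can stand in for both the WAL files and the staging
buffer or temporary files. All class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StagingReceiver:
    """Toy standby that stages WAL not yet known to be flushed on the primary."""

    # Stands in for the regular WAL files that recovery and archiving read.
    durable: bytearray = field(default_factory=bytearray)
    # Stands in for the buffer / temporary files holding unflushed WAL.
    staged: bytearray = field(default_factory=bytearray)

    @property
    def durable_lsn(self) -> int:
        """End of WAL that is known to be flushed on the primary."""
        return len(self.durable)

    @property
    def received_lsn(self) -> int:
        """End of all WAL received so far, flushed on the primary or not."""
        return len(self.durable) + len(self.staged)

    def receive(self, start_lsn: int, data: bytes) -> None:
        """Accept WAL from the primary; it is staged, never written directly."""
        if start_lsn != self.received_lsn:
            raise ValueError("non-contiguous WAL")
        self.staged += data

    def primary_flushed(self, flush_lsn: int) -> None:
        """Move staged WAL up to the primary's flush position into the WAL files."""
        count = min(flush_lsn, self.received_lsn) - self.durable_lsn
        if count > 0:
            self.durable += self.staged[:count]
            del self.staged[:count]

    def crash_restart(self) -> int:
        """Drop staged WAL and return the LSN up to which replay may proceed."""
        self.staged.clear()
        return self.durable_lsn
```

In this model the WAL files never contain anything beyond the primary's
flush position, so a restart, the archiver, or a tool reading those files
cannot see WAL that the primary might lose in its own crash recovery. The
second option would instead keep everything in the WAL files and persist
only the equivalent of durable_lsn as the replay limit.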

As mentioned in the TODO section of my previous email, I am currently
working on a more robust method to manage unflushed WAL on the receiver.
The goal is to ensure this does not disrupt recovery or affect tools
that expect the WAL files on standby to only contain WAL records that
have already been flushed on the primary.

Thank you,
Rahila Syed

In response to

Re: Sending unflushed WAL in physical replication at 2025-09-27 08:23:47 from SATYANARAYANA NARLAPURAM
