Re: Don't keep closed WAL segment in page cache after replay

From: Japin Li <japinli(at)hotmail(dot)com>
To: Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
Cc: Hüseyin Demir <huseyin(dot)d3r(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Don't keep closed WAL segment in page cache after replay
Date: 2026-03-04 01:40:30
Message-ID: SY7PR01MB1092120F55B852C32BB22C1FAB67CA@SY7PR01MB10921.ausprd01.prod.outlook.com
Lists: pgsql-hackers

Hi, Anthonin

On Tue, 03 Mar 2026 at 15:13, Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com> wrote:
> On Mon, Mar 2, 2026 at 9:15 AM Hüseyin Demir <huseyin(dot)d3r(at)gmail(dot)com> wrote:
>> The idea looks good and efficient, though I have some feedback. The
>> first point is about logical replication slots.
>>
>> The patch checks whether there is an active walsender process. But
>> it is possible to create a replication slot and then wait some time
>> before a subscriber connects to it. During that window, with the
>> patch applied, PostgreSQL will evict the closed WAL segments from
>> the page cache, and once the subscriber connects it has to read
>> those WAL files back from disk. It's a trade-off, though, and can be
>> decided by others too.
>
> Good point. This is also the case for physical replication slots, if
> there's at least one used replication slot, then it's very likely a
> walsender will start at some point and would need to read the WAL. And
> having to read a large amount of WAL files from disk is likely not
> desirable, so it's probably better to add this as a condition.
>
>> Secondly, using a spinlock to check for running walsender processes
>> can make recovery less efficient. Assume a database with
>> max_wal_senders set to 20+ that produces 4-5TB of WAL per day: that
>> adds an extra 100-200 spinlock acquisitions per second on the
>> walreceiver side. Put simply, WalSndRunning() scans every walsender
>> slot under a spinlock on every segment switch, contending with all
>> active walsenders updating their own slots. On high-throughput
>> standbys this creates unnecessary cross-process spinlock contention
>> in the recovery hot path, the exact path that should be as lean as
>> possible for fast replay. Maybe you could implement a single
>> pg_atomic_uint32 counter in WalSndCtlData and achieve the same
>> result with essentially no contention.
>
> True. I think I've assumed that a segment closing is rare enough that
> it would be fine to go through all walsenders, but there are certainly
> clusters that can generate a large amount of segments.
>
> I've updated the patch with the suggested approach:
> - a new atomic counter tracking the number of active walsenders
> - a similar atomic counter for the number of used replication slots;
> otherwise, we would have the same issue of scanning all
> max_replication_slots entries to check whether any is used
> - both counters are now used as the condition for sending
> POSIX_FADV_DONTNEED or not
>
> Regards,
> Anthonin Bonnefoy

I noticed two different comment styles in the v4 patch:

1.
+/*
+ * Return the number of used replication slots
+ */
+int
+ReplicationSlotNumUsed(void)

2.
+/*
+ * Returns the number of active walsender processes
+ */
+int
+WalSndNumActive(void)

Both the "Return ..." and "Returns ..." styles exist in the PostgreSQL
codebase, but within a single patch, wouldn't it be better to use one
consistent style?

I'd like to use the imperative/singular form. What do you think?

>
> [2. text/x-diff; v4-0001-Don-t-keep-closed-WAL-segments-in-page-cache-afte.patch]...

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.
