
Re: [PATCH] Accept connections post recovery without waiting for RemoveOldXlogFiles

From:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To:	Nitin Motiani <nitinmotiani(at)google(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [PATCH] Accept connections post recovery without waiting for RemoveOldXlogFiles
Date:	2025-09-09 06:08:55
Message-ID:	CAFiTN-vpKEw-UYZZLVQhhzcGz6LusfHZY_17G4LJGG0iX9P_Dg@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Sep 8, 2025 at 3:03 PM Nitin Motiani <nitinmotiani(at)google(dot)com> wrote:
>
> Hi Hackers,
>
> I'd like to propose a patch that allows accepting connections after recovery without waiting for the removal of old xlog files.
>
> Why: We have seen instances where crash recovery takes a very long time (tens of minutes to hours) when a large number of accumulated WAL files need to be cleaned up (e.g., cleaning up 2M old WAL files took close to 4 hours).
>
> This WAL accumulation is usually caused by:
>
> 1. Inactive replication slot
> 2. PITR failing to keep up
>
> In the above cases, when the resolution (deleting the inactive slot or disabling PITR) is followed by a crash before a checkpoint can run, recovery takes a very long time. Note that in these cases the actual WAL replay finishes relatively quickly; most of the delay is due to RemoveOldXlogFiles().

It makes sense to improve this.

> How: This patch solves the issue by running RemoveOldXlogFiles() separately and asynchronously. This is achieved in two steps:
>
> 1. Skip RemoveOldXlogFiles() for an END_OF_RECOVERY checkpoint. This ensures that recovery finishes sooner and postgres can start accepting connections.
> 2. After recovery, trigger another checkpoint without CHECKPOINT_WAIT. This is done in StartupXLOG(). It adds some extra work, but that should be minimal because the checkpoint runs right after recovery, and most of its work will be in RemoveOldXlogFiles(), which can now run asynchronously.
>
> I considered a couple of alternative solutions before attempting this.
>
> 1. One option could be to simply skip the removal of old xlog files during recovery and let a later checkpoint take care of it. But with a large checkpoint_timeout, the accumulated WAL could stay on disk for much longer.
>
> 2. Another approach might be to move RemoveOldXlogFiles() into a separate request, for example by creating a special checkpoint flag such as CHECKPOINT_ONLY_DELETE_OLD_FILES and passing it to RequestCheckpoint(). That way the second checkpoint would only handle file deletion. I chose my approach instead because it requires a smaller change, which might make it safer and less error-prone.

One of the advantages of this approach over forcing an extra
checkpoint is that you don't need to loop through the entire buffer
pool only to find that almost nothing is dirty, but yes, this may
require some extra flags and extra checks in the checkpointer code.

--
Regards,
Dilip Kumar
Google

In response to

[PATCH] Accept connections post recovery without waiting for RemoveOldXlogFiles at 2025-09-08 09:33:00 from Nitin Motiani
