
Re: [PATCH] Accept connections post recovery without waiting for RemoveOldXlogFiles

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Nitin Motiani <nitinmotiani(at)google(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [PATCH] Accept connections post recovery without waiting for RemoveOldXlogFiles
Date:	2025-09-09 06:58:25
Message-ID:	CAA4eK1LANwLdEhavTfTtmOD8LJ8uUoMY7FtPX_3YF7ge=Z7TcA@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Sep 8, 2025 at 3:03 PM Nitin Motiani <nitinmotiani(at)google(dot)com> wrote:
>
> I'd like to propose a patch to allow accepting connections post recovery without waiting for the removal of old xlog files.
>
> Why: We have seen instances where crash recovery takes very long (tens of minutes to hours) when a large number of accumulated WAL files need to be cleaned up (e.g., cleaning up 2M old WAL files took close to 4 hours).
>
> This WAL accumulation is usually caused by:
>
> 1. Inactive replication slot
> 2. PITR failing to keep up
>
> In the above cases, when the resolution (deleting the inactive slot/disabling PITR) is followed by a crash (before a checkpoint could run), we see the recovery take a very long time. Note that in these cases the actual WAL replay is done relatively quickly and most of the delay is due to RemoveOldXlogFiles().
>

Isn't it better to fix the reasons for WAL accumulation? Because even
without recovery, this can fill up the disk. For example, one can use
idle_replication_slot_timeout for inactive slots. Similarly, we can
see what leads to slow PITR and try to avoid that.
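
For the inactive-slot case, a minimal configuration sketch might look like the
following. It assumes PostgreSQL 18 or later, where idle_replication_slot_timeout
is available; the '1d' value is purely illustrative and not a recommendation.

```
# postgresql.conf (sketch; assumes a release that has this parameter)
# Invalidate replication slots that have stayed inactive longer than this
# duration. The default of 0 disables the mechanism.
# '1d' is an illustrative value chosen for this example only.
idle_replication_slot_timeout = '1d'
```

The invalidation itself is carried out during checkpoints, so the WAL an idle
slot holds back becomes removable only after a checkpoint following the
timeout, not at the moment the timeout expires.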

--
With Regards,
Amit Kapila.

In response to

[PATCH] Accept connections post recovery without waiting for RemoveOldXlogFiles at 2025-09-08 09:33:00 from Nitin Motiani

Responses

Re: [PATCH] Accept connections post recovery without waiting for RemoveOldXlogFiles at 2025-09-09 07:12:00 from Dilip Kumar
