Re: Force the old transactions logs cleanup even if checkpoint is skipped

From: Andres Freund <andres(at)anarazel(dot)de>
To: "Zakhlystov, Daniil (Nebius)" <usernamedt(at)nebius(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, "amborodin(at)acm(dot)org" <amborodin(at)acm(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Mokrushin, Mikhail (Nebius)" <rodrijjke(at)nebius(dot)com>
Subject: Re: Force the old transactions logs cleanup even if checkpoint is skipped
Date: 2023-11-10 23:58:43
Message-ID: 20231110235843.avvkgiownmzr4gf7@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2023-11-09 11:50:10 +0000, Zakhlystov, Daniil (Nebius) wrote:
> > On 9 Nov 2023, at 01:30, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> >
> > I am not really convinced that this is worth complicating the skipped
> > path for this goal. In my experience, I've seen complaints where WAL
> > archiving bloat was coming from the archive command not able to keep
> > up with the amount generated by the backend, particularly because the
> > command invocation was taking longer than it takes to generate a new
> > segment. Even if there is a hole of activity in the server, if too
> > much WAL has been generated it may not be enough to catch up depending
> > on the number of segments that need to be processed. Others are free
> > to chime in with extra opinions, of course.
>
> I agree that there might be multiple reasons for pg_wal bloat. Please note that
> I am not addressing the WAL archiving issue at all. My proposal is to add a
> small improvement to the WAL cleanup routine so that WAL segments that have
> already been archived successfully are removed to free disk space.
>
> Yes, it might not be a common case, but it is a fairly realistic one. It occurred multiple times
> in our production when we had temporary issues with archiving. This small
> complication of the skipped path will help Postgres to return to a normal operational
> state without any human operator / external control routine intervention.

I agree that the scenario is worth addressing - it's quite a nasty situation.

But I'm not sure this is the way to address it. If a checkpoint does have to
happen, we might not get to the point of removing the old segments, because we
might fail to emit the WAL record due to running out of space. And if that
doesn't happen - do we really want to wait till a checkpoint finishes to free
up space?

What if we instead made the archiver delete WAL files after archiving, if
they're old enough? Some care would be needed to avoid the checkpointer and the
archiver trampling on each other, but that doesn't seem too hard.
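
To make the idea a bit more concrete, here is a rough toy model in Python,
not PostgreSQL code. Only the WAL file name layout (8 hex digits each for
timeline, log and segment) and the default 16MB segment size are taken from
PostgreSQL. The WalDir class, archiver_done(), oldest_needed_segno and the
single lock are hypothetical names invented for illustration; how the cutoff
would really be computed (last checkpoint's redo location, wal_keep_size,
replication slots) is deliberately left out.

```python
import threading

WAL_SEGMENT_SIZE = 16 * 1024 * 1024                      # default segment size
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE    # 256 with 16MB segments


def segno_from_filename(name):
    """Split a 24-hex-digit WAL file name into (timeline, segment number)."""
    if len(name) != 24:
        raise ValueError("not a WAL segment file name: %r" % name)
    tli = int(name[0:8], 16)
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return tli, log * SEGMENTS_PER_XLOGID + seg


class WalDir:
    """Toy in-memory stand-in for pg_wal (hypothetical, for illustration)."""

    def __init__(self, segments, oldest_needed_segno):
        self.segments = set(segments)
        # Assumed to be maintained elsewhere, e.g. by the checkpointer.
        self.oldest_needed_segno = oldest_needed_segno
        # Hypothetical coordination between archiver and checkpointer.
        self.lock = threading.Lock()

    def archiver_done(self, name):
        """Called after 'name' was archived; remove it if it is old enough.

        Returns True if the segment was removed by this call.
        """
        _, segno = segno_from_filename(name)
        with self.lock:
            # The checkpointer may already have removed or recycled the file.
            if name in self.segments and segno < self.oldest_needed_segno:
                self.segments.discard(name)
                return True
        return False
```

In this model a checkpointer-side cleanup would take the same lock before
removing or recycling segments, so each file is handled by exactly one of the
two processes. Whether a lock is the right mechanism in the real code is an
open question.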

Greetings,

Andres Freund
