Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Euler Taveira <euler(at)eulerto(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>
Subject: Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary
Date: 2021-12-29 13:57:10
Message-ID: 20211229135710.GH15820@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Euler Taveira (euler(at)eulerto(dot)com) wrote:
> On Thu, Dec 23, 2021, at 9:58 AM, Bharath Rupireddy wrote:
> > pg_archivecleanup currently takes a WAL file name as input to delete
> > the WAL files prior to it [1]. As suggested by Satya (cc-ed) in
> > pg_replslotdata thread [2], can we enhance the pg_archivecleanup to
> > automatically detect the last checkpoint (from control file) LSN,
> > calculate the lowest restart_lsn required by the replication slots, if
> > any (by reading the replication slot info from pg_logical directory),
> > archive the unneeded (an archive_command similar to that of the one
> > provided in the server config can be provided as an input) WAL files
> > before finally deleting them? Making pg_archivecleanup tool as an
> > end-to-end solution will help greatly in disk full situations because
> > of WAL files growth (inactive replication slots, archive command
> > failures, infrequent checkpoint etc.).

Having a tool for this isn't a bad idea overall, but ...

> pg_archivecleanup is a tool to remove WAL files from the *archive*. Are you
> suggesting to use it for removing files from pg_wal directory too? No, thanks.

We definitely shouldn't have it be part of pg_archivecleanup, for the
simple reason that it'll be really confusing and will almost certainly
be misused. For my 2c, we should just remove pg_archivecleanup
entirely.

> WAL files are a key component for backup and replication. Hence, you cannot
> deliberately allow a tool to remove WAL files from PGDATA. IMO this issue
> wouldn't occur if you have a monitoring system and alerts and someone to keep
> an eye on it. If the disk full situation was caused by a failed archive command
> or a disconnected standby, it is easy to figure out; the fix is simple.

This is perhaps going a bit far: PG does, in fact, remove WAL files from
PGDATA. Having a tool which will do this safely when the server can't
be brought online due to lack of disk space would certainly be
helpful rather frequently. I agree that monitoring and alerting are
things that everyone should implement and pay attention to, but that
often doesn't happen, and instead people end up just blowing away pg_wal
and corrupting their database. Had such a tool existed, they could have
avoided that and brought the system back online in relatively
short order without any data loss.
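
To make "safely" a bit more concrete, here is a rough Python sketch of the
selection step such a tool would need. It is not an existing tool or API.
The function names are made up for illustration, the 16MB segment size is
just the default and is assumed, and the caller is assumed to supply the
redo location of the last checkpoint, the restart_lsn of each slot, and the
set of segments already known to be archived. Timelines are ignored, and
nothing is deleted; the sketch only reports candidates.

```python
import re

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed: the default segment size
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE
WAL_NAME_RE = re.compile(r"^[0-9A-F]{24}$")  # plain segment files only


def parse_lsn(text):
    """Convert an LSN in 'X/Y' hex form to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def segment_number(wal_name):
    """Segment number encoded in a 24-character WAL file name.

    The first 8 hex digits (the timeline) are ignored in this sketch.
    """
    log_id = int(wal_name[8:16], 16)
    seg = int(wal_name[16:24], 16)
    return log_id * SEGMENTS_PER_XLOGID + seg


def removable_segments(wal_names, redo_lsn, slot_restart_lsns, archived):
    """Return segment files that lie wholly before the oldest LSN still needed.

    A file is only a candidate if it is a plain segment file, is older than
    the segment holding the oldest required LSN, and is already archived.
    """
    needed = [parse_lsn(redo_lsn)] + [parse_lsn(s) for s in slot_restart_lsns]
    keep_from = min(needed) // WAL_SEGMENT_SIZE
    return sorted(
        name
        for name in wal_names
        if WAL_NAME_RE.match(name)
        and segment_number(name) < keep_from
        and name in archived
    )
```

For example, with a redo location of 0/3000028 and one slot whose
restart_lsn is 0/2000000, only segments before segment 2 are candidates,
and then only if they have already been archived. Anything that doesn't
look like a plain segment file (.partial, .history) is left alone.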

Thanks,

Stephen
