pg_walcleaner - new tool to detect, archive and delete the unneeded wal files (was Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary)

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>
Subject: pg_walcleaner - new tool to detect, archive and delete the unneeded wal files (was Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary)
Date: 2022-04-18 04:29:12
Message-ID: CALj2ACV9KTXDO=Vjf4K6v-b58x3XcyWDDzN5su2EK0gVWEPAug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 29, 2021 at 8:06 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Wed, Dec 29, 2021 at 7:27 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > > On Thu, Dec 23, 2021, at 9:58 AM, Bharath Rupireddy wrote:
> > > > pg_archivecleanup currently takes a WAL file name as input to delete
> > > > the WAL files prior to it [1]. As suggested by Satya (cc-ed) in
> > > > pg_replslotdata thread [2], can we enhance the pg_archivecleanup to
> > > > automatically detect the last checkpoint (from control file) LSN,
> > > > calculate the lowest restart_lsn required by the replication slots, if
> > > > any (by reading the replication slot info from pg_logical directory),
> > > > archive the unneeded (an archive_command similar to that of the one
> > > > provided in the server config can be provided as an input) WAL files
> > > > before finally deleting them? Making pg_archivecleanup tool as an
> > > > end-to-end solution will help greatly in disk full situations because
> > > > of WAL files growth (inactive replication slots, archive command
> > > > failures, infrequent checkpoint etc.).
> >
> > The overall idea of having a tool for this isn't a bad idea, but ..
> >
> > > pg_archivecleanup is a tool to remove WAL files from the *archive*. Are you
> > > suggesting to use it for removing files from pg_wal directory too? No, thanks.
> >
> > We definitely shouldn't have it be part of pg_archivecleanup for the
> > simple reason that it'll be really confusing and almost certainly will
> > be mis-used.
>
> +1
>
> > > WAL files are a key component for backup and replication. Hence, you cannot
> > > deliberately allow a tool to remove WAL files from PGDATA. IMO this issue
> > > wouldn't occur if you have a monitoring system and alerts and someone to keep
> > > an eye on it. If the disk full situation was caused by a failed archive command
> > > or a disconnected standby, it is easy to figure out; the fix is simple.
> >
> > This is perhaps a bit far- PG does, in fact, remove WAL files from
> > PGDATA. Having a tool which will do this safely when the server isn't
> > able to be brought online due to lack of disk space would certainly be
> > helpful rather frequently. I agree that monitoring and alerting are
> > things that everyone should implement and pay attention to, but that
> > doesn't happen and instead people end up just blowing away pg_wal and
> > corrupting their database when, had a tool existed, they could have
> > avoided that happening and brought the system back online in relatively
> > short order without any data loss.
>
> Thanks. Yes, the end-to-end tool is helpful in rather eventual
> situations and having it in the core is more helpful instead of every
> postgres vendor developing their own solution and many times it's hard
> to get it right. Also, I agree to not club this idea with
> pg_archviecleanup. How about having a new tool like
> pg_walcleanup/pg_xlogcleanup helping the developers/admins/users in
> eventual situations?

Thanks for the comments. Here's a new tool called pg_walcleaner which
basically deletes (optionally archiving before deletion) the unneeded
WAL files.

Please provide your thoughts and review the patches.

Regards,
Bharath Rupireddy.

Attachment Content-Type Size
v1-0001-pg_walcleaner.patch application/octet-stream 27.9 KB
v1-0002-pg_walcleaner-docs.patch application/octet-stream 6.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2022-04-18 04:32:37 Re: Skipping logical replication transactions on subscriber side
Previous Message Noah Misch 2022-04-18 04:23:59 Re: Timeout control within tests