Re: pg_walcleaner - new tool to detect, archive and delete the unneeded wal files (was Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary)

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>
Subject: Re: pg_walcleaner - new tool to detect, archive and delete the unneeded wal files (was Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary)
Date: 2022-04-18 15:05:49
Message-ID: CALj2ACXW2rYU_FmE_ci3UqnLbpzYTnaVW2Z_5NaOZnKzU1jMVg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Apr 18, 2022 at 7:41 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>
> Greetings,
>
> * Bharath Rupireddy (bharath(dot)rupireddyforpostgres(at)gmail(dot)com) wrote:
> > Thanks for the comments. Here's a new tool called pg_walcleaner which
> > basically deletes (optionally archiving before deletion) the unneeded
> > WAL files.
> >
> > Please provide your thoughts and review the patches.
>
> Alright, I spent some more time thinking about this and contemplating
> what the next steps are... and I feel like the next step is basically
> "add a HINT when the server can't start due to being out of disk space
> that one should consider running pg_walcleaner" and at that point... why
> aren't we just, uh, doing that? This is all still quite hand-wavy, but
> it sure would be nice to be able to avoid downtime due to a broken
> archiving setup. pgbackrest has a way of doing this and while we, of
> course, discourage the use of that option, as it means throwing away
> WAL, it's an option that users have. PG could have a similar option.
> Basically, to archive_command/library what max_slot_wal_keep_size is for
> slots.

Thanks. I get your point. The way I see it is that the postgres should
be self-aware of the about-to-get-full disk (probably when the data
directory size is 90%(configurable, of course) of total disk size) and
then freeze the new write operations (may be via new ALTER SYSTEM SET
READ-ONLY or setting default_transaction_read_only GUC) and then go
clean the unneeded WAL files by just invoking pg_walcleaner tool
perhaps. I think, so far, this kind of work has been done outside of
postgres. Even then, we might get into out-of-disk situations
depending on how frequently we check the data directory size to
compute the 90% configurable limit. Detecting the disk size is the KEY
here. Hence we need an offline invokable tool like pg_walcleaner.

Actually, I was planning to write an extension with a background
worker doing this for us.

> That isn't to say that we shouldn't also have a tool like this, but it
> generally feels like we're taking a reactive approach here rather than a
> proactive one to addressing the root issue.

Agree. The offline tool like pg_walcleaner can help greatly even with
some sort of above internal/external disk space monitoring tools.

Regards,
Bharath Rupireddy.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-04-18 15:14:48 Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)
Previous Message Tom Lane 2022-04-18 14:53:55 Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)