Re: pg_walcleaner - new tool to detect, archive and delete the unneeded wal files (was Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary)

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>
Subject: Re: pg_walcleaner - new tool to detect, archive and delete the unneeded wal files (was Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary)
Date: 2022-04-25 18:39:57
Message-ID: 20220425183957.GE10577@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greetings,

* Bharath Rupireddy (bharath(dot)rupireddyforpostgres(at)gmail(dot)com) wrote:
> On Mon, Apr 18, 2022 at 8:48 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > * Bharath Rupireddy (bharath(dot)rupireddyforpostgres(at)gmail(dot)com) wrote:
> > > On Mon, Apr 18, 2022 at 7:41 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > > > * Bharath Rupireddy (bharath(dot)rupireddyforpostgres(at)gmail(dot)com) wrote:
> > > > > Thanks for the comments. Here's a new tool called pg_walcleaner which
> > > > > basically deletes (optionally archiving before deletion) the unneeded
> > > > > WAL files.
> > > > >
> > > > > Please provide your thoughts and review the patches.
> > > >
> > > > Alright, I spent some more time thinking about this and contemplating
> > > > what the next steps are... and I feel like the next step is basically
> > > > "add a HINT when the server can't start due to being out of disk space
> > > > that one should consider running pg_walcleaner" and at that point... why
> > > > aren't we just, uh, doing that? This is all still quite hand-wavy, but
> > > > it sure would be nice to be able to avoid downtime due to a broken
> > > > archiving setup. pgbackrest has a way of doing this and while we, of
> > > > course, discourage the use of that option, as it means throwing away
> > > > WAL, it's an option that users have. PG could have a similar option.
> > > > Basically, to archive_command/library what max_slot_wal_keep_size is for
> > > > slots.
> > >
> > > Thanks. I get your point. The way I see it is that the postgres should
> > > be self-aware of the about-to-get-full disk (probably when the data
> > > directory size is 90%(configurable, of course) of total disk size) and
> > > then freeze the new write operations (may be via new ALTER SYSTEM SET
> > > READ-ONLY or setting default_transaction_read_only GUC) and then go
> > > clean the unneeded WAL files by just invoking pg_walcleaner tool
> > > perhaps. I think, so far, this kind of work has been done outside of
> > > postgres. Even then, we might get into out-of-disk situations
> > > depending on how frequently we check the data directory size to
> > > compute the 90% configurable limit. Detecting the disk size is the KEY
> > > here. Hence we need an offline invokable tool like pg_walcleaner.
> >
> > Ugh, last I checked, figuring out if a given filesystem is near being
> > full is a pain to do in a cross-platform way. Why not just do exactly
> > what we already are doing for replication slots, but for
> > archive_command?
>
> Do you mean to say that if the archvie_command fails, say, for "some
> time" or "some number of attempts", just let the server not bother
> about it and checkpoint delete the WAL files instead of going out of
> disk? If this is the thought, then it's more dangerous as we might end
> up losing the WAL forever. For invalidating replication slots, it's
> okay because the required WAL can exist somewhere (either on the
> primary or on the archive location).

I was thinking more specifically along the lines of "if there's > X GB
of WAL that hasn't been archived, give up on archiving anything new"
(which is how the pgbackrest option works).

As archiving with this command is optional, it does present the same
risk too. Perhaps if we flipped it around to require the
archive_command be provided then it'd be a bit better, though we would
also need a way for users to ask for us to just delete the WAL without
archiving it. There again though ... the server already has a way of
both archiving and removing archived WAL and also has now grown the
archive_library option, something that this tool would be pretty hard to
replicate, I feel like, as it wouldn't be loading the library into a PG
backend anymore. As we don't have any real archive libraries yet, it's
hard to say if that's going to be an actual issue or not. Something to
consider though.

> > > > That isn't to say that we shouldn't also have a tool like this, but it
> > > > generally feels like we're taking a reactive approach here rather than a
> > > > proactive one to addressing the root issue.
> > >
> > > Agree. The offline tool like pg_walcleaner can help greatly even with
> > > some sort of above internal/external disk space monitoring tools.
> >
> > See, this seems like a really bad idea to me. I'd be very concerned
> > about people mis-using this tool in some way and automating its usage
> > strikes me as absolutely exactly that.. Are we sure that we can
> > guarantee that we don't remove things we shouldn't when this ends up
> > getting run against a running cluster from someone's automated tooling?
> > Or when someone sees that it refuses to run for $reason and tries to..
> > "fix" that? Seems quite risky to me.. I'd probably want to put similar
> > caveats around using this tool as I do around pg_resetwal when doing
> > training- that is, don't ever, ever, ever use this, heh.
>
> The initial version of the patch doesn't check if the server crashed
> or not before running it. I was thinking of looking at the
> postmaster.pid or pg_control file (however they don't guarantee
> whether the server is up or crashed because the server can crash
> without deleting postmaster.pid or updating pg_control file). Another
> idea is to let pg_walcleaner fire a sample query ('SELECT 1') to see
> if the server is up and running, if yes, exit, otherwise proceed with
> its work.

All of which isn't an issue if we don't have an external tool trying to
do this and instead have the server doing it as the server knows its
internal status, that the archive command has been failing long enough
to pass the configuration threshold, and that the WAL isn't needed for
crash recovery. The ability to avoid having to crash and go through
that process is pretty valuable. Still, a crash may still happen and
it'd be useful to have a clean way to deal with it. I'm not really a
fan of having to essentially configure this external command as well as
have the server configured. Have we settled that there's no way to make
the server archive while there's no space available and before trying to
write out more data?

> Also, to not cause losing of WAL permanently, we must recommend using
> archvie_command so that the WAL can be moved to an alternative
> location (could be the same archvie_location that primary uses).

I agree we should recommend using archive_command or archive_library, of
course, but if that's been done and is working properly then this tool
isn't really needed. The use-case we're trying to address, I believe,
is something like:

1) archive command starts failing for some reason
2) WAL piles up on the primary
3) primary runs out of disk space, crash happens
4) archive command gets 'fixed' in some fashion
5) WAL is archived and removed from primary
6) primary is restarted and able to go through crash recovery
7) server is online again

Now, I was trying to suggest an approach to addressing the issue at #2,
that is, avoid having WAL pile up without end on the primary and avoid
the crash in the first place. For users who care more about uptime and
less about WAL, that's likely what they want.

For users who care more about WAL than uptime, it'd be good to have a
way to help them too, but to do that, #4 has to happen and, once that's
done, #5 and following just need to be accomplished in whatever way is
simplest. The thought I'm having here is that the simplest answer, at
least from the user's perspective, is that the server is able to just be
brought up with the fixed archive command and everything just works-
archiving happens, space is free'd up, and the server comes up and
continues running.

I accept that it isn't this patch's job or mandate to go implement some
new option that I've thought up, and I could be convinced that this
separate tool is just how we're going to have to get #5 accomplished for
now due to the complexity of making the server do the archiving early on
and/or that it has other downsides (if the crash wasn't due to running
out of space, making the server wait to come online until after the WAL
has been archived wouldn't be ideal) that make it a poor choice overall,
but it seems like it's something that's at least worth some thought and
consideration of if there's a way to accomplish this with a bit less
manual user involvement, as that tends to be error-prone.

Thanks,

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2022-04-25 18:57:12 Re: pgsql: Allow db.schema.table patterns, but complain about random garbag
Previous Message Tom Lane 2022-04-25 18:31:48 Re: WIP: WAL prefetch (another approach)