Re: Automatic cleanup of oldest WAL segments with pg_receivexlog

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Automatic cleanup of oldest WAL segments with pg_receivexlog
Date: 2017-02-24 00:29:54
Message-ID: CAB7nPqTs4V8W-SL9+wcsVdovUba0n2HVrNi+sKeHbxz_nao8DQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 23, 2017 at 10:54 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> Michael,
>
> * Michael Paquier (michael(dot)paquier(at)gmail(dot)com) wrote:
>> Is there any interest in a feature like that? I have a non-polished
>> patch at hand but I can work on that for the upcoming CF if there are
>> voices in favor of such a feature. The feature could be simply
>> activated with a dedicated switch, like --clean-oldest-wal,
>> --clean-tail-wal, or something like that.
>
> This sounds interesting, though I wouldn't base it on the amount of free
> space on the partition but rather some user-set value (eg:
> --max-archive-size=50GB or something).

Doable. This would not be hard to use with pg_receivexlog started as
a service either, as long as the parameter can be passed through an
environment variable.
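A minimal sketch of the size-based cleanup discussed above, written as an external Python script rather than as pg_receivexlog code. The --max-archive-size switch is only a proposal in this thread. The environment variable WAL_ARCHIVE_MAX_SIZE, the helper names, and the size syntax (loosely modeled on PostgreSQL's kB/MB/GB/TB units) are hypothetical. The sketch assumes completed segments carry the standard 24-hex-digit names, and that ".partial" files are in-progress segments that must never be removed.

```python
import os
import re
from pathlib import Path

# Completed WAL segment names: 8 hex digits of timeline, then 16 of position.
# In-progress ".partial" files and ".history" files deliberately do not match.
SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")

_UNITS = {"": 1, "kB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(text: str) -> int:
    """Parse a size such as '50GB' or '512MB' into bytes (hypothetical format)."""
    match = re.fullmatch(r"\s*(\d+)\s*(kB|MB|GB|TB)?\s*", text)
    if match is None:
        raise ValueError(f"invalid size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit or ""]


def limit_from_env(var: str = "WAL_ARCHIVE_MAX_SIZE", default: str = "50GB") -> int:
    """Read the limit from a (hypothetical) environment variable, e.g. for a service."""
    return parse_size(os.environ.get(var, default))


def segment_sort_key(name: str) -> tuple:
    # Assumption: "oldest" means earliest in the WAL stream, so order by the
    # position part (log + segment) first and by timeline second.
    return (name[8:], name[:8])


def clean_oldest_wal(archive_dir: Path, max_bytes: int) -> list:
    """Remove the oldest completed segments until the directory fits in max_bytes.

    Returns the names of the removed files, oldest first.
    """
    segments = sorted(
        (p for p in archive_dir.iterdir() if p.is_file() and SEGMENT_RE.match(p.name)),
        key=lambda p: segment_sort_key(p.name),
    )
    # The total counts every regular file (including .partial and .history),
    # but only completed segments are ever candidates for removal.
    total = sum(p.stat().st_size for p in archive_dir.iterdir() if p.is_file())
    removed = []
    for path in segments:
        if total <= max_bytes:
            break
        size = path.stat().st_size
        path.unlink()
        total -= size
        removed.append(path.name)
    return removed
```

For example, clean_oldest_wal(Path("/path/to/archive"), limit_from_env()) would trim a directory fed by pg_receivexlog down to the configured size. The sketch only looks at total size: it knows nothing about which segments a base backup or a replica still needs, which is exactly the risk of silently removing WAL.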

> I am a bit dubious about it in general though. WAL that you don't have
> a base backup for or a replica which needs it is really of very limited
> value. I understand your suggestion that it could be used for
> 'debugging', but that really seems like a stretch to me.

Being able to look at what happens in the database at page level
helps. I used it once to inspect page-level data and confirm that a
page had actually been corrupted by an incorrect failover flow (the
page got flushed, and the node was then incorrectly reused as a
standby).

> I would also
> be concerned that people would set up their systems using this without
> fully understanding it, or being prepared to handle what happens when it
> kicks in and starts removing WAL that maybe they should have kept for a
> base backup or a replica. At least if we start failing when the
> partition is full then they have alerts telling them that the partition
> is full and they have a base backup and WAL to bring it forward to
> almost current.

Sure. Documentation is really key here. No approach is perfect; each
one has its own value. I am of course not suggesting enabling any of
this by default. In my case, avoiding a failure because of a full
partition mattered more.
--
Michael
