Re: Automatic cleanup of oldest WAL segments with pg_receivexlog

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Automatic cleanup of oldest WAL segments with pg_receivexlog
Date: 2017-02-23 16:10:39
Message-ID: CABUevEwCE=R1rjLYcwJMUoz=EnBFVhT-1TXervFpe8K09s8GdA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 23, 2017 at 7:36 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
wrote:

> Hi all,
>
> When storing WAL segments on a dedicated partition with
> pg_receivexlog, for some deployments, the removal of past WAL segments
> depends on the frequency of base backups happening on the server. In
> short, once a new base backup is taken, it may not be necessary to
> keep around those past WAL segments. Of course a lot of things in this
> area depend on the retention policy of the deployments. Still, if a
> base backup kicks in rather late, there is a risk of having
> pg_receivexlog fail because of a full disk in the middle of a segment.
> Using a replication slot, this would cause Postgres to go down
> because of a bloated pg_wal. I am not talking about such cases here :)
>
> On some other types of deployments I work on, one or more standbys are
> waiting behind to take over the cluster in case of a failure of the
> primary. In such cases, cold backups are still taken and saved
> separately. Those are self-contained and can be restored independently
> of the rest to put the cluster back on track using a past state.
> Archiving using pg_receivexlog still happens to allow the standbys to
> catch up if they go offline for a long period of time, something that
> may be caused by an outage or simply by the cloning of a new VM that
> can take minutes or dozens of minutes to finish deployment. This new
> VM can also be an atomic copy of the primary ready to be used as a
> standby. As the primary server may have already recycled the oldest
> WAL segments in its pg_wal after two checkpoints, archiving plays an
> important role in making sure that things can replay successfully. In
> short, what matters is that the VM cloning does not take longer than
> two checkpoints, but there is no guarantee that the cloning will finish
> in time.
>
> In short, for such deployments, and contrary to the type described in
> the first paragraph, we don't actually care about the past WAL segments;
> we care more about the newest ones (well, to be correct, mainly about
> segments older than the last two checkpoints, though having the newest
> segments at hand also makes replay faster with restore_command and a
> local archive). For such cases, I have found it useful to be able to
> automatically remove the past WAL segments from the archive partition
> when it gets full, allowing archiving to move on to the latest data even
> if no new base backup has kicked in to make the past segments
> useless.
>
> The idea is really simple: in order to keep the newest WAL history as
> large as possible (for debuggability purposes it is useful to exploit
> as much of the archive partition as possible), we look at the amount of
> free space available on the archive partition when switching to the
> next segment, and simply remove as much data as needed to free space
> worth one complete segment. Note that with compressed WAL segments this
> may mean several segments removed at once, as there is no way to be sure
> how much compression will save. The central point of this reasoning is
> really to do the decision-making within pg_receivexlog itself, as it is
> the only place where we know that a new segment is created. This makes
> the cleanup logic independent of the load of Postgres itself.
>
> On non-Windows systems, statvfs() is enough to get the amount of
> free space available on a partition, and it is POSIX-compliant. For
> Windows, GetDiskFreeSpace() is available.
>
> Is there any interest in a feature like this? I have a non-polished
> patch at hand, but I can work on it for the upcoming CF if there are
> voices in favor of such a feature. The feature could simply be
> activated with a dedicated switch, like --clean-oldest-wal,
> --clean-tail-wal, or something like that.
>
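
A minimal C sketch of the free-space policy described in the quoted proposal follows. It is an illustration under stated assumptions, not the patch mentioned above: the function names are hypothetical, it assumes the default 16MB segment size, a single timeline (so that file-name order matches segment age), and completed segment files named with 24 hex digits, optionally followed by `.gz`. It uses `statvfs()` as suggested; a Windows build would need `GetDiskFreeSpace()` instead, which is not covered here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/statvfs.h>
#include <unistd.h>

/* Assumption: default 16MB WAL segment size. */
#define WAL_SEG_SIZE (16ULL * 1024 * 1024)

/* True for a completed segment: 24 hex digits, optionally ".gz".
 * The ".partial" file still being written never matches. */
static int
is_completed_segment(const char *name)
{
    size_t len = strlen(name);

    if (len != 24 && !(len == 27 && strcmp(name + 24, ".gz") == 0))
        return 0;
    return strspn(name, "0123456789ABCDEF") == 24;
}

/* Bytes available to unprivileged users on the filesystem holding dir. */
static int
free_bytes(const char *dir, unsigned long long *avail)
{
    struct statvfs st;

    if (statvfs(dir, &st) != 0)
        return -1;
    *avail = (unsigned long long) st.f_bavail * st.f_frsize;
    return 0;
}

static int
cmp_names(const void *a, const void *b)
{
    return strcmp(*(char *const *) a, *(char *const *) b);
}

/* Hypothetical helper: remove the oldest completed segments in dir until at
 * least `needed` bytes are free.  Returns the number of files removed, or -1
 * on error (nothing is removed if the directory listing is incomplete). */
static int
clean_oldest_segments(const char *dir, unsigned long long needed)
{
    char      **names = NULL;
    size_t      nnames = 0, cap = 0;
    int         removed = 0, ok = 1;
    unsigned long long avail;
    struct dirent *de;
    DIR        *d;

    assert(dir != NULL);
    if (free_bytes(dir, &avail) != 0)
        return -1;
    if (avail >= needed)
        return 0;               /* enough room already: nothing to do */
    if ((d = opendir(dir)) == NULL)
        return -1;
    while ((de = readdir(d)) != NULL)
    {
        if (!is_completed_segment(de->d_name))
            continue;
        if (nnames == cap)
        {
            size_t newcap = cap ? cap * 2 : 64;
            char **tmp = realloc(names, newcap * sizeof(char *));

            if (tmp == NULL) { ok = 0; break; }
            names = tmp;
            cap = newcap;
        }
        if ((names[nnames] = strdup(de->d_name)) == NULL) { ok = 0; break; }
        nnames++;
    }
    closedir(d);
    if (nnames > 0)
        qsort(names, nnames, sizeof(char *), cmp_names);

    /* Oldest first; re-check free space after every unlink, since with
     * compressed segments the space one file frees is not known upfront. */
    for (size_t i = 0; i < nnames; i++)
    {
        if (ok && avail < needed)
        {
            char path[4096];

            snprintf(path, sizeof(path), "%s/%s", dir, names[i]);
            if (unlink(path) == 0)
                removed++;
            ok = (free_bytes(dir, &avail) == 0);
        }
        free(names[i]);
    }
    free(names);
    return ok ? removed : -1;
}
```

In this sketch the caller would invoke something like `clean_oldest_segments(archive_dir, WAL_SEG_SIZE)` right before opening the next segment, which is the single decision point the proposal relies on.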

I'm not sure this logic belongs in pg_receivexlog. If we put the decision
making there, then we lock ourselves into one "type of policy".

Wouldn't this one, along with some other scenarios, be better provided by
the "run command at end of segment" function that we've talked about
before? And then that external command could implement whatever aging logic
would be appropriate for the environment?
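
To illustrate the alternative, here is a minimal C sketch of one aging policy such an external command could implement. pg_receivexlog has no such hook in this thread, so the interface is an assumption: the command is supposed to receive the archive directory and the name of the segment just completed. The policy shown is count-based (keep the newest N segments of the current timeline) rather than free-space-based. Function names are hypothetical, and 16MB segments (256 per log ID) are assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Assumption: 16MB segments, so 0x100000000 / 16MB = 256 per log ID. */
#define SEGS_PER_XLOGID 256u

/* Write to out the name of the segment `keep` segments before `current`
 * on the same timeline.  Returns -1 if current is malformed or if keep
 * reaches past segment 0. */
static int
segment_cutoff_name(const char *current, uint64_t keep, char out[25])
{
    unsigned int tli, log, seg;
    uint64_t    segno;

    if (strlen(current) < 24 || strspn(current, "0123456789ABCDEF") < 24)
        return -1;
    if (sscanf(current, "%8X%8X%8X", &tli, &log, &seg) != 3)
        return -1;
    segno = (uint64_t) log * SEGS_PER_XLOGID + seg;
    if (keep > segno)
        return -1;
    segno -= keep;
    snprintf(out, 25, "%08X%08X%08X", tli,
             (unsigned int) (segno / SEGS_PER_XLOGID),
             (unsigned int) (segno % SEGS_PER_XLOGID));
    return 0;
}

/* Remove segments on the cutoff's timeline whose name sorts before it.
 * Returns the number of files removed, or -1 if dir cannot be opened. */
static int
remove_segments_before(const char *dir, const char *cutoff)
{
    DIR        *d;
    struct dirent *de;
    int         removed = 0;
    char        path[4096];

    assert(strlen(cutoff) >= 24);
    if ((d = opendir(dir)) == NULL)
        return -1;
    while ((de = readdir(d)) != NULL)
    {
        const char *name = de->d_name;

        if (strlen(name) < 24 || strspn(name, "0123456789ABCDEF") < 24)
            continue;           /* not a segment file, e.g. a .history file */
        if (strncmp(name, cutoff, 8) != 0)
            continue;           /* other timeline: leave it alone */
        if (strncmp(name, cutoff, 24) >= 0)
            continue;           /* still inside the retention window */
        snprintf(path, sizeof(path), "%s/%s", dir, name);
        if (unlink(path) == 0)
            removed++;
    }
    closedir(d);
    return removed;
}
```

Because the policy lives outside pg_receivexlog, swapping it for a free-space or time-based rule would only mean changing this command, which is the flexibility argued for above.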

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
