Re: Unarchived WALs deleted after crash

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Unarchived WALs deleted after crash
Date: 2013-02-15 17:07:43
Message-ID: 511E6B5F.6050709@vmware.com
Lists: pgsql-hackers

On 15.02.2013 18:10, Fujii Masao wrote:
> On Fri, Feb 15, 2013 at 11:31 PM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
>>> - /*
>>> - * Normally we don't delete old XLOG files during recovery to
>>> - * avoid accidentally deleting a file that looks stale due to a
>>> - * bug or hardware issue, but in fact contains important data.
>>> - * During streaming recovery, however, we will eventually fill the
>>> - * disk if we never clean up, so we have to. That's not an issue
>>> - * with file-based archive recovery because in that case we
>>> - * restore one XLOG file at a time, on-demand, and with a
>>> - * different filename that can't be confused with regular XLOG
>>> - * files.
>>> - */
>>> - if (WalRcvInProgress() || XLogArchiveCheckDone(xlde->d_name))
>>> + if (RecoveryInProgress() || XLogArchiveCheckDone(xlde->d_name))
>>> [ delete the file ]
>>
>> With that commit, we started to keep WAL segments restored from the archive
>> in pg_xlog, so we needed to start deleting old segments during archive
>> recovery, even when streaming replication was not active. But the above
>> change was too broad; we started to delete old segments also during crash
>> recovery.
>>
>> The above should check InArchiveRecovery, i.e. only delete old files when in
>> archive recovery, not when in crash recovery. But there's one little
>> complication: InArchiveRecovery is currently only valid in the startup
>> process, so we'll need to also share it in shared memory, so that the
>> checkpointer process can access it.
>>
>> I propose the attached patch to fix it.
>
> At least in 9.2, when the archived file is restored into pg_xlog, its xxx.done
> archive status file is created. So we don't need to check InArchiveRecovery
> when deleting old WAL files. Checking whether xxx.done exists is enough.

Hmm, what about streamed WAL files? I guess we could go back to the
pre-9.2 coding, and check WalRcvInProgress(). But I didn't actually like
that too much; it seems rather random that old streamed files are
recycled when wal receiver is running at the time of restartpoint, and
otherwise not. Whether wal receiver is running at the time the
restartpoint happens has little to do with which files were created by
streaming replication. With the right pattern of streaming files from
the master, but always being temporarily disconnected when the
restartpoint runs, you could still accumulate WAL files indefinitely.
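
To make that last scenario concrete, here is a toy model of the pre-9.2
condition. It is only a sketch, not PostgreSQL source: removable_pre92()
and simulate_retained() are hypothetical names, the two booleans stand in
for WalRcvInProgress() and XLogArchiveCheckDone(), and it assumes that
every old segment is eligible for removal at each restartpoint and that
streamed segments never get a .done file.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the pre-9.2 test
 *     WalRcvInProgress() || XLogArchiveCheckDone(name)
 * with both results passed in as plain booleans.
 */
static bool
removable_pre92(bool walrcv_running, bool archive_done)
{
    return walrcv_running || archive_done;
}

/*
 * Toy simulation: before each restartpoint, streamed_per_cycle new
 * segments arrive via streaming replication. They are assumed to have
 * no .done file, so archive_done is always false here. Returns the
 * number of old segments still on disk after the last restartpoint.
 */
int
simulate_retained(int restartpoints, int streamed_per_cycle,
                  bool walrcv_running_at_restartpoint)
{
    int retained = 0;

    assert(restartpoints >= 0 && streamed_per_cycle >= 0);

    for (int i = 0; i < restartpoints; i++)
    {
        /* segments streamed since the previous restartpoint */
        retained += streamed_per_cycle;

        /* the decision depends only on the receiver's state right now */
        if (removable_pre92(walrcv_running_at_restartpoint, false))
            retained = 0;
    }
    return retained;
}
```

In this model simulate_retained(5, 3, false) returns 15 and keeps growing
with the number of restartpoints, while simulate_retained(5, 3, true)
returns 0, even though the same files were streamed in both cases.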

> Unfortunately in HEAD, xxx.done file is not created when restoring archived
> file because of absence of the patch. We need to implement that first.

Ah yeah, that thing again..
(http://www.postgresql.org/message-id/50DF5BA7.6070200@vmware.com) I'm
going to forward-port that patch now, before it's forgotten again. It's
not clear to me what the holdup was on this, but whatever the bigger
patch we've been waiting for is, it can just as well be done on top of
the forward-port.

- Heikki
