Re: Unarchived WALs deleted after crash

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Unarchived WALs deleted after crash
Date: 2013-02-15 17:16:45
Message-ID: CAHGQGwGvPosGdUtcffbKEi0jqe0B50ZPdQiRxqL8kQ4dArNwKA@mail.gmail.com
Lists: pgsql-hackers

On Sat, Feb 16, 2013 at 2:07 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> On 15.02.2013 18:10, Fujii Masao wrote:
>>
>> On Fri, Feb 15, 2013 at 11:31 PM, Heikki Linnakangas
>> <hlinnakangas(at)vmware(dot)com> wrote:
>>>>
>>>> - /*
>>>> - * Normally we don't delete old XLOG files during recovery to
>>>> - * avoid accidentally deleting a file that looks stale due to a
>>>> - * bug or hardware issue, but in fact contains important data.
>>>> - * During streaming recovery, however, we will eventually fill the
>>>> - * disk if we never clean up, so we have to. That's not an issue
>>>> - * with file-based archive recovery because in that case we
>>>> - * restore one XLOG file at a time, on-demand, and with a
>>>> - * different filename that can't be confused with regular XLOG
>>>> - * files.
>>>> - */
>>>> - if (WalRcvInProgress() || XLogArchiveCheckDone(xlde->d_name))
>>>> + if (RecoveryInProgress() || XLogArchiveCheckDone(xlde->d_name))
>>>> [ delete the file ]
>>>
>>>
>>> With that commit, we started to keep WAL segments restored from the
>>> archive in pg_xlog, so we needed to start deleting old segments during
>>> archive recovery, even when streaming replication was not active. But the
>>> above change was too broad; we started to delete old segments also during
>>> crash recovery.
>>>
>>> The above should check InArchiveRecovery, ie. only delete old files when
>>> in archive recovery, not when in crash recovery. But there's one little
>>> complication: InArchiveRecovery is currently only valid in the startup
>>> process, so we'll need to also share it in shared memory, so that the
>>> checkpointer process can access it.
>>>
>>> I propose the attached patch to fix it.
>>
>>
>> At least in 9.2, when the archived file is restored into pg_xlog, its
>> xxx.done archive status file is created. So we don't need to check
>> InArchiveRecovery when deleting old WAL files. Checking whether xxx.done
>> exists is enough.
>
>
> Hmm, what about streamed WAL files? I guess we could go back to the pre-9.2
> coding, and check WalRcvInProgress(). But I didn't actually like that too
> much, it seems rather random that old streamed files are recycled when wal
> receiver is running at the time of restartpoint, and otherwise not. Because
> whether wal receiver is running at the time the restartpoint happens has
> little to do with which files were created by streaming replication. With
> the right pattern of streaming files from the master, but always being
> temporarily disconnected when the restartpoint runs, you could still
> accumulate WAL files infinitely.

Walreceiver always creates a .done file when it closes an already-flushed
WAL file and switches to the next one. So we don't need to check
WalRcvInProgress() either.
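
To illustrate the rule, here is a minimal standalone sketch, not the actual
PostgreSQL code. It assumes that an old segment may be removed only when its
.done status file exists, regardless of whether the segment was restored from
the archive or streamed. The helper names file_exists() and
segment_is_removable(), and the status directory passed in, are hypothetical;
the real check in XLogArchiveCheckDone() handles more cases than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper (not PostgreSQL source): true if the path exists. */
bool
file_exists(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0;
}

/*
 * Hypothetical stand-in for the decision discussed above: a segment is
 * removable only if "<status_dir>/<segname>.done" exists. There is no
 * check of the recovery state or of whether walreceiver is running.
 */
bool
segment_is_removable(const char *status_dir, const char *segname)
{
    char path[1024];
    int  n;

    assert(status_dir != NULL && segname != NULL);

    n = snprintf(path, sizeof(path), "%s/%s.done", status_dir, segname);
    if (n < 0 || (size_t) n >= sizeof(path))
        return false;   /* path too long: keep the segment to be safe */

    return file_exists(path);
}
```

With this rule, a segment that was never archived, restored, or completely
streamed has no .done file and is kept, which is the behavior we want after
a crash.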

>> Unfortunately in HEAD, xxx.done file is not created when restoring
>> archived file because of absence of the patch. We need to implement that
>> first.
>
>
> Ah yeah, that thing again..
> (http://www.postgresql.org/message-id/50DF5BA7.6070200@vmware.com) I'm going
> to forward-port that patch now, before it's forgotten again. It's not clear
> to me what the holdup was on this, but whatever the bigger patch we've been
> waiting for is, it can just as well be done on top of the forward-port.

I posted the patch to that thread.

Regards,

--
Fujii Masao
