Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
Date: 2014-10-27 16:12:26
Message-ID: 544E6EEA.4080204@vmware.com
Lists: pgsql-hackers

On 10/27/2014 02:12 PM, Fujii Masao wrote:
> On Fri, Oct 24, 2014 at 10:05 PM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
>> On 10/23/2014 11:09 AM, Heikki Linnakangas wrote:
>>>
>>> At least for master, we should consider changing the way the archiving
>>> works so that we only archive WAL that was generated in the same server.
>>> I.e. we should never try to archive WAL files belonging to another
>>> timeline.
>>>
>>> I just remembered that we discussed a different problem related to this
>>> some time ago, at
>>>
>>> http://www.postgresql.org/message-id/20131212.110002.204892575.horiguchi.kyotaro@lab.ntt.co.jp.
>>> The conclusion of that was that at promotion, we should not archive the
>>> last, partial, segment from the old timeline.
>>
>>
>> So, this is what I came up with for master. Does anyone see a problem with
>> it?
>
> What about the problem that I raised upthread? That is, the patch
> prevents the last, partial WAL file of the old timeline from being archived.
> So we can never PITR the database to a point that is contained in that
> last, partial WAL file.

A partial WAL file is never archived in the master server to begin with,
so if it's ever used in archive recovery, the administrator must have
performed some manual action to copy the partial WAL file from the
original server. When he does that, he can also copy it manually to the
archive, or whatever he wants to do with it.

Note that the same applies to any complete but not-yet-archived WAL
files. But we've never had any mechanism in place to archive those in
the new instance, after PITR.

> Isn't this a problem? For example, please imagine the
> following scenario.
>
> 1. The important data was deleted but no one noticed that. This deletion was
> logged in the last, partial WAL file.
> 2. The server crashed and the DBA started an archive recovery from an old backup.
> 3. After recovery, all WAL files of the old timeline were recycled.
> 4. Finally the DBA noticed the loss of important data and tried to do PITR
> to the point where the data was deleted.
>
> HOWEVER, the WAL file containing that deletion operation no longer exists.
> So the DBA will never be able to recover that important data...

I think you're missing a step above:

1.5: The administrator copies the last, partial WAL file (and any
complete but not-yet-archived files) to the new server's pg_xlog directory.

Without that, it won't be available for PITR anyway, and the new server
won't see it or try to archive it, no matter what.
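
For illustration, step 1.5 could be scripted roughly as below. This is
only a sketch, not something PostgreSQL ships: the function names
unarchived_segments() and copy_unarchived() are made up, and the caller
has to supply both pg_xlog paths. It assumes the usual layout, where a
segment the old server has already archived has a <name>.done marker in
pg_xlog/archive_status, and it treats every 24-hex-digit file without
such a marker as a candidate. That picks up the last, partial segment
(which has no status file), but it would also pick up recycled segments
waiting under future names, so the result should be reviewed before use.

```python
import re
import shutil
from pathlib import Path
from typing import List

# WAL segment file names are 24 hex digits (timeline, log, segment).
# Timeline history files and backup labels do not match this pattern.
WAL_SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def unarchived_segments(old_xlog: Path) -> List[str]:
    """Return names of WAL segments in old_xlog that have no .done marker.

    Hypothetical helper: includes segments marked .ready and segments
    with no status file at all (such as the last, partial segment).
    """
    status_dir = old_xlog / "archive_status"
    names = []
    for entry in sorted(old_xlog.iterdir()):
        if not entry.is_file() or not WAL_SEGMENT_RE.match(entry.name):
            continue
        if (status_dir / (entry.name + ".done")).exists():
            continue  # the old server already archived this one
        names.append(entry.name)
    return names


def copy_unarchived(old_xlog: Path, new_xlog: Path) -> List[str]:
    """Copy unarchived segments into new_xlog; return the names copied.

    Never overwrites a file that already exists in new_xlog.
    """
    copied = []
    for name in unarchived_segments(old_xlog):
        target = new_xlog / name
        if target.exists():
            continue  # leave the new server's own file alone
        shutil.copy2(old_xlog / name, target)
        copied.append(name)
    return copied
```

Something like copy_unarchived(Path(old_dir), Path(new_dir)) would be run
before starting recovery on the new server. The same list could equally
be copied to the archive instead, as mentioned above.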

- Heikki
