From: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
---|---|
To: | Christoph Berg <christoph(dot)berg(at)credativ(dot)de>, Josh Berkus <josh(at)agliodbs(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [BUG] Archive recovery failure on 9.3+. |
Date: | 2014-02-13 09:46:16 |
Message-ID: | 52FC9468.4050602@vmware.com |
Lists: | pgsql-hackers |
On 02/12/2014 01:24 PM, Christoph Berg wrote:
> Re: Heikki Linnakangas 2014-01-13 <52D3CAFF(dot)3010807(at)vmware(dot)com>
>>>> Actually, why is the partially-filled 000000010000000000000002 file
>>>> archived in the first place? Looking at the code, it's been like that
>>>> forever, but it seems like a bad idea. If the original server is still
>>>> up and running, and writing more data to that file, what will happen is
>>>> that when the original server later tries to archive it, it will fail
>>>> because the partial version of the file is already in the archive. Or
>>>> worse, the partial version overwrites a previously archived more
>>>> complete version.
>>>
>>> Oh! This explains some transient errors I've seen.
>>>
>>>> Wouldn't it be better to not archive the old segment, and instead switch
>>>> to a new segment after writing the end-of-recovery checkpoint, so that
>>>> the segment on the new timeline is archived sooner?
>>>
>>> It would be better to zero-fill and switch segments, yes. We should
>>> NEVER be in a position of archiving two different versions of the same
>>> segment.
>>
>> Ok, I think we're in agreement that that's the way to go for master.
>>
>> Now, what to do about back-branches? On one hand, I'd like to apply
>> the same fix to all stable branches, as the current behavior is
>> silly and always has been. On the other hand, we haven't heard any
>> complaints about it, so we probably shouldn't fix what ain't broken.
>> Perhaps we should apply it to 9.3, as that's where we have the acute
>> problem the OP reported. Thoughts?
>>
>> In summary, I propose that we change master and REL9_3_STABLE to not
>> archive the partial segment from previous timeline. Older branches
>> will keep the current behavior.
>
> I've seen the "can't archive file from the old timeline" problem on
> 9.2 and 9.3 slaves after promotion. The problem is in conjunction with
> the proposed archive_command in the default postgresql.conf comments:
>
> # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
>
> With 9.1, it works, but 9.2 and 9.3 don't archive anything until I
> remove the "test ! -f" part. (An alternative fix would be to declare
> the behavior ok and adjust that example in the config.)
Hmm, the behavior is the same in 9.1 and 9.2. Did you use a different
archive_command in 9.1, without the "test"?
- Heikki
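
A minimal sketch of why the sample archive_command quoted above refuses a segment whose name is already in the archive. It assumes a scratch directory in place of a real pg_xlog and /mnt/server/archivedir, and a hypothetical helper, archive_segment, that takes %p and %f as arguments. It only emulates the shell command. It is not how the server invokes it.

```shell
#!/bin/sh
# Hypothetical scratch directories standing in for the WAL directory and the archive.
workdir=$(mktemp -d)
archivedir="$workdir/archivedir"
mkdir -p "$archivedir"

# Emulates the sample archive_command; $1 plays the role of %p, $2 of %f.
archive_segment() {
    test ! -f "$archivedir/$2" && cp "$1" "$archivedir/$2"
}

segment=000000010000000000000002

# First attempt: the partially filled segment is archived.
printf 'partial' > "$workdir/$segment"
first_status=0
archive_segment "$workdir/$segment" "$segment" || first_status=$?

# Second attempt: the same file name is offered again with more data.
printf 'partial plus more data' > "$workdir/$segment"
second_status=0
archive_segment "$workdir/$segment" "$segment" || second_status=$?

archived_content=$(cat "$archivedir/$segment")
echo "first=$first_status second=$second_status archived=$archived_content"

rm -rf "$workdir"
```

The second call exits non-zero because "test ! -f" finds the earlier copy, and the archived file keeps the partial contents. The server treats a non-zero exit status from archive_command as a failure and retries the same file, which is consistent with the symptom described above where nothing further is archived.
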