Re: streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM)

From: Benedikt Grundmann <bgrundmann(at)janestreet(dot)com>
To: David Powers <dpowers(at)janestreet(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM)
Date: 2013-05-21 15:59:26
Message-ID: CADbMkNMurWJMUmXAKqtFi1p40=G0ncVvOKXjnixhX5Bjb4-8BQ@mail.gmail.com
Lists: pgsql-hackers

We are now seeing these errors on a regular basis on the testing box. We
have even changed the backup script to shut down the hot standby, take the
LVM snapshot, restart the hot standby, and then rsync the LVM snapshot. It
still happens.
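
For reference, the order of operations described above can be sketched as a small Python helper that only builds and prints the commands (it does not run them). This is an illustration, not the actual backup script: the data directory, volume group, volume names, snapshot size, mount point, and rsync target are all hypothetical placeholders, and the `nouuid` mount option is an assumption based on the snapshot living on XFS.

```python
import shlex
from typing import List


def standby_backup_commands(
    data_dir: str,
    volume_group: str,
    logical_volume: str,
    snapshot_name: str,
    snapshot_size: str,
    mount_point: str,
    rsync_target: str,
) -> List[List[str]]:
    """Return the commands for a stop / snapshot / start / rsync backup, in order.

    Every argument is a placeholder supplied by the caller; nothing here is
    taken from the real setup discussed in this thread.
    """
    origin = f"/dev/{volume_group}/{logical_volume}"
    snapshot = f"/dev/{volume_group}/{snapshot_name}"
    return [
        # 1. Cleanly shut down the hot standby so the data directory is quiescent.
        ["pg_ctl", "stop", "-D", data_dir, "-m", "fast"],
        # 2. Take the LVM snapshot of the volume that holds the data directory.
        ["lvcreate", "--snapshot", "--size", snapshot_size,
         "--name", snapshot_name, origin],
        # 3. Restart the hot standby; it resumes replay while the copy runs.
        ["pg_ctl", "start", "-D", data_dir],
        # 4. Mount the snapshot read-only (nouuid: assumed XFS, same UUID as origin).
        ["mount", "-o", "ro,nouuid", snapshot, mount_point],
        # 5. Copy the frozen files to the backup destination.
        ["rsync", "-a", mount_point.rstrip("/") + "/", rsync_target],
        # 6. Clean up the mount and the snapshot.
        ["umount", mount_point],
        ["lvremove", "-f", snapshot],
    ]


if __name__ == "__main__":
    # Dry run with made-up names: print the commands instead of executing them.
    for cmd in standby_backup_commands(
        data_dir="/var/lib/pgsql/data",
        volume_group="vg0",
        logical_volume="pgdata",
        snapshot_name="pgdata_snap",
        snapshot_size="10G",
        mount_point="/mnt/pgdata_snap",
        rsync_target="backuphost:/backups/pgdata/",
    ):
        print(shlex.join(cmd))
```

The point of the ordering is that the snapshot is taken only while the standby is fully stopped, so the copied files should correspond to a clean shutdown rather than a running server.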

We never saw this before we introduced the hot standby, so we will now
revert to taking the backups from LVM snapshots on the production
database. If you have ideas about what else we should try, or what
information we can give you to debug this, let us know and we will try to
do so.

Until then we will sadly operate on the assumption that the combination of
hot standby and "frozen snapshot" backup of it is not production ready.

Thanks,

Bene

On Thu, May 16, 2013 at 8:10 AM, David Powers <dpowers(at)janestreet(dot)com> wrote:

> I'll try to get the primary upgraded over the weekend when we can afford a
> restart.
>
> In the meantime I have a single test showing that a shutdown, snapshot,
> restart produces a backup that passes the vacuum analyze test. I'm going
> to run a full vacuum today.
>
> -David
>
>
> On Wed, May 15, 2013 at 3:53 PM, Heikki Linnakangas <
> hlinnakangas(at)vmware(dot)com> wrote:
>
>> On 15.05.2013 22:50, Benedikt Grundmann wrote:
>>
>>> On Wed, May 15, 2013 at 2:50 PM, Heikki Linnakangas
>>> <hlinnakangas(at)vmware(dot)com> wrote:
>>>
>>>> The subject says 9.2.3. Are you sure you're running 9.2.4 on all the
>>>> servers? There was a fix to a bug related to starting a standby server
>>>> from a filesystem snapshot. I don't think it was quite the case you
>>>> have, but pretty close.
>>>>
>>>
>>> So this is delightfully embarrassing: I just went back to double-check and
>>>
>>> - primary box is 9.2.3
>>> - standby is 9.2.4
>>> - testing is 9.2.4
>>>
>>> I guess that alone could possibly explain it?
>>>
>>
>> Hmm, no, it should still work. There haven't been any changes in the WAL
>> format. I do recommend upgrading the primary, of course, but I don't really
>> see how that would explain what you're seeing.
>>
>> - Heikki
>>
>
>
