Re: streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Benedikt Grundmann <bgrundmann(at)janestreet(dot)com>
Cc: David Powers <dpowers(at)janestreet(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM)
Date: 2013-05-23 15:26:35
Message-ID: CA+TgmobWPC3_NOmnpoFhxFvnpgHLPWXi8OG5xD9tNusGGB1mzQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 21, 2013 at 11:59 AM, Benedikt Grundmann
<bgrundmann(at)janestreet(dot)com> wrote:
> We are seeing these errors on a regular basis on the testing box now. We
> have even changed the backup script to
> shut down the hot standby, take lvm snapshot, restart the hot standby, rsync
> the lvm snapshot. It still happens.
>
> We have never seen this before we introduced the hot standby. So we will
> now revert to taking the backups from lvm snapshots on the production
> database. If you have ideas of what else we should try / what information
> we can give you to debug this let us know and we will try to do so.
>
> Until then we will sadly operate on the assumption that the combination of
> hot standby and "frozen snapshot" backup of it is not production ready.

I'm pretty suspicious that your backup procedure is messed up in some
way. The fact that you got invalid page headers is really difficult
to attribute to a PostgreSQL bug. A number of the other messages that
you have posted also tend to indicate either corruption, or that WAL
replay has stopped early. It would be interesting to see the logs
from when the clone was first started up, juxtaposed against the later
WAL flush error messages.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
