Re: PostgreSQL and Windows 2003 DFS Replication

From: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Arnaud Lesauvage <thewild(at)freesurf(dot)fr>, Postgres general mailing list <pgsql-general(at)postgresql(dot)org>
Subject: Re: PostgreSQL and Windows 2003 DFS Replication
Date: 2006-07-31 08:35:26
Message-ID: 1154334926.22367.189.camel@coppola.muc.ecircle.de
Lists: pgsql-general

On Fri, 2006-07-28 at 22:30, Merlin Moncure wrote:
> On 7/28/06, Arnaud Lesauvage <thewild(at)freesurf(dot)fr> wrote:
> > Csaba Nagy wrote:
> > > I found that PITR using WAL shipping is not protecting against all
> > > failure scenarios... it sure will help if the primary machine's hardware
> > > fails, but in one case it was useless for us: the primary had a linux
> > > kernel with buggy XFS code (that's what I think it was, cause we never
> > > found out for sure) and we did use XFS for the data partition, and at
> > > one point it started to get corruptions at the data page level. The
> > > corruption was promptly transferred to the standby, and therefore it was
> > > also unusable... we had to recover from a backup, with the related
> > > downtime. Not good for business...
> > >
> > OK, but corruption at the data page level is a very unlikely
> > event, isn't it ?

It's not... it just happened to me again, strangely this time on a Slony
replica. It might be that the hardware/OS/FS combination we use is the
problem, or it might be that postgres has some problem with that
combination (I would rule out slony being able to produce such things).
But it did happen, and I can't rule out that it will happen again. This
time I hope I'll be able to investigate more closely.

> yes, and that is not a pitr problem, that is a data corruption
> problem. i am very suspicious that slony style replication would
> provide any sort of defense against replicating from a machine which
> is changing bytes from a to b, etc. i think the best defense against
> *that* sort of problem would be synchronous replication via pgpool.

When it happened to us, it was a few blocks in some tables, and I
suspect it was an OS/FS bug. In that case slony would not propagate the
error: it might propagate bad data, but not the page error itself. So it
might not protect against bad data, but I will be able to switch over
and have a working system immediately, compared to recovering from
yesterday's backup after a downtime of 8 hours. So instead of losing 1
day's worth of data and having a downtime of 8 hours, I'll have a
downtime of 1 minute and a few bad entries in the DB... for the kind of
application we have here it is definitely a better scenario.
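
To make the distinction concrete, here is a toy Python sketch (not
PostgreSQL or Slony code; every name in it is made up for illustration).
It assumes a block-level copy as a stand-in for the physical standby,
which is a simplification of what WAL shipping really does, and a
replayed list of row changes as a stand-in for slony. The damaged page
ends up on the physical copy, while the logical replica builds its own
pages and stays readable:

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Page:
    rows: dict = field(default_factory=dict)
    header_ok: bool = True  # False models a damaged page header


class Node:
    """Hypothetical stand-in for one database server."""

    def __init__(self):
        self.pages = {}

    def write_row(self, page_no, key, value):
        self.pages.setdefault(page_no, Page()).rows[key] = value

    def read_row(self, page_no, key):
        page = self.pages[page_no]
        if not page.header_ok:
            raise IOError(f"invalid page header in block {page_no}")
        return page.rows[key]


def replicate_physical(primary, standby):
    # Assumption: the standby receives byte-for-byte copies of the blocks,
    # so whatever is wrong with a block travels with it.
    standby.pages = copy.deepcopy(primary.pages)


def replicate_logical(change_log, replica):
    # Assumption: only row-level changes are replayed; the replica
    # writes them into pages of its own.
    for page_no, key, value in change_log:
        replica.write_row(page_no, key, value)


def demo():
    primary, standby, replica = Node(), Node(), Node()
    change_log = []
    for key, value in [("a", 1), ("b", 2)]:
        primary.write_row(0, key, value)
        change_log.append((0, key, value))  # captured at write time

    # The OS/FS damages the block on the primary after the writes.
    primary.pages[0].header_ok = False

    replicate_physical(primary, standby)
    replicate_logical(change_log, replica)

    results = {}
    for name, node in [("standby", standby), ("replica", replica)]:
        try:
            results[name] = node.read_row(0, "a")
        except IOError as exc:
            results[name] = f"error: {exc}"
    return results


if __name__ == "__main__":
    print(demo())
```

Note that the sketch says nothing about bad values: if the primary
itself wrote a wrong value, the change log would carry it to the replica
just the same.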

Cheers,
Csaba.
