Re: pg 8.3 replication causing corruption

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Bob Hatfield <bobhatfield(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: pg 8.3 replication causing corruption
Date: 2011-10-14 19:29:56
Message-ID: CAHyXU0z+Sjm1kWUEPhAOKs_a9a=j47DZ7ViLEtvg=svVH4Znaw@mail.gmail.com
Lists: pgsql-general

On Thu, Oct 13, 2011 at 4:20 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> On Thu, Oct 13, 2011 at 4:07 PM, Bob Hatfield <bobhatfield(at)gmail(dot)com> wrote:
>>> have you had any power events?  hard shutdowns, etc? I wonder if the problem is in the clog files, and not the heap itself.
>>
>> Nothing unusual for as long as I can tell.  Reminder that as long as I
>> don't restart the primary's pg process, everything works fine
>> (secondary's data is intact).
>>
>> It's as if stopping/starting the primary causes a shipped wal file to
>> be corrupt or contain duplicated data then processed by the secondary.
>
> My money is on clog/visibility  related issues.  It's a bit of a bear,
> but can you pull the xmin/xmax/ctid for the two duplicate records on
> the standby and the correspondingly non-duplicated record on the
> master?  I'm curious if the heap blocks are identical and if the
> standby is incorrectly marking a transaction as valid/invalid.
>
> From there,
>
> We need to:
> *) figure out the transaction bits in clog on both systems and look
> them up there.
> *) also, look for differences in clog generally
> *) digest the heap block containing the records to see if they are identical
> *) double check hint bits?
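
For the clog lookup in the list above, here is a rough sketch of how an xid
maps to a segment file, byte and bit pair under pg_clog/. It assumes the
default 8192-byte block size and the usual layout of 2 status bits per
transaction and 32 pages per segment file. The function names are made up
for illustration and are not part of any postgres tool.

```python
# Sketch only: locate and decode a transaction's status bits in a clog segment.
BLCKSZ = 8192                  # assumption: default block size
CLOG_BITS_PER_XACT = 2         # two status bits per transaction
CLOG_XACTS_PER_BYTE = 4
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE
SLRU_PAGES_PER_SEGMENT = 32    # pages per segment file
XACTS_PER_SEGMENT = CLOG_XACTS_PER_PAGE * SLRU_PAGES_PER_SEGMENT

STATUS_NAMES = {0: "in progress", 1: "committed", 2: "aborted", 3: "sub-committed"}


def clog_location(xid):
    """Return (segment file name, byte offset in that file, bit shift) for xid."""
    segment = xid // XACTS_PER_SEGMENT
    byte_in_segment = (xid % XACTS_PER_SEGMENT) // CLOG_XACTS_PER_BYTE
    bit_shift = (xid % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT
    return "%04X" % segment, byte_in_segment, bit_shift


def xact_status(segment_bytes, xid):
    """Decode the 2-bit status of xid from the raw bytes of its segment file."""
    _, byte_in_segment, bit_shift = clog_location(xid)
    return (segment_bytes[byte_in_segment] >> bit_shift) & 0x03


def diff_segments(master_bytes, standby_bytes, segment_no):
    """List (xid, master_status, standby_status) wherever two copies of the
    same segment disagree. Only the common prefix of the two is compared."""
    diffs = []
    for i in range(min(len(master_bytes), len(standby_bytes))):
        if master_bytes[i] == standby_bytes[i]:
            continue
        for slot in range(CLOG_XACTS_PER_BYTE):
            shift = slot * CLOG_BITS_PER_XACT
            m = (master_bytes[i] >> shift) & 0x03
            s = (standby_bytes[i] >> shift) & 0x03
            if m != s:
                xid = segment_no * XACTS_PER_SEGMENT + i * CLOG_XACTS_PER_BYTE + slot
                diffs.append((xid, m, s))
    return diffs
```

To use it, read the segment file named by clog_location() from each server
in binary mode (open(path, "rb").read()) and pass the bytes to
xact_status() or diff_segments(). Copy the files while the server is
stopped, or at least treat the most recent page as possibly stale, since
recent clog updates may not be flushed to disk yet.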

Any movement on this? There is considerable interest in resolving any
reproducible issues with postgres replication. Do you happen to
remember whether you set up the standby while the master was under
high load? Any interesting/unexplained messages in the standby logs?

merlin
