Re: 2 node bdr setup gives error in replication slots

From: Nikhil <nikhilsmenon(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: 2 node bdr setup gives error in replication slots
Date: 2016-06-14 07:45:45
Message-ID: CALo-6YMVeaibXRom0Te7PWF3BPwdjxYHYf4nemUphZqreXN=eg@mail.gmail.com
Lists: pgsql-general

I think it's caused by hard reboots (maybe the hypervisor itself is being rebooted!).
Is there any setting that can reduce such problems?

On Tue, Jun 7, 2016 at 5:30 PM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:

> On 7 June 2016 at 18:24, Nikhil <nikhilsmenon(at)gmail(dot)com> wrote:
>
>> I am getting the error below in my 2-node BDR setup, and Postgres is going
>> down. Any idea?
>>
>> <35382016-06-07 10:16:59 GMT%LOG: database system was interrupted; last known up at 2016-06-07 09:06:44 GMT
>> <35382016-06-07 10:16:59 GMT%PANIC: replication slot file "pg_replslot/bdr_16389_6293051490331141125_2_16389__/state" has wrong magic 4522536 instead of 17112225
>> <35352016-06-07 10:16:59 GMT%LOG: startup process (PID 3538) was terminated by signal 6: Abort trap
>> <35352016-06-07 10:16:59 GMT%LOG: aborting startup due to startup process failure
>>
>
> That suggests that there was a write failure on the replication slot file.
>
> A simple write error shouldn't be possible, because we write the slot file
> to a temporary file and then replace the old slot file with the new one.
> Filesystem issues are possible, as is memory corruption in the application
> that caused a bad write. Or a bug, but it's hard to see how we could write
> the wrong slot magic number here.
>
> With the slot corrupted, all you can really do is part one of the nodes,
> then join a new one.
>
> If you're able to reproduce this I'd really like to see how it came about.
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>
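The crash-safe write Craig describes (write the slot state to a temporary file, then replace the old file with it) and the magic-number check that produced the PANIC can be illustrated with a small Python sketch. This is not PostgreSQL's code: the file layout (a 4-byte magic number, a CRC32 of the payload, then the payload) and the names save_state and load_state are hypothetical, and the rename-based atomicity assumes a POSIX filesystem. Only the expected magic value, 17112225, is taken from the log above.

```python
import os
import struct
import zlib

# Expected magic taken from the PANIC message above (17112225 == 0x1051CA1).
SLOT_MAGIC = 17112225

# Hypothetical, simplified header: magic number, then CRC32 of the payload.
# This is NOT PostgreSQL's real on-disk slot format.
HEADER = struct.Struct("<II")


def save_state(path: str, payload: bytes) -> None:
    """Write to a temporary file, fsync it, then atomically rename over path."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(HEADER.pack(SLOT_MAGIC, zlib.crc32(payload)))
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # the data must be on disk before the rename
    os.replace(tmp_path, path)  # atomic replacement on POSIX filesystems
    # fsync the directory so the rename itself survives a crash
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def load_state(path: str) -> bytes:
    """Read the file back, refusing it if the magic or checksum is wrong."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) < HEADER.size:
        raise ValueError(f"state file {path!r} is truncated")
    magic, crc = HEADER.unpack_from(data)
    if magic != SLOT_MAGIC:
        raise ValueError(
            f"state file {path!r} has wrong magic {magic} instead of {SLOT_MAGIC}"
        )
    payload = data[HEADER.size:]
    if zlib.crc32(payload) != crc:
        raise ValueError(f"state file {path!r} has a checksum mismatch")
    return payload
```

Calling save_state(p, b"slot data") and then load_state(p) returns the payload; overwriting the first four bytes of the file with some other value makes load_state fail with a "wrong magic" error in the same shape as the log. The fsync calls are what make the rename safe across a hard reboot: if the data or the directory entry never reaches stable storage (for example because fsync is disabled, or a virtualised disk acknowledges writes it has only cached), the file found after the reboot can still contain garbage and fail the magic check.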
