From: | Nikhil <nikhilsmenon(at)gmail(dot)com> |
---|---|
To: | Craig Ringer <craig(at)2ndquadrant(dot)com>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: 2 node bdr setup gives error in replication slots |
Date: | 2016-06-14 07:45:45 |
Message-ID: | CALo-6YMVeaibXRom0Te7PWF3BPwdjxYHYf4nemUphZqreXN=eg@mail.gmail.com |
Lists: | pgsql-general |
I think it's caused by hard reboots (maybe the hypervisor itself was rebooted!).
Is there any setting that can reduce such problems?
On Tue, Jun 7, 2016 at 5:30 PM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> On 7 June 2016 at 18:24, Nikhil <nikhilsmenon(at)gmail(dot)com> wrote:
>
>> I am getting the error below in my 2-node BDR setup, and postgres is going
>> down. Any idea?
>>
>> <35382016-06-07 10:16:59 GMT%LOG: database system was interrupted; last
>> known up at 2016-06-07 09:06:44 GMT
>> <35382016-06-07 10:16:59 GMT%PANIC: replication slot file
>> "pg_replslot/bdr_16389_6293051490331141125_2_16389__/state" has
>> wrong magic 4522536 instead of 17112225
>> <35352016-06-07 10:16:59 GMT%LOG: startup process (PID 3538) was
>> terminated by signal 6: Abort trap
>> <35352016-06-07 10:16:59 GMT%LOG: aborting startup due to startup
>> process failure
>>
>
> That suggests that there was a write failure on the replication slot file.
>
> A simple write error shouldn't be possible because we write the slot file
> to a tempfile, then replace the old slot file with the new one. Filesystem
> issues are possible, or memory corruption in the application that caused a
> bad write. Or a bug, but it's hard to see how we could write the wrong slot
> magic number here.
>
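For readers unfamiliar with the write-to-tempfile-then-replace pattern described in the paragraph above, here is a minimal Python sketch of the general idea. It is an illustration only, not PostgreSQL's actual C code; the function name `atomic_write` and the `.tmp-` prefix are made up for this example, and it assumes a POSIX filesystem where a rename within one directory is atomic.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace the file at `path` with `data` via a temp file and a rename.

    Hypothetical helper for illustration; not taken from PostgreSQL.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays on
    # one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push file contents to stable storage
        os.replace(tmp_path, path)  # swap the new file in over the old one
    except BaseException:
        # Best-effort cleanup of the temp file if anything failed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Also fsync the directory so the rename itself survives a crash.
    # Some platforms do not allow this, so failures are ignored here.
    try:
        dir_fd = os.open(directory, os.O_RDONLY)
    except OSError:
        return
    try:
        os.fsync(dir_fd)
    except OSError:
        pass
    finally:
        os.close(dir_fd)
```

With this pattern a reader should see either the complete old file or the complete new file, which is why a plain interrupted write is an unlikely explanation. That guarantee still depends on the storage stack honoring fsync, which is where filesystem or virtualization issues could come in.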
> With the slot corrupted all you can really do is part one of the nodes
> then join a new one.
>
> If you're able to reproduce this I'd really like to see how it came about.
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>
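To see which slot files are affected before deciding what to part and rejoin, the check behind the PANIC message can be approximated outside the server. The sketch below is only a rough diagnostic under stated assumptions: it takes the expected magic value 17112225 from the log message above, and it assumes the magic is stored in the first four bytes of the `state` file in little-endian order (adjust `byte_order` on a big-endian machine). The function names are hypothetical, and it does not validate anything else in the file.

```python
import struct

# Expected value quoted in the PANIC message (0x1051CA1 in hex).
SLOT_MAGIC = 17112225


def read_slot_magic(state_path: str, byte_order: str = "<") -> int:
    """Return the first 4 bytes of a slot state file as an unsigned int.

    Assumption: the magic number is the first field of the file and is
    stored in the byte order given by `byte_order` ("<" = little-endian).
    """
    with open(state_path, "rb") as f:
        header = f.read(4)
    if len(header) < 4:
        raise ValueError(f"{state_path}: file is shorter than 4 bytes")
    return struct.unpack(byte_order + "I", header)[0]


def slot_magic_ok(state_path: str, byte_order: str = "<") -> bool:
    """True if the file starts with the expected slot magic number."""
    return read_slot_magic(state_path, byte_order) == SLOT_MAGIC
```

Running `slot_magic_ok` against each `pg_replslot/*/state` file on a stopped server would show whether only one slot is damaged. It only reads the files and repairs nothing.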