Re: unable to fail over to warm standby server

From: Mason Hale <mason(at)onespot(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-bugs(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: unable to fail over to warm standby server
Date: 2010-01-29 14:49:51
Message-ID: 1e85dd391001290649p3abc47b6s9ed3ec312d793076@mail.gmail.com
Lists: pgsql-bugs

> On Fri, Jan 29, 2010 at 12:03 AM, Mason Hale <mason(at)onespot(dot)com> wrote:
> > Of course the best solution is to avoid this issue entirely. Something as
> > easy to miss as file permissions should not cause data corruption,
> > especially in the process meant to fail over from a crashing primary
> > database.
>
> I believe that such a file permission problem does nothing but
> shut down the standby by a FATAL error, and wouldn't cause data
> corruption. So if you remove the trigger file with a wrong
> permission after the shutdown, you can restart a recovery well
> by just starting the standby postgres.
>
>
Perhaps my wording of "data corruption" was too harsh?

While I did not remove the trigger file, I did rename recovery.conf to
recovery.conf.old.

That file contained the restore_command setting that identified the
trigger file, so the rename should have eliminated the problem. But it
didn't. Even after making this change and taking the trigger file out of the
equation, my database failed to come online.
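
For context, a recovery.conf of the kind described above might look roughly
like the sketch below. The paths are hypothetical and are not taken from this
thread; it assumes pg_standby is the restore program, in which case the
trigger file is named through its -t option inside restore_command.

```
# recovery.conf on the standby (illustrative sketch; paths are hypothetical)
# pg_standby waits for each WAL segment to appear in the archive directory
# and stops waiting once the file given with -t exists.
restore_command = 'pg_standby -t /tmp/pgsql.trigger /var/lib/pgsql/wal_archive %f %p %r'
```

Renaming recovery.conf therefore removes both the restore_command and the
only reference to the trigger file.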

Maybe that wasn't data corruption. Maybe the issue was repairable. I just
know that with my 3+ years of experience working with Postgres and the help
of the #postgresql IRC channel, I was not able to revive the database at a
time when I desperately needed it to work. The failover process failed for
me at the worst possible time.

I will surely be careful about trigger file permissions in the future. I
just shared my experience so that future DBAs who might make the same
mistake in a similar situation don't have to deal with the same unexpected
results.

- Mason
