Re: Odd corruption issue reported on dba.stackexchange.com, need advice

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Marcin Mańk <marcin(dot)mank(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Odd corruption issue reported on dba.stackexchange.com, need advice
Date: 2012-07-24 14:45:16
Message-ID: 21697.1343141116@sss.pgh.pa.us
Lists: pgsql-general

Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> writes:
> On Tue, Jul 24, 2012 at 7:48 AM, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au> wrote:
>>> # if that still not helps, use the big hammer
>>> if (-f $info{'pgdata'}.'/postmaster.pid') {
>>>     print "(does not shutdown, killing the process)";
>>>     $pid = get_running_pid $info{'pgdata'}.'/postmaster.pid';
>>>     kill (9, $pid) if $pid;
>>>     unlink $info{'pgdata'}.'/postmaster.pid';
>>>     $result = 0;
>>> }

>> Could the "big hammer mode" be what's killed the database?

> Yes, PG should theoretically be able to survive anything as
> long as fsync is being properly honored.

I will tell you what is horridly, horridly dangerous and stupid about
that script, and it's not the kill -9 on the postmaster. It's the
forced unlink on the postmaster.pid file, which (a) is entirely
unnecessary, and (b) defeats the safety interlock against starting
a new postmaster before all the old backends have flushed out.

Postgres will survive a postmaster kill just fine; that scenario
gets exercised fairly regularly, because of the Linux OOM killer :-(.
It will not survive having two independent sets of backends scribbling
on the same database, but that's what this script opens you up to.
If you ever used the "big hammer" and then started a new postmaster
before being entirely sure all the old postmaster's child processes
were gone, then that's why you have a corrupt database.
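
For illustration only, here is a rough sketch of what a less dangerous
"big hammer" could look like. It is written in Python rather than the
script's Perl, and it is not the quoted script or anything shipped with
Postgres. The function names, the timeout, and the idea that the caller
records the backend PIDs before killing anything are all assumptions
made for this example. The only fact relied on is that the first line of
postmaster.pid holds the postmaster's PID. The point it tries to show:
kill if you must, never unlink postmaster.pid, and do not start a new
postmaster until every old process is confirmed gone.

```python
import os
import signal
import time
from pathlib import Path


def read_postmaster_pid(pgdata):
    """Return the PID from the first line of postmaster.pid, or None."""
    pid_file = Path(pgdata) / "postmaster.pid"
    try:
        return int(pid_file.read_text().splitlines()[0].strip())
    except (FileNotFoundError, IndexError, ValueError):
        return None


def pid_alive(pid):
    """Probe a PID with signal 0 (POSIX only); True if it still exists."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def wait_until_gone(pids, timeout=60.0, poll=0.5):
    """Poll until every PID has exited; return the set still alive."""
    deadline = time.monotonic() + timeout
    remaining = {p for p in pids if pid_alive(p)}
    while remaining and time.monotonic() < deadline:
        time.sleep(poll)
        remaining = {p for p in remaining if pid_alive(p)}
    return remaining


def force_stop(pgdata, backend_pids, timeout=60.0):
    """Hypothetical helper: SIGKILL the postmaster, leave postmaster.pid alone.

    backend_pids is assumed to have been collected by the caller beforehand.
    Returns the set of old PIDs still alive; start a new postmaster only
    when that set is empty.
    """
    pid = read_postmaster_pid(pgdata)
    watched = set(backend_pids)
    if pid is not None:
        watched.add(pid)
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # postmaster already gone
    # Deliberately no unlink of postmaster.pid here.
    return wait_until_gone(watched, timeout=timeout)
```

If force_stop() returns a non-empty set, the right move in this sketch is
to stop and investigate, not to delete the pid file and carry on.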

regards, tom lane
