Re: Odd corruption issue reported on dba.stackexchange.com, need advice

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Marcin Mańk <marcin(dot)mank(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Odd corruption issue reported on dba.stackexchange.com, need advice
Date: 2012-07-24 14:45:16
Message-ID: 21697.1343141116@sss.pgh.pa.us
Lists: pgsql-general

Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> writes:
> On Tue, Jul 24, 2012 at 7:48 AM, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au> wrote:
>>>	# if that still not helps, use the big hammer
>>>	if (-f $info{'pgdata'}.'/postmaster.pid') {
>>>		print "(does not shutdown, killing the process)";
>>>		$pid = get_running_pid $info{'pgdata'}.'/postmaster.pid';
>>>		kill (9, $pid) if $pid;
>>>		unlink $info{'pgdata'}.'/postmaster.pid';
>>>		$result = 0;
>>>	}

>> Could the "big hammer mode" be what's killed the database?

> Yes, PG should theoretically be able to survive anything as
> long as fsync is being properly honored.

I will tell you what is horridly, horridly dangerous and stupid about
that script, and it's not the kill -9 on the postmaster.  It's the
forced unlink on the postmaster.pid file, which (a) is entirely
unnecessary, and (b) defeats the safety interlock against starting
a new postmaster before all the old backends have flushed out.

Postgres will survive a postmaster kill just fine; that scenario
gets exercised fairly regularly, because of the Linux OOM killer :-(.
It will not survive having two independent sets of backends scribbling
on the same database, but that's what this script opens you up to.
If you ever used the "big hammer" and then started a new postmaster
before being entirely sure all the old postmaster's child processes
were gone, then that's why you have a corrupt database.
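
For illustration only, here is a minimal sketch of a "big hammer" that keeps the interlock intact: it may kill the postmaster, but it never removes postmaster.pid, and it reports success only once the old processes are gone. It is not taken from the script under discussion. It assumes a POSIX system, and the function names and the backend_pids argument (child PIDs the caller collected before the kill) are hypothetical.

```python
import os
import signal
import time
from pathlib import Path


def read_postmaster_pid(pgdata):
    """Return the PID on the first line of postmaster.pid, or None."""
    pid_file = Path(pgdata) / "postmaster.pid"
    try:
        pid = int(pid_file.read_text().splitlines()[0].strip())
    except (FileNotFoundError, IndexError, ValueError):
        return None
    # A non-positive value would address a process group, so reject it.
    return pid if pid > 0 else None


def pid_alive(pid):
    """True if a process with this PID exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stop_hard_and_wait(pgdata, backend_pids, timeout=60.0, poll=0.5):
    """Kill the postmaster as a last resort, then wait for old processes.

    backend_pids is a hypothetical input: the postmaster's child PIDs,
    gathered by the caller *before* the kill (how to do so is OS-specific).
    Returns True only if none of the watched processes remain.
    postmaster.pid is deliberately left in place.
    """
    pid = read_postmaster_pid(pgdata)
    if pid is not None and pid_alive(pid):
        os.kill(pid, signal.SIGKILL)

    watched = list(backend_pids)
    if pid is not None:
        watched.append(pid)

    deadline = time.monotonic() + timeout
    while True:
        if not any(pid_alive(p) for p in watched):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

If this returns False, the old backends are still running; the safe response is to keep waiting or investigate, not to delete postmaster.pid and start a new postmaster.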

			regards, tom lane
