On Thursday 26 August 2004 16:25, Tom Lane wrote:
> The real point here is that the behavior has to be to default to
> failure, not success. The worst case if we fail incorrectly is that a
> small amount of manual intervention is needed to start the postmaster,
> ie, remove the lockfile and try again. The worst (and very probable)
> case if we succeed incorrectly is extensive, unrecoverable data
> corruption. We must *never* have multiple postmasters running against
> the same data directory. So taking an attitude of "prove that there is
> a working postmaster out there" is quite backwards. You have to think
> in terms of "prove that there isn't".
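Tom's "prove that there isn't" rule translates fairly directly into code. What follows is a minimal Python sketch, not PostgreSQL's actual lock-file logic. The function name `safe_to_start` is hypothetical. The sketch assumes the lock file's first line holds the owner's PID and relies on POSIX `kill(pid, 0)` semantics. Every case it cannot decide falls through to "refuse to start". That includes a live PID that might belong to an unrelated process after a reboot.

```python
import os


def safe_to_start(lock_path: str) -> bool:
    """Hypothetical fail-safe check: return True only when we can prove that
    no live process owns the lock file; every uncertain case returns False."""
    try:
        with open(lock_path, encoding="ascii") as f:
            first_line = f.readline().strip()
    except FileNotFoundError:
        return True  # no lock file at all: nobody claims the data directory
    except (OSError, UnicodeDecodeError):
        return False  # unreadable or binary junk: cannot prove the owner is gone

    try:
        pid = int(first_line)  # assumption: first line holds the owner's PID
    except ValueError:
        return False  # empty or garbage contents: refuse

    if pid <= 0:
        return False  # 0 or negative would address process groups in os.kill

    if pid == os.getpid():
        # We hold this PID ourselves, so the process that wrote the file
        # cannot still be running (PID reuse after a reboot can cause this).
        return True

    if os.name != "posix":
        # On Windows, os.kill(pid, 0) terminates the target instead of
        # probing it, so this sketch simply refuses there.
        return False

    try:
        os.kill(pid, 0)  # signal 0: existence/permission check, nothing delivered
    except ProcessLookupError:
        return True  # no such process: the lock file is provably stale
    except (OSError, OverflowError):
        return False  # e.g. PermissionError: a process exists under another user
    # Some process has this PID. It may or may not be a postmaster, but we
    # cannot prove it isn't, so refuse to start.
    return False
```

A startup wrapper built on this would delete the lock file and start the server only when `safe_to_start(...)` returns True. In every other case it would stop and leave the decision to a human. That is exactly the "small amount of manual intervention" trade-off described above.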
Of course you're right, as always ;-) Data integrity has to be the absolute
priority, 'twas too early in the morning.
I just got sidetracked by sporadic 'why does our XXX not work?'
questions from people who leave their production DB servers (this is for
industrial manufacturing processes, like laser welding) running on the
machine floor, and whose boxes get shut down every now and then, and
sometimes... pg doesn't start. Then the 'small amount of manual intervention'
is sometimes not so small, depending on the OS and/or configuration, or even
on remote access to the box. Mind you, these are mostly non-computer-savvy
people, and they sometimes get upset when 'the system does not start up
correctly', because that means they can't currently produce a car!
We're working around this by adding a shell script that removes
'postmaster.pid' as the last action at system *shutdown*, so we can tell them
to 'restart the machine', and everything usually just works fine. But a safe
mechanism internal to the postmaster would be great. Just daydreaming...
Leading SW developer - S.E.A GmbH
Mailing list: pgsql-hackers-win32