Re: postmaster crash

From: Richard Huxton <dev(at)archonet(dot)com>
To: Steve Oualline <soualline(at)stbernard(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: postmaster crash
Date: 2006-02-01 09:31:37
Message-ID: 43E07FF9.50600@archonet.com
Lists: pgsql-general

Steve Oualline wrote:
> We have an interesting problem here. We have a server at a customer's site
> on which the database will not come up. Because of the nature of the
> product we make, we don't turn on Postgresql logs, so no log data
> is available.

That's the biggest problem you've got right there.

> What we see is that when we start postmaster it starts, but anyone who
> tries to connect gets a message "FATAL: Database is starting up".
> This continues for about 5 minutes at which point a watchdog process
> kills the postmaster (SIGQUIT) and restarts it. This process is
> repeating itself over and over again on the system.
>
> In an attempt to find this problem, the watchdog process is killed and
> postmaster is started manually. The results are:
>
> LOG: could not load root certificate file "/slice2/url_db/root.crt": No such file or directory
> DETAIL: Will not verify client certificates.
> LOG: database system was interrupted while in recovery at 2006-01-24 12:32:10 PST
> HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.

This is probably just where your watchdog process killed the last restart.

> One other clue is available. There are 1202 files in the pg_xlog directory.

And they are all called something like 0000...1234 and 16MB in size?
Because that seems like a lot of transaction-log files to have lying about.
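
A quick way to answer that question is to scan the directory. This is only a sketch: it assumes the default 16MB segment size and the usual 24-hex-digit segment names, and the function name summarize_xlog is made up for illustration.

```python
import os
import re

# WAL segment files are named with 24 uppercase hex digits.
SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")
# Assumption: the server was built with the default 16MB segment size.
SEGMENT_SIZE = 16 * 1024 * 1024


def summarize_xlog(directory):
    """Split the regular files in `directory` into (segments, odd_files).

    A file counts as a segment only if both its name and its size match;
    anything else is reported as odd so it can be inspected by hand.
    Subdirectories such as archive_status are ignored.
    """
    segments, odd = [], []
    for entry in os.scandir(directory):
        if not entry.is_file():
            continue
        if SEGMENT_NAME.match(entry.name) and entry.stat().st_size == SEGMENT_SIZE:
            segments.append(entry.name)
        else:
            odd.append(entry.name)
    return sorted(segments), sorted(odd)
```

Calling summarize_xlog on the pg_xlog directory and comparing len(segments) with the 1202 you counted would show whether all of those files really are transaction-log segments.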

> One thought is that we should shut down the database with a SIGINT instead of a SIGQUIT.
> It should be noted, however, that our customers frequently shut down the system with the power
> switch, so our ability to control the shutdown is limited.

OK, I was wrong. Lack of logs is your second biggest problem. People
randomly pulling power is probably your biggest problem.
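
On the SIGINT question itself: the postmaster treats SIGINT as a "fast" shutdown (clean, but disconnects clients) and SIGQUIT as an "immediate" shutdown that forces recovery on the next start. A watchdog could try the gentler signal first and escalate only on timeout. A minimal sketch, assuming the watchdog holds a subprocess.Popen handle for the process it started; stop_gently and the 60-second grace period are illustrative, not anything from your setup.

```python
import signal
import subprocess


def stop_gently(proc, grace_seconds=60.0):
    """Signal `proc` with SIGINT, then SIGQUIT, waiting after each.

    Returns the signal after which the process was seen to have exited.
    Falls back to SIGKILL only if neither signal works in time.
    """
    for sig in (signal.SIGINT, signal.SIGQUIT):
        proc.send_signal(sig)
        try:
            proc.wait(timeout=grace_seconds)
            return sig
        except subprocess.TimeoutExpired:
            continue  # still running, escalate to the next signal
    proc.kill()
    proc.wait()
    return signal.SIGKILL
```

None of this helps when the power is pulled, of course, and it should not be pointed at a postmaster that is still legitimately in recovery.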

Tell me - have you tested that your hardware really flushes data to disk
when it says it does?
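
One rough check is to time a loop of small write-plus-fsync calls. A 7200rpm disk turns 120 times a second, so a plain rotating disk that reports thousands of fsyncs per second on the same block is probably acknowledging writes from a volatile cache. This is a sketch only: fsyncs_per_second is a made-up helper, it needs a Unix-like system for os.pwrite, and a high figure is legitimate on SSDs or battery-backed controllers.

```python
import os
import tempfile
import time


def fsyncs_per_second(directory, count=200):
    """Rewrite one 8kB block `count` times, fsyncing after every write.

    Returns the observed fsync rate. The temporary file is created in
    `directory`, so pass a path on the disk you want to test.
    """
    payload = b"x" * 8192
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.pwrite(fd, payload, 0)  # always overwrite offset 0
            os.fsync(fd)               # ask the OS to push it to disk
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    # Guard against a zero interval on very coarse clocks.
    return count / max(elapsed, 1e-9)
```

The only test that really settles it is pulling the plug during writes and checking what survived, but an implausibly high rate here is a strong hint.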

> We would like any information or suggestions on:
> 1) What's happening.

Difficult to say - you'll have to turn on logging.

> 2) How can we stop it from happening.

Step 1 - find out what is happening.

> 3) How can we detect when we are in such a state (so we can rebuild the database)

What is the reason why you can't have logging turned on?

--
Richard Huxton
Archonet Ltd
