Re: production server down

From: Joe Conway <mail(at)joeconway(dot)com>
To: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Cc: Michael Fuhr <mike(at)fuhr(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Hackers (PostgreSQL)" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: production server down
Date: 2004-12-18 23:53:06
Message-ID: 41C4C2E2.7090401@joeconway.com
Lists: pgsql-hackers

Alvaro Herrera wrote:
> I can't help remembering the fact that the init script executes an
> initdb automatically if it finds an empty data directory (the ones I
> know of at least -- does the one you are running?). Maybe what happened
> was that it found the empty mount point, executed an initdb, and then
> the NFS drive came online. Later, the pg_control file was sync'ed to
> the "empty database" settings. It'd be interesting to know if the
> mount point does have some files on it.

Good point! I'll take a look at the first opportunity.
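
For illustration, the kind of guard that would rule out this scenario might look like the sketch below. It is hypothetical Python, not the init script actually running on this box: the function name initdb_decision, the /replica mount point, and the three return values are assumptions made for the example. Only the /replica/pgdata path comes from the logs.

```python
import os


def initdb_decision(data_dir, expected_mount, ismount=os.path.ismount):
    """Decide what an init script should do before starting the server.

    Returns one of:
      "refuse" - the filesystem is not mounted, so an empty directory
                 proves nothing and initdb must not run
      "skip"   - the data directory already has files in it
      "initdb" - the filesystem is mounted and the directory is empty
    """
    # An unmounted NFS mount point looks like an ordinary empty directory,
    # so check the mount first and the directory contents second.
    if not ismount(expected_mount):
        return "refuse"
    if os.path.isdir(data_dir) and os.listdir(data_dir):
        return "skip"
    return "initdb"


if __name__ == "__main__":
    # Assumed layout: data directory from the logs, mount point guessed.
    print(initdb_decision("/replica/pgdata", "/replica"))
```

The ismount parameter is only there so the check can be exercised without a real NFS mount. Unmounting the share and listing the bare mount point would show whether a stray initdb left files underneath it.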

> These values (from the corrupt pg_control file) are strange:
>
>>pg_control last modified: Tue Dec 14 15:39:26 2004
>>Time of latest checkpoint: Tue Nov 2 17:05:32 2004
>
> Maybe the latest checkpoint date has some interesting bit pattern that
> could explain it somehow.
>

The "last modified" time corresponds to the moment just prior to the PANIC. See the logs:

2004-12-14 15:39:26 LOG: received smart shutdown request
2004-12-14 15:39:26 LOG: shutting down
2004-12-14 15:39:28 PANIC: could not open file "/replica/pgdata/pg_xlog/0000000000000000" (log file 0, segment 0): No such file or directory

The latest checkpoint time, Tue Nov 2 17:05:32 2004, seems to be related
to the *previous* restart; from /var/log/messages:

8<----------------------------------
...
Nov 2 17:04:20 csdfds1 syslogd 1.4.1: restart.
...
Nov 2 17:05:22 csdfds1 su: pam_unix2: session started for user postgres, service su
...
Nov 2 17:05:33 csdfds1 su: (to postgres) root on /dev/pts/5
Nov 2 17:05:33 csdfds1 su: pam_unix2: session started for user postgres, service su
Nov 2 17:05:33 csdfds1 su: pam_unix2: session finished for user postgres, service su
...
8<----------------------------------
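
To make the timing explicit, here is a small sketch that lines up the checkpoint time with the syslog entries above. It assumes both clocks are in the same local time zone and that the syslog lines, which carry no year, are from 2004. The helper names are made up for the example.

```python
from datetime import datetime


def parse_pg_control(ts):
    # Format of the timestamps in the pg_control dump, e.g.
    # "Tue Nov 2 17:05:32 2004"
    return datetime.strptime(ts, "%a %b %d %H:%M:%S %Y")


def parse_syslog(ts, year):
    # syslog timestamps omit the year, so it is supplied by the caller
    return datetime.strptime(f"{ts} {year}", "%b %d %H:%M:%S %Y")


checkpoint = parse_pg_control("Tue Nov 2 17:05:32 2004")
syslogd_restart = parse_syslog("Nov 2 17:04:20", 2004)
su_session = parse_syslog("Nov 2 17:05:33", 2004)

# Seconds from the syslogd restart to the checkpoint
print((checkpoint - syslogd_restart).total_seconds())  # 72.0
# Seconds from the checkpoint to the su session for postgres
print((su_session - checkpoint).total_seconds())  # 1.0
```

So the checkpoint time falls 72 seconds after the syslogd restart and one second before the 17:05:33 su session for postgres.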

Can you make any sense out of that?

Joe
