Re: production server down

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>, Michael Fuhr <mike(at)fuhr(dot)org>, "Hackers (PostgreSQL)" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: production server down
Date: 2004-12-19 00:12:31
Message-ID: 7244.1103415151@sss.pgh.pa.us
Lists: pgsql-hackers

Joe Conway <mail(at)joeconway(dot)com> writes:
> The Tue Nov 2 17:05:32 2004 seems to be related to the *previous*
> restart; from /var/log/messages:

> Nov 2 17:04:20 csdfds1 syslogd 1.4.1: restart.
> ...
> Nov 2 17:05:22 csdfds1 su: pam_unix2: session started for user
> postgres, service su

> ...
> Nov 2 17:05:33 csdfds1 su: (to postgres) root on /dev/pts/5
> Nov 2 17:05:33 csdfds1 su: pam_unix2: session started for user
> postgres, service su
> Nov 2 17:05:33 csdfds1 su: pam_unix2: session finished for user
> postgres, service su

I'm betting that the "su" at :33 is the invocation of the postmaster.
The fact that it took the script 11 seconds to get to that step is
suggestive, to say the least. Are you using one of the scripts that
does an auto initdb if it doesn't see a valid PGDATA? 11 seconds might
be about right for that.

One problem with this theory is how come you didn't get screwed during
*that* boot cycle. It seems to require assuming that the NFS mount came
online just after the initdb finished (else initdb would have
overwritten the on-NFS pg_control) but before the regular postmaster
started (else this same scenario would have played out then). That's
not a very wide window.
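
To make the failure mode concrete, here is a minimal sketch of the kind of check a start script could make before deciding anything about PGDATA. It is not taken from any shipped init script. The function names and the mount_point parameter are hypothetical, and it assumes a missing PG_VERSION file is what the script would treat as "no valid PGDATA". The point is only that "the mount is not there yet" has to be told apart from "the data directory is empty", and that neither case should lead to an automatic initdb.

```python
import os


def data_dir_state(pgdata, mount_point=None):
    """Classify PGDATA as 'not_mounted', 'uninitialized', or 'ready'.

    mount_point is a hypothetical parameter: the path where the
    filesystem holding PGDATA (e.g. the NFS export) is expected
    to be mounted. Pass None to skip the mount check.
    """
    # If the mount is missing, whatever we see under pgdata is the
    # bare local directory, not the real cluster.
    if mount_point is not None and not os.path.ismount(mount_point):
        return "not_mounted"
    # Assumption: a missing PG_VERSION means "not a valid PGDATA".
    if not os.path.isfile(os.path.join(pgdata, "PG_VERSION")):
        return "uninitialized"
    return "ready"


def startup_action(state):
    """Map a state to what a cautious start script would do."""
    if state == "ready":
        return "start postmaster"
    if state == "not_mounted":
        return "abort: data filesystem is not mounted"
    # Never run initdb automatically; leave that to the admin.
    return "abort: PGDATA not initialized, run initdb by hand"
```

A script built this way would have stopped with an error during a boot where the NFS mount was late, rather than running initdb into the empty local directory.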

regards, tom lane
