Identifying cause of "database system shutdown was interrupted" at failed startup

From: "Crispin Miller" <CMiller(at)PICR(dot)man(dot)ac(dot)uk>
To: <pgsql-bugs(at)postgresql(dot)org>
Subject: Identifying cause of "database system shutdown was interrupted" at failed startup
Date: 2004-06-09 16:36:35
Message-ID: BAA35444B19AD940997ED02A6996AAE001AF8821@sanmail.picr.man.ac.uk
Lists: pgsql-bugs

Hi,
We recently encountered a serious database crash that resulted
in a significant loss of data...

We took down the database server, and when we restarted the
backend we got an error 'database system shutdown was interrupted' ...
'invalid checkpoint' etc... with missing xlog files (I've appended the
log to the end of this post)...

I've been trawling list-archives for a few days and this issue
has cropped up a number of times, but I've found it hard to identify a
single post - or set of posts - that might help explain the cause of
such a crash...

Hopefully I'll be able to bring together the results of this
trawl through the archives in this post - but I'd really appreciate any
help or suggestions people have - we currently have a slightly uneasy
feeling because we've not quite got to the bottom of the issues, and it
would be nice to set our minds at rest! :-)

So far I've identified two possible causes of the crash - I've
listed them below, and wonder whether people have any comments on them:

1) We were running PostgreSQL 7.3.6-1 (the version shipped with Red Hat
Enterprise Linux AS 3, kernel-smp-2.4.21-9.0.1EL).
The following post suggests that this is a known issue in 7.3.3, but
that 7.3.4 is safe? If so, I assume that 7.3.6-1 is also safe...

http://archives.postgresql.org/pgsql-general/2003-09/msg01086.php

2) We are running the database in conjunction with Jboss,
connecting to the database server from a different machine via JDBC. The
database was taken down *without* stopping Jboss first.

Any thoughts would be much appreciated!

Below are the relevant bits of the shutdown and startup logs,

Best wishes,
Crispin

----------------------
shutdown log (/var/log/messages):
May 28 15:43:35 shutdown: shutting down for system halt
May 28 15:43:35 init: Switching to runlevel: 0
May 28 15:43:36 server rhnsd[1694]: Exiting
May 28 15:43:36 server rhnsd: rhnsd shutdown succeeded
May 28 15:43:36 server atd: atd shutdown succeeded
May 28 15:43:36 server cups: cupsd shutdown succeeded
May 28 15:43:36 server xfs[1643]: terminating
May 28 15:43:36 server xfs: xfs shutdown succeeded
May 28 15:43:36 server mysqld: Stopping MySQL: succeeded
May 28 15:43:36 server gpm: gpm shutdown succeeded
May 28 15:43:37 server rhdb: Stopping PostgreSQL - Red Hat Edition service:
May 28 15:43:37 server su(pam_unix)[12400]: session opened for user postgres by (uid=0)
May 28 15:43:40 server su(pam_unix)[12400]: session closed for user postgres
May 28 15:43:40 server rhdb: ^[[60G[
May 28 15:43:40 server rhdb:
May 28 15:43:40 server rc: Stopping rhdb: succeeded
...
May 28 15:43:44 server kernel: Kernel logging (proc) stopped.
May 28 15:43:44 server kernel: Kernel log daemon terminating.
May 28 15:43:45 server syslog: klogd shutdown succeeded
May 28 15:43:45 server exiting on signal 15
May 28 16:13:35 server syslogd 1.4.1: restart.

-----
startup messages:

Jun 1 10:43:55 server postgres[5537]: [30] LOG: database system shutdown was interrupted at 2004-05-28 16:32:08 BST
Jun 1 10:43:55 server postgres[5537]: [31] LOG: open of /var/lib/pgsql/data/pg_xlog/0000000000000000 (log file 0, segment 0) failed: No such file or directory
Jun 1 10:43:55 server postgres[5537]: [32] LOG: invalid primary checkpoint record
Jun 1 10:43:55 server postgres[5537]: [33] LOG: open of /var/lib/pgsql/data/pg_xlog/0000000000000000 (log file 0, segment 0) failed: No such file or directory
Jun 1 10:43:55 server postgres[5537]: [34] LOG: invalid secondary checkpoint record
Jun 1 10:43:55 server postgres[5537]: [35] PANIC: unable to locate a valid checkpoint record
Jun 1 10:43:55 server postgres[5534]: [31] LOG: startup process (pid 5537) was terminated by signal 6
Jun 1 10:43:55 server postgres[5534]: [32] LOG: aborting startup due to startup process failure
Jun 1 10:43:56 server rhdb: Starting PostgreSQL - Red Hat Edition service: failed
Jun 1 10:44:00 server su(pam_unix)[5554]: session opened for user postgres by (uid=0)
Jun 1 10:44:00 server su(pam_unix)[5554]: session closed for user postgres
Jun 1 10:44:00 server postgres[5595]: [30] LOG: database system shutdown was interrupted at 2004-05-28 16:32:08 BST
Jun 1 10:44:00 server postgres[5595]: [31] LOG: open of /var/lib/pgsql/data/pg_xlog/0000000000000000 (log file 0, segment 0) failed: No such file or directory
Jun 1 10:44:00 server postgres[5595]: [32] LOG: invalid primary checkpoint record
Jun 1 10:44:00 server postgres[5595]: [33] LOG: open of /var/lib/pgsql/data/pg_xlog/0000000000000000 (log file 0, segment 0) failed: No such file or directory
Jun 1 10:44:00 server postgres[5595]: [34] LOG: invalid secondary checkpoint record
Jun 1 10:44:00 server postgres[5595]: [35] PANIC: unable to locate a valid checkpoint record
Jun 1 10:44:00 server postgres[5592]: [31] LOG: startup process (pid 5595) was terminated by signal 6
Jun 1 10:44:00 server postgres[5592]: [32] LOG: aborting startup due to startup process failure
Jun 1 10:44:01 server rhdb: Starting PostgreSQL - Red Hat Edition service: failed
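For anyone comparing these logs with similar reports in the archives, here is a minimal Python sketch (not something we ran at the time) that pulls the checkpoint-related postgres lines out of an unwrapped syslog excerpt. It also decodes the pg_xlog file name into the "(log file N, segment M)" pair the server prints next to it. It assumes the 7.3-style naming seen above: a 16-hex-digit name made of the log ID and the segment number, 8 hex digits each. `decode_xlog_name` and `summarize_startup` are hypothetical helpers, not part of PostgreSQL.

```python
import re

# Hypothetical helper (assumption: 7.3-era pg_xlog naming, i.e. the log ID
# followed by the segment number, each as 8 upper-case hex digits).
def decode_xlog_name(name: str) -> tuple[int, int]:
    """Return (log file, segment) for a 16-hex-digit pg_xlog file name."""
    if not re.fullmatch(r"[0-9A-F]{16}", name):
        raise ValueError(f"not an xlog segment name: {name!r}")
    return int(name[:8], 16), int(name[8:], 16)

# Matches postgres LOG/PANIC lines in the syslog format quoted above,
# assuming each message sits on a single (unwrapped) line.
POSTGRES_RE = re.compile(
    r"postgres\[(?P<pid>\d+)\]: \[\d+\] (?P<level>LOG|PANIC): +(?P<msg>.*)"
)
XLOG_RE = re.compile(r"pg_xlog/([0-9A-F]{16})")

def summarize_startup(lines):
    """Group failed xlog opens and checkpoint messages by postgres pid."""
    summary = {}
    for line in lines:
        m = POSTGRES_RE.search(line)
        if not m:
            continue  # not a postgres LOG/PANIC line
        msg = m.group("msg")
        xlog = XLOG_RE.search(msg)
        if xlog and "failed" in msg:
            item = ("xlog open failed", decode_xlog_name(xlog.group(1)))
        elif "checkpoint record" in msg:
            item = (m.group("level"), msg)
        else:
            continue  # unrelated message, e.g. "startup process ... terminated"
        summary.setdefault(int(m.group("pid")), []).append(item)
    return summary

if __name__ == "__main__":
    sample = [
        "Jun 1 10:43:55 server postgres[5537]: [31] LOG: open of "
        "/var/lib/pgsql/data/pg_xlog/0000000000000000 (log file 0, segment 0) "
        "failed: No such file or directory",
        "Jun 1 10:43:55 server postgres[5537]: [32] LOG: invalid primary checkpoint record",
        "Jun 1 10:43:55 server postgres[5537]: [35] PANIC: unable to locate a valid checkpoint record",
    ]
    for pid, items in summarize_startup(sample).items():
        print(pid, items)
```

Decoding the name this way makes it explicit that both the primary and the secondary checkpoint lookups were trying to open log file 0, segment 0.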

--------------------------------------------------------


This email is confidential and intended solely for the use of the person(s) ('the intended recipient') to whom it was addressed. Any views or opinions presented are solely those of the author and do not necessarily represent those of the Paterson Institute for Cancer Research or the Christie Hospital NHS Trust. It may contain information that is privileged & confidential within the meaning of applicable law. Accordingly any dissemination, distribution, copying, or other use of this message, or any of its contents, by any person other than the intended recipient may constitute a breach of civil or criminal law and is strictly prohibited. If you are NOT the intended recipient please contact the sender and dispose of this e-mail as soon as possible.
