Hosed PostGreSQL Installation

From: "Pete St. Onge" <pete(at)seul(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Hosed PostGreSQL Installation
Date: 2002-09-21 05:54:55
Message-ID: 20020921015454.U31893@moria.seul.org
Lists: pgsql-hackers

As a result of some disk errors on another drive, an admin in our group
brought down the server hosting our pgsql databases with a kill -KILL
after having gone to runlevel 1 and finding the postmaster process still
running. No surprise, our installation was hosed in the process.

After talking on #postgresql with klamath for about an hour or so to
work through the issue (many thanks!), it was suggested that I send
the info to this list.

Currently, PostgreSQL will no longer start and gives this error:

bash-2.05$ /usr/bin/pg_ctl -D $PGDATA -p /usr/bin/postmaster start
postmaster successfully started
bash-2.05$ DEBUG: database system shutdown was interrupted at 2002-09-19 22:59:54 EDT
DEBUG: open(logfile 0 seg 0) failed: No such file or directory
DEBUG: Invalid primary checkPoint record
DEBUG: open(logfile 0 seg 0) failed: No such file or directory
DEBUG: Invalid secondary checkPoint record
FATAL 2: Unable to locate a valid CheckPoint record
/usr/bin/postmaster: Startup proc 11735 exited with status 512 - abort
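
A side note on the last line of that log: the 512 looks like a raw wait() status rather than a plain exit code. A minimal sketch, assuming the conventional Linux encoding (exit code in the high byte, terminating signal in the low seven bits); the helper name is hypothetical:

```python
def decode_wait_status(status: int) -> tuple[int, int]:
    """Split a raw wait() status into (exit_code, signal).

    Assumes the conventional Linux layout: the low 7 bits hold the
    terminating signal (0 if the process exited normally) and the
    next byte holds the exit code.
    """
    signal = status & 0x7F
    exit_code = (status >> 8) & 0xFF
    return exit_code, signal


# Status reported by the postmaster for the startup process.
print(decode_wait_status(512))  # (2, 0)
```

Under that assumption the startup process exited on its own with code 2 and was not killed by a signal, which would be consistent with the "FATAL 2" line, though that link is itself an assumption.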

Our setup is vanilla Red Hat 7.2, with pretty much all of the
postgresql-*-7.1.3-2 packages installed. Klamath asked if I had disabled
fsync in postgresql.conf. I had not: the only non-default (read:
non-commented) setting in the file is: `tcpip_socket = true`

Klamath suggested that I run pg_controldata:

bash-2.05$ ./pg_controldata
pg_control version number:             71
Catalog version number:                200101061
Database state:                        SHUTDOWNING
pg_control last modified:              Thu Sep 19 22:59:54 2002
Current log file id:                   0
Next log file segment:                 1
Latest checkpoint location:            0/1739A0
Prior checkpoint location:             0/1718F0
Latest checkpoint's REDO location:     0/1739A0
Latest checkpoint's UNDO location:     0/0
Latest checkpoint's StartUpID:         21
Latest checkpoint's NextXID:           615
Latest checkpoint's NextOID:           18720
Time of latest checkpoint:             Thu Sep 19 22:49:42 2002
Database block size:                   8192
Blocks per segment of large relation:  131072
LC_COLLATE:                            en_US
LC_CTYPE:                              en_US

If I look into the pg_xlog directory, I see this:

bash-2.05$ cd pg_xlog/
bash-2.05$ ls -l
total 32808
-rw------- 1 postgres postgres 16777216 Sep 20 23:13 0000000000000002
-rw------- 1 postgres postgres 16777216 Sep 19 22:09 000000020000007E
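
For reference, the segment file the startup process is looking for can be worked out from the checkpoint locations reported by pg_controldata. A minimal sketch, assuming 16 MB segments (which matches the file sizes in the listing) and file names built from the log id and segment number as two 8-digit hex fields (which matches the "logfile 0 seg 0" message and the names in the listing); the function name is hypothetical:

```python
# Assumption: each WAL segment is 16 MB, as the 16777216-byte files suggest.
SEGMENT_SIZE = 16 * 1024 * 1024


def segment_for(location: str) -> str:
    """Map a 'logid/offset' location (both hex) to a segment file name.

    Assumes the name is the log id followed by the segment number,
    each written as 8 upper-case hex digits.
    """
    log_id_hex, offset_hex = location.split("/")
    log_id = int(log_id_hex, 16)
    segment = int(offset_hex, 16) // SEGMENT_SIZE
    return f"{log_id:08X}{segment:08X}"


# Files actually present in pg_xlog, copied from the listing.
present = {"0000000000000002", "000000020000007E"}

for label, location in [("latest", "0/1739A0"), ("prior", "0/1718F0")]:
    name = segment_for(location)
    status = "present" if name in present else "MISSING"
    print(f"{label} checkpoint {location} -> {name} ({status})")
```

Under those assumptions both checkpoints fall in segment 0000000000000000, which is not among the two files in pg_xlog.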

There is one caveat. The installation resides on a partition of its own:
/dev/hda3 17259308 6531140 9851424 40% /var/lib/pgsql/data

fdisk did not report errors for this partition at boot time after the
forced shutdown, however.

This installation serves a university research project, and although
most of the code / schemas are in development (and should be in cvs by
rights), I can't confirm that all projects have indeed done that. So any
advice, ideas or suggestions on how the data and / or schemas can be
recovered would be greatly appreciated.

Many thanks!

-- pete

P.S.: I've been using pgsql for about four years now, and it played a
big role during my grad work. In fact, the availability of pgsql was one
of the reasons why I was able to complete and graduate. Many thanks for
such a great database!

--
Pete St. Onge
Research Associate, Computational Biologist, UNIX Admin
Banting and Best Institute of Medical Research
Program in Bioinformatics and Proteomics
University of Toronto
http://www.utoronto.ca/emililab/ pete(at)seul(dot)org
