Startup process thrashing

From: Phillip Berry <pberry(at)stellaconcepts(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Startup process thrashing
Date: 2008-12-11 02:32:33
Message-ID: 200812111332.33424.pberry@stellaconcepts.com
Lists: pgsql-general

Hello Everyone,

I've got a bit of a problem. It started last night when postgres (8.1.9) went down, citing the need
for a vacuum full to be done to avoid transaction ID wraparound.

So I stopped the server, logged in using a standalone backend and started a vacuum full analyze on
all dbs (one being 156GB).

A few hours into the vacuum of the largest database, it filled up the 18GB pg_xlog partition with
over 1000 WAL files, and the vacuum failed when the partition ran out of space.
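
As a rough sanity check on those numbers, here is a small Python sketch. It uses the 16777216-byte
WAL segment size reported by pg_controldata and assumes "18GB" means 18 GiB; the function names are
only illustrative and not part of any PostgreSQL tool.

```python
# Rough capacity check for a pg_xlog partition (a sketch, not a PostgreSQL tool).
WAL_SEGMENT_BYTES = 16777216        # "Bytes per WAL segment" from pg_controldata
PARTITION_BYTES = 18 * 1024**3      # assumption: "18GB" means 18 GiB


def segments_that_fit(partition_bytes: int, segment_bytes: int) -> int:
    """Return how many whole WAL segments fit in the partition."""
    return partition_bytes // segment_bytes


def gib_used(segment_count: int, segment_bytes: int) -> float:
    """Return the space taken by segment_count WAL segments, in GiB."""
    return segment_count * segment_bytes / 1024**3


print(segments_that_fit(PARTITION_BYTES, WAL_SEGMENT_BYTES))  # 1152
print(gib_used(1000, WAL_SEGMENT_BYTES))                      # 15.625
```

Under those assumptions the partition holds at most 1152 segments, so "over 1000" files is
consistent with it being essentially full.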

When I came in this morning I attempted to start postgres using the normal init script, and now it's
stuck. The startup process is thrashing the disks and working hard, pg_controldata says it's in
recovery, but it's been going for over two hours.

My question is: where should I go from here? Should I kill the startup script, clear out the excess
WAL files, start the standalone server and try the vacuum again? Or should I just wait and see if the
startup process sorts itself out?

The startup process is responding to login attempts with "FATAL: the database system is starting up"
and logging these attempts, so I assume it's still alive and working...

I'd appreciate any help and advice. I really hope it's not going to turn into lost data (gulp).

Output from pg_controldata:

pg_control version number: 812
Catalog version number: 200510211
Database system identifier: 5142157718116482999
Database cluster state: in recovery
pg_control last modified: Wed 10 Dec 2008 05:55:52 PM CST
Current log file ID: 811
Next log file segment: 221
Latest checkpoint location: 327/8AE2BED0
Prior checkpoint location: 327/8AE2BE80
Latest checkpoint's REDO location: 327/8AE2BED0
Latest checkpoint's UNDO location: 0/0
Latest checkpoint's TimeLineID: 1
Latest checkpoint's NextXID: 2146484231
Latest checkpoint's NextOID: 123620
Latest checkpoint's NextMultiXactId: 806872
Latest checkpoint's NextMultiOffset: 1766404
Time of latest checkpoint: Wed 10 Dec 2008 06:01:01 AM CST
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Date/time type storage: floating-point numbers
Maximum length of locale name: 128
LC_COLLATE: en_US.UTF-8
LC_CTYPE: en_US.UTF-8
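
For a rough idea of how much WAL sits between the latest checkpoint's REDO location and the
current end of the log, the figures above can be combined as in the Python sketch below. It rests
on assumptions that should be checked against the 8.1 sources: that the location is printed as
hexadecimal "log ID/byte offset", that "Current log file ID" and "Next log file segment" are
decimal, and that each log file ID holds 255 segments of 16777216 bytes. The helper names are
hypothetical, and pg_control may lag behind the real end of WAL, so treat the result as an
estimate only.

```python
# Estimate the WAL distance between the REDO location and the current log position.
SEGMENT_BYTES = 16777216                           # "Bytes per WAL segment"
# Assumption: releases of this era use 255 segments per log file ID.
SEGMENTS_PER_LOG_ID = 0xFFFFFFFF // SEGMENT_BYTES  # 255


def parse_location(location: str) -> tuple[int, int]:
    """Convert 'logid/offset' (both hex) into (log ID, segment number)."""
    log_id_hex, offset_hex = location.split("/")
    return int(log_id_hex, 16), int(offset_hex, 16) // SEGMENT_BYTES


def segments_between(start: tuple[int, int], end: tuple[int, int]) -> int:
    """Count WAL segments from start up to (but not including) end."""
    return (end[0] - start[0]) * SEGMENTS_PER_LOG_ID + (end[1] - start[1])


redo = parse_location("327/8AE2BED0")   # Latest checkpoint's REDO location
current = (811, 221)                    # Current log file ID, next log file segment
segments = segments_between(redo, current)

print(redo)                                  # (807, 138)
print(segments)                              # 1103
print(segments * SEGMENT_BYTES / 1024**3)    # 17.234375
```

If those assumptions hold, that is roughly 1100 segments (about 17 GiB) of WAL, which lines up
with the 1000+ files filling the 18GB pg_xlog partition.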
