pg_clog woes with 7.3.2 - Episode 2

From: "Dave Page" <dpage(at)vale-housing(dot)co(dot)uk>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: pg_clog woes with 7.3.2 - Episode 2
Date: 2003-04-16 14:20:19
Message-ID: 03AF4E498C591348A42FC93DEA9661B83AF043@mail.vale-housing.co.uk
Lists: pgsql-hackers

Hi all,

A week or two back I reported a problem with 7.3.2 failing to open
pg_clog files ([HACKERS] pg_clog woes with 7.3.2).

Tom Lane was kind enough to log in and do some debugging on a couple of
occasions and found corrupt pages in a database on the system. After the
first session, he suggested running memtest86, which showed no errors
after multiple passes, and badblocks, which showed no errors after
multiple non-destructive read-write tests.

One initdb and reload later (it's a new system; the old one is still
running OK), the error comes back again, only this time Tom finds the
corruption is in a couple of pages. Memtest86 again shows no errors.
Badblocks eventually does report errors, but only when I use a
destructive read-write test.

So, the disk goes back to Seagate and is replaced with another
identical one, and a similar problem recurs (logs below) :-(. I
haven't run badblocks yet as it takes a fair while, but wanted to find
out if anyone thought this could be an OS issue or something else.
Previously I've been using the 2.4.19 Linux kernel; however, this machine
is running 2.4.20 (Slackware Linux 9). The SCSI adaptor is an Adaptec
29160, and the disks are 34GB Seagate Cheetah X15s.

Any thoughts or suggestions would be appreciated.

Regards, Dave.

LOG: connection received: host=[local]
LOG: connection authorized: user=postgres database=mnogo_int
LOG: query: begin; select getdatabaseencoding(); commit
LOG: query: vacuum;
PANIC: read of clog file 0, offset 253952 failed: Success
LOG: statement: vacuum;
LOG: server process (pid 2006) was terminated by signal 6
LOG: terminating any other active server processes
LOG: all server processes terminated; reinitializing shared memory and semaphores
LOG: database system was interrupted at 2003-04-16 15:06:34 BST
LOG: checkpoint record is at 0/2FB37D94
LOG: redo record is at 0/2FB37D94; undo record is at 0/0; shutdown TRUE
LOG: next transaction id: 3186; next oid: 9724496
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 0/2FB37DD4
LOG: ReadRecord: record with zero length at 0/2FC39CAC
LOG: redo done at 0/2FC39C88
LOG: database system is ready
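
For anyone reading the PANIC line above, here is a rough sketch of the
arithmetic that maps the reported clog offset to a range of transaction
IDs. It assumes the default 8192-byte block size, 2 status bits per
transaction (4 per byte), and 32 pages per clog segment file; these
constants and the helper name clog_xid_range are assumptions for
illustration, not values taken from this installation.

```python
# Assumed defaults (not confirmed for this particular build):
BLCKSZ = 8192            # bytes per clog page
XACTS_PER_BYTE = 4       # 2 status bits per transaction
PAGES_PER_SEGMENT = 32   # pages per clog segment file


def clog_xid_range(segment, offset):
    """Return (page, first_xid, last_xid) for a clog segment and byte offset."""
    page = segment * PAGES_PER_SEGMENT + offset // BLCKSZ
    xacts_per_page = BLCKSZ * XACTS_PER_BYTE
    first_xid = page * xacts_per_page
    last_xid = first_xid + xacts_per_page - 1
    return page, first_xid, last_xid


# Values from the PANIC line: clog file 0, offset 253952
page, first_xid, last_xid = clog_xid_range(0, 253952)
print(page, first_xid, last_xid)
```

Under those assumptions the failed read is for page 31, covering
transaction IDs 1015808 through 1048575. The log reports the next
transaction id as 3186, so a lookup that far ahead would be consistent
with a damaged transaction ID in a tuple header rather than a genuinely
missing clog page, though that is only a guess from the numbers. The
"failed: Success" text most likely means the read came back short without
setting errno.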
