Re: corruption diag/recovery, pg_dump crash

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: "Ed L(dot)" <pgsql(at)bluepolka(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: corruption diag/recovery, pg_dump crash
Date: 2003-12-06 23:05:26
Message-ID: 20031206230526.GA14441@svana.org
Lists: pgsql-general

While I can't help you with most of your message, the pg_clog problem is the
easier one. Basically, creating a file with that name (04E5) containing 256KB
of zeros will let postgres complete the dump.
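
A minimal sketch of creating such a file, in Python. The function name and its arguments are hypothetical, not part of any PostgreSQL tooling; the 262144-byte size matches the full segments in the ls -l listing quoted below. It refuses to overwrite an existing segment. It is presumably safest to run it only with the postmaster stopped and after copying the data directory somewhere safe.

```python
import os

# 256KB, the same size as the complete pg_clog segments in the listing.
CLOG_SEGMENT_BYTES = 256 * 1024


def create_zero_clog_segment(clog_dir, segment_name, size=CLOG_SEGMENT_BYTES):
    """Create a zero-filled file <clog_dir>/<segment_name>.

    Hypothetical helper: clog_dir and segment_name are supplied by the
    caller. Raises FileExistsError rather than overwrite a real segment.
    """
    path = os.path.join(clog_dir, segment_name)
    # O_EXCL makes the open fail if the file already exists;
    # mode 0o600 matches the -rw------- permissions in the listing.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(b"\x00" * size)
    return path
```

It would be called with the cluster's own pg_clog directory and the missing segment name, e.g. create_zero_clog_segment(clog_dir, "04E5"), run as the user that owns the data directory.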

*HOWEVER*, what this means is that one of the tuple headers in the database
refers to a nonexistent transaction. So there is definitely some kind of
corruption going on there.
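
To see how far out of range that transaction is, here is a rough Python sketch of the segment arithmetic. It assumes pg_clog stores 2 status bits per transaction (so 4 transactions per byte) and that segment names are the segment number in hex; the constant and function names are made up for illustration.

```python
# Assumption: 2 status bits per transaction in pg_clog.
CLOG_BITS_PER_XACT = 2
XACTS_PER_BYTE = 8 // CLOG_BITS_PER_XACT
# Full segment size as seen in the ls -l listing.
SEGMENT_BYTES = 262144
XACTS_PER_SEGMENT = SEGMENT_BYTES * XACTS_PER_BYTE


def xid_range_for_segment(name):
    """Return (first_xid, last_xid) covered by a segment file name."""
    seg = int(name, 16)
    first = seg * XACTS_PER_SEGMENT
    return first, first + XACTS_PER_SEGMENT - 1


def segment_for_xid(xid):
    """Return the segment file name that would hold this xid's status."""
    return "%04X" % (xid // XACTS_PER_SEGMENT)


if __name__ == "__main__":
    # 04E5 is the file pg_dump asked for; 00F7 is the newest one listed.
    for name in ("00F7", "04E5"):
        print(name, xid_range_for_segment(name))
```

Under those assumptions the newest segment in the listing (00F7) is number 247, while 04E5 is number 1253, so the tuple header points at a transaction ID roughly a billion beyond anything the cluster has issued.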

Hope this helps,

On Sat, Dec 06, 2003 at 02:30:37PM -0700, Ed L. wrote:
> We are seeing what looks like pgsql data file corruption across multiple
> clusters on a RAID5 partition on a single redhat linux 2.4 server running
> 7.3.4. System has ~20 clusters installed with a mix of 7.2.3, 7.3.2, and
> 7.3.4 (mostly 7.3.4), 10gb ram, 76gb on a RAID5, dual cpus, and very busy
> with hundreds and sometimes > 1000 simultaneous connections. After ~250
> days of continuous, flawless uptime operations, we recently began seeing
> major performance degradation accompanied by messages like the following:
>
> ERROR: Invalid page header in block NN of some_relation (10-15 instances)
>
> ERROR: XLogFlush: request 38/5E659BA0 is not satisfied ... (1 instance
> repeated many times)
>
> I think I've been able to repair most of the "Invalid page header" errors by
> rebuilding indices or truncating/reloading tabledata. The XLogFlush error
> was occurring for a particular index, and a drop/reload has at least stopped
> that error. Now, a pg_dump error is occurring on one cluster preventing a
> successful dump. Of course, it's gone unnoticed long enough to roll over
> our good online backups and the bazillion-dollar offline/offsite backup
> system wasn't working properly. Here's the pg_dump output, edited to
> protect the guilty:
>
> pg_dump: PANIC: open of .../data/pg_clog/04E5 failed: No such file or
> directory
> pg_dump: lost synchronization with server, resetting connection
> pg_dump: WARNING: Message from PostgreSQL backend:
> The Postmaster has informed me that some other backend
> died abnormally and possibly corrupted ... blah blah
> pg_dump: SQL command to dump the contents of table "sometable" failed:
> PQendcopy() failed.
> pg_dump: Error message from server: server closed the connection
> unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> pg_dump: The command was: COPY public.sometable ("key", ...) TO stdout;
> pg_dumpall: pg_dump failed on somedb, exiting
>
> Why that 04E5 file is missing, I haven't a clue. I've attached an "ls -l"
> for the pg_clog dir.
>
> Past list discussions suggest this may be an elusive hardware issue. We did
> find a msg in /var/log/messages...
>
> kernel: ISR called reentrantly!!
>
> which some here have found newsgroup reports connecting to some sort of
> raid/bios issue. We've taken the machine offline and conducted extensive
> hardware diagnostics on RAID controller, filesystem (fsck), RAM, and found
> no further indication of hardware failure. The machine had run flawlessly
> for these ~20 clusters for ~250 days until cratering yesterday amidst these
> errors and absurd system (disk) IO sluggishness. Upon reboot and upgrades,
> the machine continues to exhibit infrequent corruption (or infrequently
> discovered). Based on hardware vendor (Dell) support folks, we've upgraded
> our kernel (now 2.4.20-24.7bigmem), several drivers, raid controller
> firmware, rebooted, etc. The disk IO sluggishness has largely diminished,
> but we're still seeing the Invalid page header pop-up anew, albeit
> infrequently. The XLogFlush error seems to have gone away with the
> reconstruction of an index.
>
> Current plan is to get as much data recovered as possible, and then do
> significant hardware replacements (along with more frequent planned reboots
> and more vigilant backups).
>
> Any clues/suggestions for recovering this data or fixing other issues would
> be greatly appreciated.
>
> TIA.

> total 64336
> -rw------- 1 pgdba pg 262144 Aug 12 18:39 0000
> -rw------- 1 pgdba pg 262144 Aug 14 11:56 0001
> -rw------- 1 pgdba pg 262144 Aug 14 20:22 0002
> -rw------- 1 pgdba pg 262144 Aug 15 16:01 0003
> -rw------- 1 pgdba pg 262144 Aug 15 23:08 0004
> -rw------- 1 pgdba pg 262144 Aug 16 05:33 0005
> -rw------- 1 pgdba pg 262144 Aug 16 11:42 0006
> -rw------- 1 pgdba pg 262144 Aug 16 18:25 0007
> -rw------- 1 pgdba pg 262144 Aug 16 23:57 0008
> -rw------- 1 pgdba pg 262144 Aug 17 08:16 0009
> -rw------- 1 pgdba pg 262144 Aug 17 14:31 000A
> -rw------- 1 pgdba pg 262144 Aug 17 20:24 000B
> -rw------- 1 pgdba pg 262144 Aug 17 23:57 000C
> -rw------- 1 pgdba pg 262144 Aug 18 03:33 000D
> -rw------- 1 pgdba pg 262144 Aug 18 13:01 000E
> -rw------- 1 pgdba pg 262144 Aug 19 13:03 000F
> -rw------- 1 pgdba pg 262144 Aug 19 18:54 0010
> -rw------- 1 pgdba pg 262144 Aug 19 23:19 0011
> -rw------- 1 pgdba pg 262144 Aug 20 04:29 0012
> -rw------- 1 pgdba pg 262144 Aug 20 12:50 0013
> -rw------- 1 pgdba pg 262144 Aug 20 15:00 0014
> -rw------- 1 pgdba pg 262144 Aug 20 23:29 0015
> -rw------- 1 pgdba pg 262144 Aug 21 11:50 0016
> -rw------- 1 pgdba pg 262144 Aug 21 16:36 0017
> -rw------- 1 pgdba pg 262144 Aug 21 21:36 0018
> -rw------- 1 pgdba pg 262144 Aug 22 03:24 0019
> -rw------- 1 pgdba pg 262144 Aug 22 09:16 001A
> -rw------- 1 pgdba pg 262144 Aug 22 15:59 001B
> -rw------- 1 pgdba pg 262144 Aug 23 06:39 001C
> -rw------- 1 pgdba pg 262144 Aug 24 01:10 001D
> -rw------- 1 pgdba pg 262144 Aug 24 15:53 001E
> -rw------- 1 pgdba pg 262144 Aug 25 09:54 001F
> -rw------- 1 pgdba pg 262144 Aug 25 14:37 0020
> -rw------- 1 pgdba pg 262144 Aug 26 01:29 0021
> -rw------- 1 pgdba pg 262144 Aug 26 13:13 0022
> -rw------- 1 pgdba pg 262144 Aug 26 18:26 0023
> -rw------- 1 pgdba pg 262144 Aug 27 10:14 0024
> -rw------- 1 pgdba pg 262144 Aug 27 17:10 0025
> -rw------- 1 pgdba pg 262144 Aug 28 08:31 0026
> -rw------- 1 pgdba pg 262144 Aug 28 15:21 0027
> -rw------- 1 pgdba pg 262144 Aug 29 06:11 0028
> -rw------- 1 pgdba pg 262144 Aug 29 13:56 0029
> -rw------- 1 pgdba pg 262144 Aug 30 03:51 002A
> -rw------- 1 pgdba pg 262144 Aug 30 17:15 002B
> -rw------- 1 pgdba pg 262144 Aug 31 11:31 002C
> -rw------- 1 pgdba pg 262144 Sep 1 04:59 002D
> -rw------- 1 pgdba pg 262144 Sep 1 17:01 002E
> -rw------- 1 pgdba pg 262144 Sep 2 09:52 002F
> -rw------- 1 pgdba pg 262144 Sep 2 16:24 0030
> -rw------- 1 pgdba pg 262144 Sep 3 07:07 0031
> -rw------- 1 pgdba pg 262144 Sep 3 13:27 0032
> -rw------- 1 pgdba pg 262144 Sep 4 04:25 0033
> -rw------- 1 pgdba pg 262144 Sep 4 13:11 0034
> -rw------- 1 pgdba pg 262144 Sep 5 02:11 0035
> -rw------- 1 pgdba pg 262144 Sep 5 12:31 0036
> -rw------- 1 pgdba pg 262144 Sep 6 01:18 0037
> -rw------- 1 pgdba pg 262144 Sep 6 17:12 0038
> -rw------- 1 pgdba pg 262144 Sep 7 12:01 0039
> -rw------- 1 pgdba pg 262144 Sep 8 08:00 003A
> -rw------- 1 pgdba pg 262144 Sep 8 14:32 003B
> -rw------- 1 pgdba pg 262144 Sep 9 06:14 003C
> -rw------- 1 pgdba pg 262144 Sep 9 13:12 003D
> -rw------- 1 pgdba pg 262144 Sep 9 20:56 003E
> -rw------- 1 pgdba pg 262144 Sep 10 09:26 003F
> -rw------- 1 pgdba pg 262144 Sep 10 14:27 0040
> -rw------- 1 pgdba pg 262144 Sep 10 20:29 0041
> -rw------- 1 pgdba pg 262144 Sep 11 03:29 0042
> -rw------- 1 pgdba pg 262144 Sep 11 12:00 0043
> -rw------- 1 pgdba pg 262144 Sep 11 20:27 0044
> -rw------- 1 pgdba pg 262144 Sep 12 09:01 0045
> -rw------- 1 pgdba pg 262144 Sep 12 15:37 0046
> -rw------- 1 pgdba pg 262144 Sep 13 07:29 0047
> -rw------- 1 pgdba pg 262144 Sep 13 18:59 0048
> -rw------- 1 pgdba pg 262144 Sep 14 12:05 0049
> -rw------- 1 pgdba pg 262144 Sep 15 07:17 004A
> -rw------- 1 pgdba pg 262144 Sep 15 13:53 004B
> -rw------- 1 pgdba pg 262144 Sep 16 01:09 004C
> -rw------- 1 pgdba pg 262144 Sep 16 11:18 004D
> -rw------- 1 pgdba pg 262144 Sep 16 18:46 004E
> -rw------- 1 pgdba pg 262144 Sep 17 09:17 004F
> -rw------- 1 pgdba pg 262144 Sep 17 16:45 0050
> -rw------- 1 pgdba pg 262144 Sep 18 07:39 0051
> -rw------- 1 pgdba pg 262144 Sep 18 14:20 0052
> -rw------- 1 pgdba pg 262144 Sep 19 01:38 0053
> -rw------- 1 pgdba pg 262144 Sep 19 12:05 0054
> -rw------- 1 pgdba pg 262144 Sep 19 22:39 0055
> -rw------- 1 pgdba pg 262144 Sep 20 13:55 0056
> -rw------- 1 pgdba pg 262144 Sep 21 09:02 0057
> -rw------- 1 pgdba pg 262144 Sep 22 02:47 0058
> -rw------- 1 pgdba pg 262144 Sep 22 12:42 0059
> -rw------- 1 pgdba pg 262144 Sep 22 21:57 005A
> -rw------- 1 pgdba pg 262144 Sep 23 10:28 005B
> -rw------- 1 pgdba pg 262144 Sep 23 18:00 005C
> -rw------- 1 pgdba pg 262144 Sep 24 08:52 005D
> -rw------- 1 pgdba pg 262144 Sep 24 15:14 005E
> -rw------- 1 pgdba pg 262144 Sep 25 04:16 005F
> -rw------- 1 pgdba pg 262144 Sep 25 12:17 0060
> -rw------- 1 pgdba pg 262144 Sep 25 20:17 0061
> -rw------- 1 pgdba pg 262144 Sep 26 10:07 0062
> -rw------- 1 pgdba pg 262144 Sep 26 16:24 0063
> -rw------- 1 pgdba pg 262144 Sep 27 09:20 0064
> -rw------- 1 pgdba pg 262144 Sep 28 00:27 0065
> -rw------- 1 pgdba pg 262144 Sep 28 16:17 0066
> -rw------- 1 pgdba pg 262144 Sep 29 09:45 0067
> -rw------- 1 pgdba pg 262144 Sep 29 16:37 0068
> -rw------- 1 pgdba pg 262144 Sep 30 07:44 0069
> -rw------- 1 pgdba pg 262144 Sep 30 15:03 006A
> -rw------- 1 pgdba pg 262144 Oct 1 05:59 006B
> -rw------- 1 pgdba pg 262144 Oct 1 12:52 006C
> -rw------- 1 pgdba pg 262144 Oct 1 22:19 006D
> -rw------- 1 pgdba pg 262144 Oct 2 10:53 006E
> -rw------- 1 pgdba pg 262144 Oct 2 19:28 006F
> -rw------- 1 pgdba pg 262144 Oct 3 10:18 0070
> -rw------- 1 pgdba pg 262144 Oct 3 19:11 0071
> -rw------- 1 pgdba pg 262144 Oct 4 12:42 0072
> -rw------- 1 pgdba pg 262144 Oct 5 08:24 0073
> -rw------- 1 pgdba pg 262144 Oct 6 00:03 0074
> -rw------- 1 pgdba pg 262144 Oct 6 11:57 0075
> -rw------- 1 pgdba pg 262144 Oct 6 19:46 0076
> -rw------- 1 pgdba pg 262144 Oct 7 09:43 0077
> -rw------- 1 pgdba pg 262144 Oct 7 17:09 0078
> -rw------- 1 pgdba pg 262144 Oct 8 07:33 0079
> -rw------- 1 pgdba pg 262144 Oct 8 13:34 007A
> -rw------- 1 pgdba pg 262144 Oct 8 18:41 007B
> -rw------- 1 pgdba pg 262144 Oct 8 23:28 007C
> -rw------- 1 pgdba pg 262144 Oct 9 09:51 007D
> -rw------- 1 pgdba pg 262144 Oct 9 14:22 007E
> -rw------- 1 pgdba pg 262144 Oct 9 17:04 007F
> -rw------- 1 pgdba pg 262144 Oct 10 06:56 0080
> -rw------- 1 pgdba pg 262144 Oct 10 12:31 0081
> -rw------- 1 pgdba pg 262144 Oct 10 18:19 0082
> -rw------- 1 pgdba pg 262144 Oct 11 10:22 0083
> -rw------- 1 pgdba pg 262144 Oct 12 02:29 0084
> -rw------- 1 pgdba pg 262144 Oct 12 17:43 0085
> -rw------- 1 pgdba pg 262144 Oct 13 09:49 0086
> -rw------- 1 pgdba pg 262144 Oct 13 17:00 0087
> -rw------- 1 pgdba pg 262144 Oct 14 07:48 0088
> -rw------- 1 pgdba pg 262144 Oct 14 12:49 0089
> -rw------- 1 pgdba pg 262144 Oct 14 16:48 008A
> -rw------- 1 pgdba pg 262144 Oct 15 07:33 008B
> -rw------- 1 pgdba pg 262144 Oct 15 14:30 008C
> -rw------- 1 pgdba pg 262144 Oct 16 01:41 008D
> -rw------- 1 pgdba pg 262144 Oct 16 12:30 008E
> -rw------- 1 pgdba pg 262144 Oct 16 20:30 008F
> -rw------- 1 pgdba pg 262144 Oct 17 10:32 0090
> -rw------- 1 pgdba pg 262144 Oct 17 17:38 0091
> -rw------- 1 pgdba pg 262144 Oct 18 10:25 0092
> -rw------- 1 pgdba pg 262144 Oct 19 01:53 0093
> -rw------- 1 pgdba pg 262144 Oct 19 16:38 0094
> -rw------- 1 pgdba pg 262144 Oct 20 09:23 0095
> -rw------- 1 pgdba pg 262144 Oct 20 16:40 0096
> -rw------- 1 pgdba pg 262144 Oct 21 07:08 0097
> -rw------- 1 pgdba pg 262144 Oct 21 13:31 0098
> -rw------- 1 pgdba pg 262144 Oct 21 21:56 0099
> -rw------- 1 pgdba pg 262144 Oct 22 10:02 009A
> -rw------- 1 pgdba pg 262144 Oct 22 16:31 009B
> -rw------- 1 pgdba pg 262144 Oct 22 22:59 009C
> -rw------- 1 pgdba pg 262144 Oct 23 10:46 009D
> -rw------- 1 pgdba pg 262144 Oct 23 17:20 009E
> -rw------- 1 pgdba pg 262144 Oct 24 08:25 009F
> -rw------- 1 pgdba pg 262144 Oct 24 14:48 00A0
> -rw------- 1 pgdba pg 262144 Oct 25 05:45 00A1
> -rw------- 1 pgdba pg 262144 Oct 25 20:22 00A2
> -rw------- 1 pgdba pg 262144 Oct 26 13:16 00A3
> -rw------- 1 pgdba pg 262144 Oct 27 07:34 00A4
> -rw------- 1 pgdba pg 262144 Oct 27 13:54 00A5
> -rw------- 1 pgdba pg 262144 Oct 28 03:14 00A6
> -rw------- 1 pgdba pg 262144 Oct 28 11:58 00A7
> -rw------- 1 pgdba pg 262144 Oct 28 19:36 00A8
> -rw------- 1 pgdba pg 262144 Oct 29 09:39 00A9
> -rw------- 1 pgdba pg 262144 Oct 29 16:27 00AA
> -rw------- 1 pgdba pg 262144 Oct 30 07:23 00AB
> -rw------- 1 pgdba pg 262144 Oct 30 13:43 00AC
> -rw------- 1 pgdba pg 262144 Oct 31 02:31 00AD
> -rw------- 1 pgdba pg 262144 Oct 31 11:59 00AE
> -rw------- 1 pgdba pg 262144 Oct 31 19:54 00AF
> -rw------- 1 pgdba pg 262144 Nov 1 13:44 00B0
> -rw------- 1 pgdba pg 262144 Nov 2 08:26 00B1
> -rw------- 1 pgdba pg 262144 Nov 2 20:59 00B2
> -rw------- 1 pgdba pg 262144 Nov 3 10:33 00B3
> -rw------- 1 pgdba pg 262144 Nov 3 17:21 00B4
> -rw------- 1 pgdba pg 262144 Nov 4 09:01 00B5
> -rw------- 1 pgdba pg 262144 Nov 4 14:44 00B6
> -rw------- 1 pgdba pg 262144 Nov 5 06:33 00B7
> -rw------- 1 pgdba pg 262144 Nov 5 13:17 00B8
> -rw------- 1 pgdba pg 262144 Nov 5 20:45 00B9
> -rw------- 1 pgdba pg 262144 Nov 6 09:45 00BA
> -rw------- 1 pgdba pg 262144 Nov 6 17:04 00BB
> -rw------- 1 pgdba pg 262144 Nov 7 06:55 00BC
> -rw------- 1 pgdba pg 262144 Nov 7 13:31 00BD
> -rw------- 1 pgdba pg 262144 Nov 8 03:58 00BE
> -rw------- 1 pgdba pg 262144 Nov 8 17:04 00BF
> -rw------- 1 pgdba pg 262144 Nov 9 11:14 00C0
> -rw------- 1 pgdba pg 262144 Nov 10 06:16 00C1
> -rw------- 1 pgdba pg 262144 Nov 10 12:47 00C2
> -rw------- 1 pgdba pg 262144 Nov 10 21:18 00C3
> -rw------- 1 pgdba pg 262144 Nov 11 10:34 00C4
> -rw------- 1 pgdba pg 262144 Nov 11 17:23 00C5
> -rw------- 1 pgdba pg 262144 Nov 12 09:15 00C6
> -rw------- 1 pgdba pg 262144 Nov 12 15:03 00C7
> -rw------- 1 pgdba pg 262144 Nov 13 06:30 00C8
> -rw------- 1 pgdba pg 262144 Nov 13 13:56 00C9
> -rw------- 1 pgdba pg 262144 Nov 14 00:38 00CA
> -rw------- 1 pgdba pg 262144 Nov 14 13:06 00CB
> -rw------- 1 pgdba pg 262144 Nov 14 21:27 00CC
> -rw------- 1 pgdba pg 262144 Nov 15 13:25 00CD
> -rw------- 1 pgdba pg 262144 Nov 16 08:57 00CE
> -rw------- 1 pgdba pg 262144 Nov 16 23:22 00CF
> -rw------- 1 pgdba pg 262144 Nov 17 11:49 00D0
> -rw------- 1 pgdba pg 262144 Nov 17 20:12 00D1
> -rw------- 1 pgdba pg 262144 Nov 18 09:10 00D2
> -rw------- 1 pgdba pg 262144 Nov 18 16:02 00D3
> -rw------- 1 pgdba pg 262144 Nov 19 05:23 00D4
> -rw------- 1 pgdba pg 262144 Nov 19 12:27 00D5
> -rw------- 1 pgdba pg 262144 Nov 19 19:22 00D6
> -rw------- 1 pgdba pg 262144 Nov 20 10:36 00D7
> -rw------- 1 pgdba pg 262144 Nov 20 16:40 00D8
> -rw------- 1 pgdba pg 262144 Nov 21 08:19 00D9
> -rw------- 1 pgdba pg 262144 Nov 21 14:53 00DA
> -rw------- 1 pgdba pg 262144 Nov 22 05:41 00DB
> -rw------- 1 pgdba pg 262144 Nov 22 19:28 00DC
> -rw------- 1 pgdba pg 262144 Nov 23 12:30 00DD
> -rw------- 1 pgdba pg 262144 Nov 24 07:24 00DE
> -rw------- 1 pgdba pg 262144 Nov 24 14:18 00DF
> -rw------- 1 pgdba pg 262144 Nov 25 02:03 00E0
> -rw------- 1 pgdba pg 262144 Nov 25 11:47 00E1
> -rw------- 1 pgdba pg 262144 Nov 25 18:46 00E2
> -rw------- 1 pgdba pg 262144 Nov 26 09:57 00E3
> -rw------- 1 pgdba pg 262144 Nov 26 17:09 00E4
> -rw------- 1 pgdba pg 262144 Nov 27 11:48 00E5
> -rw------- 1 pgdba pg 262144 Nov 28 07:43 00E6
> -rw------- 1 pgdba pg 262144 Nov 28 16:12 00E7
> -rw------- 1 pgdba pg 262144 Nov 29 09:02 00E8
> -rw------- 1 pgdba pg 262144 Nov 30 01:06 00E9
> -rw------- 1 pgdba pg 262144 Nov 30 16:51 00EA
> -rw------- 1 pgdba pg 262144 Dec 1 09:23 00EB
> -rw------- 1 pgdba pg 262144 Dec 1 17:05 00EC
> -rw------- 1 pgdba pg 262144 Dec 2 07:24 00ED
> -rw------- 1 pgdba pg 262144 Dec 2 14:19 00EE
> -rw------- 1 pgdba pg 262144 Dec 3 03:52 00EF
> -rw------- 1 pgdba pg 262144 Dec 3 12:51 00F0
> -rw------- 1 pgdba pg 262144 Dec 3 22:34 00F1
> -rw------- 1 pgdba pg 262144 Dec 4 10:46 00F2
> -rw------- 1 pgdba pg 262144 Dec 4 17:20 00F3
> -rw------- 1 pgdba pg 262144 Dec 5 11:34 00F4
> -rw------- 1 pgdba pg 262144 Dec 6 00:23 00F5
> -rw------- 1 pgdba pg 262144 Dec 6 11:07 00F6
> -rw------- 1 pgdba pg 114688 Dec 6 16:10 00F7

--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> "All that is needed for the forces of evil to triumph is for enough good
> men to do nothing." - Edmond Burke
> "The penalty good people pay for not being interested in politics is to be
> governed by people worse than themselves." - Plato
