Re: BUG #1208: Invalid page header

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Robert E Bruccoleri <bruc(at)stone(dot)congenomics(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #1208: Invalid page header
Date: 2004-08-16 15:56:30
Message-ID: 200408161556.i7GFuUd18101@candle.pha.pa.us
Lists: pgsql-bugs


If you are sure your storage and memory are good, I can think of only
two other ideas. One is a gcc bug. You are running Itanium so it is
possible. The only other possibility I can think of is that our
ia64 assembler code is wrong. It is:

    static __inline__ int
    tas(volatile slock_t *lock)
    {
        long int ret;

        __asm__ __volatile__(
            " xchg4 %0=%1,%2 \n"
            : "=r"(ret), "+m"(*lock)
            : "r"(1)
            : "memory");
        return (int) ret;
    }

It is possible we don't have this working properly on ia64 SMP machines.
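
For comparison, here is a minimal portable sketch of the contract a
test-and-set routine like the one above is expected to honor: atomically
store 1 into the lock word and return the previous value, so 0 means the
caller got the lock. It uses C11 <stdatomic.h> rather than ia64 assembly.
The names slock_sketch_t, tas_sketch, unlock_sketch and tas_sketch_demo are
made up for illustration and are not PostgreSQL code, and the
acquire/release memory orders are an assumption about what a spinlock
needs, not a statement about what xchg4 provides.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for slock_t; not the PostgreSQL definition. */
typedef atomic_uint slock_sketch_t;

/*
 * Atomically write 1 to the lock word and return the value it held before.
 * A return of 0 means the lock was free and is now ours; nonzero means
 * somebody else already holds it.
 */
static inline int
tas_sketch(slock_sketch_t *lock)
{
    return (int) atomic_exchange_explicit(lock, 1u, memory_order_acquire);
}

/* Release the lock by storing 0 with release ordering (an assumption). */
static inline void
unlock_sketch(slock_sketch_t *lock)
{
    atomic_store_explicit(lock, 0u, memory_order_release);
}

/* Single-threaded check of the contract; returns 0 when it holds. */
int
tas_sketch_demo(void)
{
    slock_sketch_t lock;

    atomic_init(&lock, 0u);
    assert(tas_sketch(&lock) == 0);   /* lock was free: acquired */
    assert(tas_sketch(&lock) == 1);   /* already held: not acquired */
    unlock_sketch(&lock);
    assert(tas_sketch(&lock) == 0);   /* free again after unlock */
    return 0;
}
```

Calling tas_sketch_demo() from a main() only exercises the single-threaded
contract. A failure of the kind suspected here would need several
processors contending for the same lock word to show up.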

Again, these are only guesses but this is all I can think of. We have
no other reports of such failures _except_ for hardware problems.

You can try 8.0 beta1 and see if that helps. I do see the assembly code
is slightly modified from the 7.4.X release. It might be significant,
but I doubt it.

---------------------------------------------------------------------------

PostgreSQL Bugs List wrote:
>
> The following bug has been logged online:
>
> Bug reference: 1208
> Logged by: Robert E Bruccoleri
>
> Email address: bruc(at)stone(dot)congenomics(dot)com
>
> PostgreSQL version: 7.4
>
> Operating system: Linux Advanced Server 2.1 and SGI ProPack 2.4
>
> Description: Invalid page header
>
> Details:
>
> ============================================================================
> POSTGRESQL BUG REPORT TEMPLATE
> ============================================================================
>
>
> Your name : Robert Bruccoleri
> Your email address : bruc(at)acm(dot)org
>
>
> System Configuration
> ---------------------
> Architecture (example: Intel Pentium) : Intel Itanium 2
>
> Operating System (example: Linux 2.4.18) : Linux 2.4.21 (SGI
> Propack 2.4 patch 10074)
>
> PostgreSQL version (example: PostgreSQL-7.4.3): PostgreSQL-7.4.3
>
> Compiler used (example: gcc 2.95.2) : Intel C compiler version
> 8.0
>
>
> Please enter a FULL description of your problem:
> ------------------------------------------------
>
> I am getting sporadic invalid page header errors when loading or
> vacuuming databases in parallel. We are in the process of migrating
> from an SGI Origin 3000 running PostgreSQL 7.4 to an SGI Altix running
> PostgreSQL 7.4.3. The Altix system has 64 processors with 256
> gigabytes of RAM. PostgreSQL was built using a 32K blocksize, and we
> start the system with a buffer cache of 130000 pages. Fdatasync is
> used for synchronization. We use an LSI Logic storage system to store
> the PostgreSQL databases as well as for much of our department's data,
> and we have about 5 terabytes used actively. The filesystem is XFS as
> delivered by SGI, which wrote it.
>
> I do not believe that we have any problems with unreliable disk
> storage. First, no other users have complained about problems and we
> have a lot more in use than what PostgreSQL is using. Second, the
> storage system is an enterprise class Fibre Channel dual controller
> RAID system designed for high redundancy and reliability. It has no
> single points of failure. We've been using it for over a year with no
> problems.
>
> We have about 14 active databases, and I loaded all 14 simultaneously. No
> errors were noted during the load, but upon vacuuming all the databases,
> one of the databases encountered the following message:
>
> INFO: vacuuming "public.relationships"
> vacuumdb: vacuuming of database "human_genome_042003" failed: ERROR:
> invalid page header in block 4763 of relation "relationships"
>
> There may be others with problems, but vacuumdb quit after this error.
>
> I downloaded pg_filedump and I ran it on the file containing this
> relation specifying a range covering a block around the erroneous
> block. The two blocks around the bad block have data as I would have
> expected for the "relationships" table, but the bad block has data from
> a table in another database.
>
> Here is part of the pg_filedump output:
>
> *******************************************************************
> * PostgreSQL File/Block Formatted Dump Utility - Version 3.0
> *
> * File: 367457
> * Options used: -f -R 4763 4763
> *
> * Dump created on: Wed Aug 4 19:47:46 2004
> *******************************************************************
>
> Block 4763 ********************************************************
> <Header> -----
> Block Offset: 0x094d8000 Offsets: Lower 0 (0x0000)
> Block: Size 0 Version 0 Upper 61440 (0xf000)
> LSN: logid 118874 recoff 0x0000000d Special 25476 (0x6384)
> Items: 0 Free Space: 61440
> Length (including item array): 24
>
> Error: Invalid header information.
>
> 0000: 5ad00100 0d000000 22000000 000000f0 Z.......".......
> 0010: 84630000 00000000 .c......
>
> <Data> ------
> Empty block - no items listed
>
> <Special Section> -----
> Error: Invalid special section encountered.
> 6384: 32343433 38320000 a9270000 ab270000 244382...'...'..
> 6394: 00000000 01000000 00000000 1edbab73 ...............s
> 63a4: 0e8f3ba6 22000000 40e3ffef 22000000 ..;."...@..."...
> 63b4: 68e2ffef 020a0000 b400000a fdb70500 h...............
> 63c4: bbc30500 08008f6e ae001200 02081800 .......n........
> 63d4: 0e000000 52313031 5f343438 38340000 ....R101_44884..
> 63e4: 15000000 15000000 4e545f30 31303839 ........NT_01089
> 63f4: 335f6735 352e7365 63000000 91000000 3_g55.sec.......
> 6404: 0f000000 70646231 63686b2e 412e2d00 ....pdb1chk.A.-.
> 6414: ee000000 00000000 48e17a14 ae470340 ........H.z..G.@
> 6424: 295c8fc2 f5280640 c3f5285c 8fc20b40 )\...(.@..(\...@
> 6434: 3d0ad7a3 703d1340 0d000000 7f000000 =...p=.@........
> 6444: 06819543 8b6c0640 d7a3703d 0a571040 ...C.l.@..p=.W.@
> 6454: 91b8c7d2 87e62640 00000000 0078ca40 ......&@.....x.@
> 6464: 00000000 00000000 00000000 002062c0 ............. b.
> 6474: 00000000 e5fd877a 720918a8 22000000 .......zr..."...
> 6484: a06300f0 22000000 a06300f0 020a0000 .c.."....c......
> 6494: b400800a fdb70500 bbc30500 0800906e ...............n
> 64a4: 01001200 02081800 0e000000 52313031 ............R101
> 64b4: 5f343438 38340000 15000000 15000000 _44884..........
> 64c4: 4e545f30 31303839 335f6735 352e7365 NT_010893_g55.se
> 64d4: 63000000 91000000 0f000000 70646231 c...........pdb1
> 64e4: 63686d2e 422e2d00 91010000 00000000 chm.B.-.........
> 64f4: ec51b81e 85eb0940 3d0ad7a3 703d0a40 .Q.....@=...p=.@
> 6504: 52b81e85 eb511140 b81e85eb 51b81a40 R....Q.@....Q..@
> 6514: 13000000 6c000000 e7fba9f1 d24d0d40 ....l........M.@
> 6524: 52b81e85 ebd11740 7940d994 2bd03540 R......@y@..+.5@
> 6534: 00000000 0043bd40 00000000 00000000 .....C.@........
> 6544: 00000000 00c068c0 00000000 f7d17b03 ......h.......{.
> 6554: 08edd30d 22000000 786400f0 22000000 ...."...xd.."...
> 6564: 786400f0 020a0000 b400000a fdb70500 xd..............
> 6574: bbc30500 0800906e 02001200 02081800 .......n........
> 6584: 0e000000 52313031 5f343438 38340000 ....R101_44884..
> 6594: 15000000 15000000 4e545f30 31303839 ........NT_01089
> 65a4: 335f6735 352e7365 63000000 91000000 3_g55.sec.......
> 65b4: 0f000000 70646231 6369342e 412e2d00 ....pdb1ci4.A.-.
> 65c4: 59000000 00000000 3d0ad7a3 703df63f Y.......=...p=.?
> 65d4: 1f85eb51 b81e0f40 c3f5285c 8fc20b40 ...Q...@..(\...@
> 65e4: 33333333 33331840 12000000 54000000 333333.@....T...
> 65f4: 06819543 8b6c0640 cdcccccc cc4c1540 ...C.l.@.....L.@
> 6604: c3d32b65 19da2d40 00000000 0033be40 ..+e..-@.....3.@
> 6614: 00000000 00000000 00000000 00406e40 .............@n@
> 6624: 00000000 c61e23a3 820d4664 22000000 ......#...Fd"...
> 6634: 506500f0 22000000 506500f0 020a0000 Pe.."...Pe......
> 6644: b400000a fdb70500 bbc30500 0800906e ...............n
> 6654: 03001200 02081800 0e000000 52313031 ............R101
> 6664: 5f343438 38340000 15000000 15000000 _44884..........
> 6674: 4e545f30 31303839 335f6735 352e7365 NT_010893_g55.se
> 6684: 63000000 91000000 0f000000 70646231 c...........pdb1
> 6694: 6369642e 2d2e2d00 b1000000 00000000 cid.-.-.........
>
> <truncated>
>
> In block 4763, there is data from another database named proceryon in
> the 14 that I loaded simultaneously. If this were a disk I/O error,
> then I would not have expected to see tuples from another
> database. I'd expect gibberish or nulls.
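
As a cross-check on the dump of block 4763, the fields pg_filedump prints
can be re-derived by hand from the first 20 bytes shown at offsets 0000 and
0010. The sketch below decodes them as little-endian integers and applies
simple ordering checks. The field offsets are inferred from how the printed
values line up with the hex bytes, the 4-byte field at offset 8 is left
unnamed because the dump does not label it, and the checks are an
assumption about what a sane header looks like, not a copy of the server's
own validation. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BLCKSZ 32768u  /* 32K block size, as stated in the report */
#define SKETCH_HDRSZ  20u     /* bytes decoded below (assumed fixed header) */

/* First 20 bytes of block 4763, copied from the hex dump in the report. */
static const uint8_t block_4763_header[20] = {
    0x5a, 0xd0, 0x01, 0x00,  0x0d, 0x00, 0x00, 0x00,
    0x22, 0x00, 0x00, 0x00,  0x00, 0x00, 0x00, 0xf0,
    0x84, 0x63, 0x00, 0x00
};

typedef struct {
    uint32_t lsn_logid;         /* offset 0 */
    uint32_t lsn_recoff;        /* offset 4 */
    uint32_t unlabeled;         /* offset 8, not printed by the dump */
    uint16_t lower;             /* offset 12 */
    uint16_t upper;             /* offset 14 */
    uint16_t special;           /* offset 16 */
    uint16_t pagesize_version;  /* offset 18 */
} page_header_sketch;

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t) (p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
           ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

page_header_sketch decode_header_sketch(const uint8_t *buf)
{
    page_header_sketch h;

    h.lsn_logid = rd32(buf);
    h.lsn_recoff = rd32(buf + 4);
    h.unlabeled = rd32(buf + 8);
    h.lower = rd16(buf + 12);
    h.upper = rd16(buf + 14);
    h.special = rd16(buf + 16);
    h.pagesize_version = rd16(buf + 18);
    return h;
}

/* Assumed sanity rule: header <= lower <= upper <= special <= block size. */
int header_looks_sane_sketch(const page_header_sketch *h)
{
    return h->lower >= SKETCH_HDRSZ &&
           h->lower <= h->upper &&
           h->upper <= h->special &&
           h->special <= SKETCH_BLCKSZ;
}
```

Decoding block_4763_header this way gives the same logid, recoff, lower,
upper and special values that pg_filedump reports. The ordering check
rejects the header because lower is 0 and upper (61440) exceeds both
special (25476) and the 32K block size.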
>
> I ran a vacuumdb on the table in proceryon that had data above, and
> there is no error. However, other tables in the proceryon database
> have invalid page headers. Here is another example:
>
> > pg_filedump -d -R 18311 18311 379598.3
>
> *******************************************************************
> * PostgreSQL File/Block Formatted Dump Utility - Version 3.0
> *
> * File: 379598.3
> * Options used: -d -R 18311 18311
> *
> * Dump created on: Thu Aug 5 16:18:39 2004
> *******************************************************************
>
> Block 18311 ********************************************************
> 0000: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c llllllllllllllll
> 0010: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c llllllllllllllll
> 0020: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c llllllllllllllll
> 0030: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c llllllllllllllll
> 0040: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c llllllllllllllll
> 0050: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c llllllllllllllll
> 0060: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c llllllllllllllll
> 0070: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c llllllllllllllll
>
> <truncated -- all the same>
>
> *** End of Requested Range Encountered. Last Block Read: 18311 ***
>
>
> Please describe a way to repeat the problem. Please try to provide a
> concise reproducible example, if at all possible:
> ----------------------------------------------------------------------
>
> I have been trying to use the test cases of Hubert Froehlich,
> http://archives.postgresql.org/pgsql-general/2004-07/msg00670.php,
> but they do not generate any errors on our system. Only these big
> loads cause it.
>
> If you know how this problem might be fixed, list the solution below:
> ---------------------------------------------------------------------
>
> I am willing to be the hands of any PostgreSQL developer to explore
> this problem. The system is not in production, so I can make changes
> at will.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
