Re: Index corruption

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marc Munro <marc(at)bloodnok(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Index corruption
Date: 2006-06-29 23:27:07
Message-ID: 18191.1151623627@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> What I speculate right at the moment is that we are not looking at index
> corruption at all, but at heap corruption: somehow, the first insertion
> into ctid (27806,2) got lost and the same ctid got re-used for the next
> inserted row. We fixed one bug like this before ...

Further study makes this look more probable. It seems that most of the
activity around the trouble spot consists of transactions inserting
exactly 13 rows into the table; for instance here are traces of two
successive transactions:

Block Item XMIN CMIN

27800 7 600921860 3
27800 8 600921860 7
27800 9 600921860 11
27800 10 600921860 15
27800 11 600921860 19
27800 12 600921860 23
27800 13 600921860 27
27800 14 600921860 31
27800 15 600921860 35
27800 16 600921860 39
27800 17 600921860 43
27800 18 600921860 47
27800 19 600921860 51

27800 20 600921870 3
27800 21 600921870 7
27800 22 600921870 11
27800 23 600921870 15
27800 24 600921870 19
27800 25 600921870 23
27800 26 600921870 27
27800 27 600921870 31
27800 28 600921870 35
27800 29 600921870 39
27800 30 600921870 43
27800 31 600921870 47
27800 32 600921870 51

The pattern of CMIN values is the same for all these transactions.
But look at the rows inserted by 600921856 and 600921858, the two
transactions that seem to be involved with the problem on page
27086:

27787 50 600921856 3
27795 41 600921858 3
27795 42 600921858 7
27795 43 600921858 11
27795 44 600921858 15
27795 45 600921858 19
27795 46 600921858 23
27795 47 600921858 27
27795 48 600921858 31
27795 49 600921858 35
27795 50 600921858 39
27795 51 600921858 43
27806 1 600921858 47
27806 2 600921856 15
27806 3 600921856 19
27806 4 600921856 23
27806 5 600921856 27
27806 6 600921856 31
27806 7 600921856 35
27806 8 600921856 39
27806 9 600921856 43
27806 10 600921856 47
27806 11 600921856 51
27806 12 600921858 51

(27787,50) and (27795,51) are both the last rows on their pages. What's
evidently happened is that the two transactions filled those pages and
then both seized on 27806 as the next page to insert into. I think that
600921856 tried to insert its CMIN 7 and 11 rows as items 1 and 2 on
that page, and then something wiped the page, then 600921858 inserted
its CMIN 47 row as item 1, and then 600921856 got control back and
finished inserting its rows. Further study of the indexes shows that
there are two entries in each index pointing to each of (27806,1) and
(27806,2) --- but since the xxid values are different for the two
(27806,1) entries, those didn't show up as duplicates in my first look.

Given the apparent connection to vacuuming, this is looking a WHOLE
lot like this bug fixed in 8.0.3:

2005-05-07 17:32 tgl

* src/backend/: access/heap/hio.c, commands/vacuumlazy.c
(REL7_3_STABLE), access/heap/hio.c, access/nbtree/nbtpage.c,
access/nbtree/nbtree.c, commands/vacuumlazy.c (REL7_4_STABLE),
access/heap/hio.c, commands/vacuumlazy.c (REL7_2_STABLE),
access/heap/hio.c, access/nbtree/nbtpage.c, access/nbtree/nbtree.c,
commands/vacuumlazy.c (REL8_0_STABLE), access/heap/hio.c,
access/nbtree/nbtpage.c, access/nbtree/nbtree.c,
commands/vacuumlazy.c: Repair very-low-probability race condition
between relation extension and VACUUM: in the interval between
adding a new page to the relation and formatting it, it was
possible for VACUUM to come along and decide it should format the
page too. Though not harmful in itself, this would cause data loss
if a third transaction were able to insert tuples into the vacuumed
page before the original extender got control back.

Are you *certain* this slave isn't running 8.0.2 or older? If you can
verify that, then I guess we need to look for another mechanism that
could cause the same kind of thing.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2006-06-29 23:34:55 Re: [HACKERS] Non-transactional pg_class, try 2
Previous Message Marc Munro 2006-06-29 22:06:16 Re: Index corruption