Re: Violation of primary key constraint

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Toby Murray <toby(dot)murray(at)gmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Violation of primary key constraint
Date: 2013-02-01 21:58:27
Message-ID: 23797.1359755907@sss.pgh.pa.us
Lists: pgsql-bugs

Toby Murray <toby(dot)murray(at)gmail(dot)com> writes:
> On Thu, Jan 31, 2013 at 5:43 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Could we see the full results of heap_page_items(get_raw_page()) for
>> each of the pages where any of these tuples live, viz

> Seems awkward to put inline so I made a text file with these results
> and am attaching it. Hoping the mailing list allows attachments.

Thanks. These are a bit odd, because none of the tuples in question
look like they've been updated, except for (11249622,6) which we
already thought had been re-modified by some later transaction.
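
For reference, the attached results come from a query of roughly this
shape, assuming the pageinspect extension is installed; 'your_table'
below is just a placeholder for the real table name:

-- needs the contrib pageinspect extension
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- dump the line pointers and tuple headers of one heap page;
-- repeat for each block where a suspect tuple lives
SELECT * FROM heap_page_items(get_raw_page('your_table', 11249622));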

What it looks like to me is that the writes of the source pages simply
got lost somehow, so that what's on disk now is the pre-update copies
of these tuples as well as the post-update copies. I wonder whether
your SSD dropped a batch of updates. Speculating wildly: maybe that
was triggered by a power glitch that also caused the reported error
on your RAID array?
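
A quick way to see that situation from SQL (just a sketch; 'your_table'
and the 'id' key column are placeholders for the real names) is to pull
both copies of one of the duplicated keys along with their tuple IDs
and transaction IDs:

-- both the pre-update and post-update row versions show up under the same key
SELECT ctid, xmin, xmax, id
FROM your_table
WHERE id = 26245218;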

One thing that I learned from examining the PK index is that for
three out of these four pairs, there were intermediate versions,
ie there must have been multiple UPDATEs issued, probably all in
the same transaction (does that square with what you know of the
update process?). For instance, the index entries for 26245218
look like

Item 181 -- Length: 16 Offset: 5024 (0x13a0) Flags: NORMAL
Block Id: 11249625 linp Index: 3 Size: 16
Has Nulls: 0 Has Varwidths: 0

13a0: ab00d9a7 03001000 62789001 00000000 ........bx......

Item 182 -- Length: 16 Offset: 5040 (0x13b0) Flags: DEAD
Block Id: 1501614 linp Index: 16 Size: 16
Has Nulls: 0 Has Varwidths: 0

13b0: 1600aee9 10001000 62789001 00000000 ........bx......

Item 183 -- Length: 16 Offset: 5328 (0x14d0) Flags: NORMAL
Block Id: 1501614 linp Index: 5 Size: 16
Has Nulls: 0 Has Varwidths: 0

14d0: 1600aee9 05001000 62789001 00000000 ........bx......

We can see the heap entries at (11249625,3) and (1501614,5)
but there's nothing at (1501614,16). This is explainable if
you believe that a write of page 1501614 got lost.
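
That hole can be probed directly with a TID scan as well (again only a
sketch, with 'your_table' and 'id' as placeholder names): the first
query below should return the surviving row version, while the second
should come back empty because line pointer 16 isn't there on disk.

-- fetch heap tuples by their physical position
SELECT ctid, xmin, xmax, id FROM your_table WHERE ctid = '(1501614,5)';
SELECT ctid, xmin, xmax, id FROM your_table WHERE ctid = '(1501614,16)';  -- expected: no row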

It's conceivable that this was a software glitch, not a hardware glitch,
ie Postgres forgetting the dirty-bits for a batch of pages, but we've
not seen any similar reports elsewhere. So I'm leaning to the hardware
explanation.

regards, tom lane
