"Mason Hale" <masonhale(at)gmail(dot)com> writes:
> Tom, I'll send these to you privately.
Thanks. I don't see anything particularly surprising there though.
What I was wondering about was whether your application was in the
habit of doing repeated no-op updates on the same "entry" row.
The pg_filedump outputs seem to blow away any theory of hardware-level
duplication of the row --- all the tuples on both pages have the
expected block number in their headers, so it seems PG deliberately
put them where they are. And the two tuples at issue are both marked
UPDATED, so they clearly are updated versions of some now-lost original.
What is not clear is whether they are independent updates of the same
original or whether there was a chain of updates --- that is, was the
newer one (which from the timestamp must be the one in the
lower-numbered block) made by an update from the older one, or from the
original?
Since the older one doesn't show any sign of having been updated itself
(in particular, no xmax and its ctid still points to itself), the former
theory would require assuming that the page update "got lost" --- was
discarded without being written to disk. On the other hand, the latter
theory seems to require a similar assumption with respect to whatever
page held the original.
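
A minimal sketch of the check described above, written as a toy Python model rather than anything taken from PostgreSQL itself. The `TupleVersion` class, the helper names, and every block number and transaction ID are hypothetical; "no xmax" is modeled as `None`. It only illustrates the reasoning: an updated tuple would normally carry an xmax and a ctid pointing at its successor, and the successor's xmin would match that xmax.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# (block number, line pointer offset); a simplified stand-in for a tuple ID
Tid = Tuple[int, int]


@dataclass
class TupleVersion:
    tid: Tid              # where this version physically lives
    xmin: int             # transaction that created this version
    xmax: Optional[int]   # transaction that updated/deleted it; None if unset
    ctid: Tid             # equals tid when no newer version is linked


def shows_update(t: TupleVersion) -> bool:
    """True if the tuple carries any sign of having been updated itself."""
    return t.xmax is not None or t.ctid != t.tid


def chained_to(older: TupleVersion, newer: TupleVersion) -> bool:
    """True if 'newer' looks like the direct successor of 'older'."""
    return older.xmax == newer.xmin and older.ctid == newer.tid


# Hypothetical values shaped like the case discussed: the older tuple has
# no xmax and its ctid points to itself; the newer one sits in a
# lower-numbered block.
older = TupleVersion(tid=(20, 3), xmin=100, xmax=None, ctid=(20, 3))
newer = TupleVersion(tid=(10, 5), xmin=200, xmax=None, ctid=(10, 5))

print(shows_update(older))       # no on-disk evidence of an update
print(chained_to(older, newer))  # no on-disk evidence of a chain
```

With both checks coming back false, the on-disk state gives no evidence of a chain, so either theory requires that some page write never reached disk.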
Given this, and the index corruption you showed before (the wrong
sibling link, which would represent index breakage quite independent of
what was in the heap), and the curious contents of your WAL files
(likewise not explainable by anything going wrong within a table),
I'm starting to think that Occam's razor says you've got hardware
problems. Or maybe a kernel-level bug that is causing writes to get
regards, tom lane