Re: 8.4.0 data loss / HOT-related bug

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Radoslaw Zielinski <radek(at)pld-linux(dot)org>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: 8.4.0 data loss / HOT-related bug
Date: 2009-08-21 16:38:17
Message-ID: 407d949e0908210938u50556a57t1f59993aa6de590b@mail.gmail.com
Lists: pgsql-bugs

On Fri, Aug 21, 2009 at 5:15 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> I wonder if this could be explained by xid=6179 not marked as committed
> in clog.  I'd try flipping that bit and see what happens ...

Well, nothing's going to help much now. Firstly, once the hint bit gets
set, nothing second-guesses it and goes back to check the clog anyway. And
secondly, the new version of the tuple has already been vacuumed.
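
To make the first point concrete, here is a toy model of the hint-bit
short-circuit. It is a sketch, not PostgreSQL source: the function name
xmax_committed and the dict standing in for the clog are hypothetical,
the two flag values are the t_infomask bits as defined in the heap tuple
header, and the in-progress case is left out for brevity.

```python
# Toy model only. Flag values follow PostgreSQL's heap tuple header
# definitions; everything else here is a hypothetical illustration.
HEAP_XMAX_COMMITTED = 0x0400  # hint: deleting transaction committed
HEAP_XMAX_INVALID = 0x0800    # hint: deleting transaction aborted/invalid


def xmax_committed(infomask, xmax, clog):
    """Return (deleter_committed, new_infomask, clog_was_consulted).

    `clog` is a plain dict mapping xid -> "committed" or "aborted",
    standing in for the real commit log.
    """
    # If either hint bit is already set, trust it and never look at clog.
    if infomask & HEAP_XMAX_COMMITTED:
        return True, infomask, False
    if infomask & HEAP_XMAX_INVALID:
        return False, infomask, False

    # No hint yet: consult clog once and remember the answer as a hint.
    committed = clog[xmax] == "committed"
    hint = HEAP_XMAX_COMMITTED if committed else HEAP_XMAX_INVALID
    return committed, infomask | hint, True


# First look: no hint bits, clog says 6179 aborted, so the hint gets set.
clog = {6179: "aborted"}
first = xmax_committed(0, 6179, clog)

# Now "flip the bit" in clog. The tuple's hint still wins.
clog[6179] = "committed"
second = xmax_committed(first[1], 6179, clog)
```

In this model the second call never reaches the clog lookup, which is
why fixing clog after the fact would not change what readers of an
already-hinted tuple see.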

One of two things is true.

Either transaction 6179 committed, which would explain why the toast
tuples are missing, in which case sometime later this hint bit became
set and the new version got pruned. I don't know if bad memory could cause
all that to happen. Would the HOT pruning logic see the hint bit and
decide to prune based on that? I suppose a bad bit hitting the clog
could cause everything, though.
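
For a sense of how small the target is, the sketch below works out where
the status of xid 6179 would sit in clog. It assumes the usual layout of
two status bits per transaction, four transactions per byte, and the
default 8192-byte block size; the helper names clog_location and
clog_status are hypothetical.

```python
# Sketch of the clog addressing arithmetic, assuming the default block size.
CLOG_BITS_PER_XACT = 2
CLOG_XACTS_PER_BYTE = 4
BLCKSZ = 8192  # assumption: default build
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE

# Two-bit transaction status codes.
STATUS = {0: "in progress", 1: "committed", 2: "aborted", 3: "sub-committed"}


def clog_location(xid):
    """Return (page, byte offset within page, bit shift within byte)."""
    page = xid // CLOG_XACTS_PER_PAGE
    byte_in_page = (xid % CLOG_XACTS_PER_PAGE) // CLOG_XACTS_PER_BYTE
    shift = (xid % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT
    return page, byte_in_page, shift


def clog_status(byte_value, shift):
    """Decode the two status bits for one transaction out of a clog byte."""
    return STATUS[(byte_value >> shift) & 0x03]


page, offset, shift = clog_location(6179)
print(page, offset, shift)            # 0 1544 6
print(clog_status(0x00, shift))       # in progress
print(clog_status(0x40, shift))       # committed
print(clog_status(0x80, shift))       # aborted
```

Under those assumptions the status lives in the top two bits of byte 1544
on the first clog page, and the byte values 0x00 and 0x40 differ by a
single bit, so one flipped bit is enough to make an unfinished
transaction read as committed.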

Alternatively, 6179 aborted but somebody along the way got that wrong
and marked the toast tuples dead (and maybe vacuumed them), thinking it
had committed. It's going to be harder to tell whether that's what
happened, because we don't have any pointers to the specific page in
the toast table. Not unless you can dump the whole index and find
pointers in there, or can find the details in the WAL.

--
greg
http://mit.edu/~gsstark/resume.pdf
