On Tue, Jan 31, 2012 at 4:25 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Tue, Jan 31, 2012 at 12:05 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> BTW, after a bit more reflection it occurs to me that it's not so much
>>> that the data is necessarily *bad*, as that it seemingly doesn't match
>>> the tuple descriptor that the backend's trying to interpret it with.
>> Hmm. Could this be caused by the recovery process failing to obtain a
>> sufficiently strong lock on a buffer before replaying some WAL record?
> Well, I was kinda speculating that inadequate locking could result in
> use of a stale (or too-new?) tuple descriptor, and that would be as good
> a candidate as any if the basic theory were right. But Bridget says
> they are not doing any DDL, so it's hard to see how there'd be any tuple
> descriptor mismatch at all. Still baffled ...
No, I wasn't thinking about a tuple descriptor mismatch. I was
imagining that the page contents themselves might be in flux while
we're trying to read from them. Off the top of my head I don't see how
that can happen, but it would be awfully interesting to be able to see
which WAL record last touched the relevant heap page, and how long
before the error that happened.