On Wed, Dec 26, 2012 at 11:47 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> It would be nice if this were just something like a memory issue on this
> system. That I'm getting the same very odd value every time--this refcount
> of 1073741824--makes it seem less random than I expect from bad memory.
> Once I get a few more crash samples (with buffer ids) I'll shut the system
> down for a pass of memtest86+.
Well, that's a one-bit error (1073741824 is 2^30), and it would never get
detected until the value was decremented down to what should be zero, so
that's pretty much exactly what I would expect to see from a memory or CPU error.
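A minimal sketch of why a single flipped bit yields exactly this value and stays hidden until the unpin. The function `pin_unpin_with_fault` is hypothetical: it models one reference-count entry as a plain integer and is not PostgreSQL's buffer manager code. While the buffer is pinned, the corrupted counter is still positive, so nothing looks wrong until the final decrement should have reached zero.

```python
# Hypothetical model of one reference-count entry; not PostgreSQL code.
BIT_30 = 1 << 30  # 1073741824


def pin_unpin_with_fault(flip_bit=None):
    """Pin and unpin a buffer once, optionally flipping one bit while pinned.

    Returns the counter value after the unpin, which should be 0.
    """
    refcount = 0
    refcount += 1                  # pin
    if flip_bit is not None:
        refcount ^= 1 << flip_bit  # simulated single-bit memory/CPU error
    refcount -= 1                  # unpin
    return refcount


print(pin_unpin_with_fault())      # 0: healthy
print(pin_unpin_with_fault(30))    # 1073741824: the value seen in the crashes
```

Flipping bit 30 of a counter holding 1 gives 1073741825; the matching decrement then leaves 1073741824 instead of 0.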
What's odd is that it's always hitting the LocalRefCount array, not
any other large data structure. For 2GB of buffers, the LocalRefCount
array will be 1MB per client. That's a pretty big target, but it's hardly
the only such data structure in Postgres.
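The 1MB figure follows from keeping one counter per shared buffer in each backend. A rough sketch, assuming the default 8kB block size and a 4-byte (int32) entry per buffer; `refcount_array_bytes` and its constants are illustrative assumptions, not taken from the source tree.

```python
# Assumptions (not stated in the thread): default 8 kB block size and one
# 4-byte (int32) counter per shared buffer in each backend.
BLOCK_SIZE = 8192
COUNTER_BYTES = 4


def refcount_array_bytes(shared_buffers_bytes):
    """Per-backend size of an array holding one counter per shared buffer."""
    nbuffers = shared_buffers_bytes // BLOCK_SIZE
    return nbuffers * COUNTER_BYTES


size = refcount_array_bytes(2 * 1024**3)
print(size, size / 1024**2)  # 1048576 1.0
```

2GB of 8kB buffers is 262,144 buffers; at 4 bytes each that is 1,048,576 bytes per client.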
It's also possible it's a bad CPU, not bad memory. If the fault affects
decrements or increments specifically, the usage pattern of LocalRefCount
may be particularly prone to triggering it.