Re: buffer assertion tripping under repeat pgbench load

From: Greg Stark <stark(at)mit(dot)edu>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: buffer assertion tripping under repeat pgbench load
Date: 2012-12-27 00:23:43
Message-ID: CAM-w4HP846bCfG3injcQmesMhuvW2hRAoROXuZYjvnssXsh6rg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 26, 2012 at 11:47 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> It would be nice if this were just something like a memory issue on this
> system. That I'm getting the same very odd value every time--this refcount
> of 1073741824--makes it seem less random than I expect from bad memory.
> Once I get a few more crash samples (with buffer ids) I'll shut the system
> down for a pass of memtest86+.

Well that's a one-bit error and it would never get detected until the
value was decremented down to what should be zero so that's pretty
much exactly what I would expect to see from a memory or cpu error.

What's odd is that it's always hitting the LocalRefCount array, not
any other large data structure. For 2GB of buffers the LocalRefCount
will be 1MB per client. That's a pretty big target but it's hardly the
only such data structure in Postgres.

It's also possible it's a bad cpu, not bad memory. If it affects
decrement or increment in particular it's possible that the pattern of
usage on LocalRefCount is particularly prone to triggering it.

--
greg

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabrízio de Royes Mello 2012-12-27 00:48:52 Proposal: Store "timestamptz" of database creation on "pg_database"
Previous Message Greg Smith 2012-12-27 00:00:51 Re: buffer assertion tripping under repeat pgbench load