Re: buffer assertion tripping under repeat pgbench load

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: buffer assertion tripping under repeat pgbench load
Date: 2012-12-26 23:47:59
Message-ID: 50DB8CAF.9020303@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/26/12 5:40 PM, Greg Stark wrote:
> Also, do you have the buffer id of the broken buffer? I wonder if it's
> not just any buffer but always the same same buffer even if it's a
> different block in that buffer.

I just added something looking for that.

Before I got to that I found another crash:

2012-12-26 18:01:42 EST [973]: WARNING: refcount of base/16384/65553
blockNum=22140, flags=0x1a7 is 1073741824 should be 0, globally: 0
2012-12-26 18:01:42 EST [973]: WARNING: buffers with non-zero refcount is 1
-bash-4.1$ export PGPORT=5433
-bash-4.1$ psql -d pgbench -c "select relname,relkind,relfilenode from
pg_class where relfilenode=65553"
relname | relkind | relfilenode
-----------------------+---------+-------------
pgbench_accounts_pkey | i | 65553

So back to an index again.

> (Or maybe your compiler is laying out these objects
> in a different way from most people's compilers and we're overwriting
> past the end of some other object routinely but yours is the only
> place where it's being laid out preceding a critical data structure)

I doubt there is anything special about this compiler, given that it's
the standard RedHat 6 build stack cloned via Scientific Linux 6.0.

The two things I expect I'm doing differently than most tests are:

-Using 2GB for shared_buffers
-Running a write heavy test that goes for many hours

It would be nice if this were just something like a memory issue on this
system. That I'm getting the same very odd value every time--this
refcount of 1073741824--makes it seem less random than I expect from bad
memory. Once I get a few more crash samples (with buffer ids) I'll shut
the system down for a pass of memtest86+.

Regardless, I've copied over the same source code and test configuration
to a similar system here. If I can reproduce this on a second system,
I'll push all the details out to the list, hopeful that other people
might see it too.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Smith 2012-12-27 00:00:51 Re: buffer assertion tripping under repeat pgbench load
Previous Message Greg Stark 2012-12-26 22:40:09 Re: buffer assertion tripping under repeat pgbench load