Re: buffer assertion tripping under repeat pgbench load

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: buffer assertion tripping under repeat pgbench load
Date: 2012-12-26 23:47:59
Message-ID: 50DB8CAF.9020303@2ndQuadrant.com
Lists: pgsql-hackers

On 12/26/12 5:40 PM, Greg Stark wrote:
> Also, do you have the buffer id of the broken buffer? I wonder if it's
> not just any buffer but always the same buffer even if it's a
> different block in that buffer.

I just added something looking for that.

Before I got to that I found another crash:

2012-12-26 18:01:42 EST [973]: WARNING:  refcount of base/16384/65553 blockNum=22140, flags=0x1a7 is 1073741824 should be 0, globally: 0
2012-12-26 18:01:42 EST [973]: WARNING:  buffers with non-zero refcount is 1
-bash-4.1$ export PGPORT=5433
-bash-4.1$ psql -d pgbench -c "select relname,relkind,relfilenode from pg_class where relfilenode=65553"
         relname        | relkind | relfilenode
-----------------------+---------+-------------
  pgbench_accounts_pkey | i       |       65553

So back to an index again.

> (Or maybe your compiler is laying out these objects
> in a different way from most people's compilers and we're overwriting
> past the end of some other object routinely but yours is the only
> place where it's being laid out preceding a critical data structure)

I doubt there is anything special about this compiler, given that it's 
the standard RedHat 6 build stack cloned via Scientific Linux 6.0.

The two things I expect I'm doing differently than most tests are:

-Using 2GB for shared_buffers
-Running a write-heavy test that goes for many hours

It would be nice if this were just something like a memory issue on this 
system.  That I'm getting the same very odd value every time--this 
refcount of 1073741824--makes it seem less random than I expect from bad 
memory.  Once I get a few more crash samples (with buffer ids) I'll shut 
the system down for a pass of memtest86+.
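
For reference, 1073741824 is exactly 2^30 (0x40000000), so the bad refcount has a single bit set and everything else zero. A minimal Python sketch to confirm that; the helper name set_bits is made up for illustration and is not part of PostgreSQL or any debugging patch:

```python
# The refcount reported in the WARNING line above.
REFCOUNT = 1073741824


def set_bits(value):
    """Return the positions of the bits set in a non-negative integer."""
    return [pos for pos in range(value.bit_length()) if (value >> pos) & 1]


print(hex(REFCOUNT))       # 0x40000000
print(set_bits(REFCOUNT))  # [30]
```

A lone set bit fits either a one-bit hardware fault or a stray write in software, so this does not settle the question by itself; it only shows the value is not arbitrary garbage.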

Regardless, I've copied over the same source code and test configuration 
to a similar system here.  If I can reproduce this on a second system, 
I'll push all the details out to the list, hopeful that other people 
might see it too.

-- 
Greg Smith   2ndQuadrant US    greg(at)2ndQuadrant(dot)com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

