Skip site navigation (1) Skip section navigation (2)

Re: buffer assertion tripping under repeat pgbench load

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: buffer assertion tripping under repeat pgbench load
Date: 2013-01-09 19:36:22
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-hackers
On 12/23/12 3:17 PM, Simon Riggs wrote:
> We already have PrintBufferLeakWarning() for this, which might be a bit neater.

It does look like basically the same info.  I hacked the code to 
generate this warning all the time.  Patch from Andres I've been using:

WARNING:  refcount of buf 1 containing global/12886 blockNum=0, 
flags=0x106 is 0 should be 0, globally: 0

And using PrintBufferLeakWarning with the same input:

WARNING:  buffer refcount leak: [001] (rel=global/12886, blockNum=0, 
flags=0x106, refcount=0 0)

Attached is a change I'd propose be committed.  This doesn't do anything 
in a non-assertion build, it just follows all of the "why don't you 
try..." suggestions I've gotten here.  I still have no idea why these 
are popping out for me.  I'm saving the state on all this to try and 
track it down again later.  I'm out of time to bang my head on this 
particular thing anymore right now, it doesn't seem that important given 
it's not reproducible anywhere else.

Any assertion errors that come out will be more useful with just this 
change though.  Output looks like this (from hacked code to always trip 
it and shared_buffers=16):

WARNING:  buffer refcount leak: [016] 
(rel=pg_tblspc/0/PG_9.3_201212081/0/0_(null), blockNum=4294967295, 
flags=0x0, refcount=0 0)
WARNING:  buffers with non-zero refcount is 16
TRAP: FailedAssertion("!(RefCountErrors == 0)", File: "bufmgr.c", Line: 

I intentionally used a regular diff for this small one because it shows 
the subtle changes I made here better.  I touched more than I strictly 
had to because this area of the code is filled with off by one 
corrections, and adding this warning highlighted one I objected to. 
Note how the function is defined:

PrintBufferLeakWarning(Buffer buffer)

The official buffer identifier description says:

typedef int Buffer;
Zero is invalid, positive is the index of a shared buffer (1..NBuffers),
negative is the index of a local buffer (-1 .. -NLocBuffer).

However, the loop in AtEOXact_Buffers aimed to iterate over 
PrivateRefCount, using array indexes 0...NBuffers-1.  It was using an 
array index rather than a proper buffer identification number.  And that 
meant when I tried to pass it to PrintBufferLeakWarning, it asserted on 
the buffer id--because "Zero is invalid".

I could have just fixed that with an off by one fix in the other 
direction.  That seemed wrong to me though.  All of the other code like 
PrintBufferLeakWarning expects to see a Buffer value.  It already 
corrects for the off by one problem like this:

                 buf = &BufferDescriptors[buffer - 1];
                 loccount = PrivateRefCount[buffer - 1];

It seemed cleaner to me to make the i variable in AtEOXact_Buffers be a 
Buffer number.  Then you access PrivateRefCount[buffer - 1] in that 
code, making it look like more of the surrounding references.  And the 
value goes directly into PrintBufferLeakWarning without a problem.

Greg Smith   2ndQuadrant US    greg(at)2ndQuadrant(dot)com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support

Attachment: show-private-leak-errors-v1.patch
Description: text/plain (738 bytes)

In response to

pgsql-hackers by date

Next:From: Stephen FrostDate: 2013-01-09 19:56:08
Subject: Re: Index build temp files
Previous:From: Simon RiggsDate: 2013-01-09 19:32:59
Subject: Re: Further pg_upgrade analysis for many tables

Privacy Policy | About PostgreSQL
Copyright © 1996-2018 The PostgreSQL Global Development Group