Re: Bug in new buffer freelist code

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Bug in new buffer freelist code
Date: 2004-01-07 18:09:20
Message-ID: 20105.1073498960@sss.pgh.pa.us
Lists: pgsql-hackers

Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
> So far I was not able to reproduce the problem. If it is what we think,
> then it should be possible to reproduce it with just one single DB
> connection, no?

The reason it is difficult to reproduce is that when
StrategyInvalidateBuffer puts a dead buffer on the freelist, it puts it
at the head; therefore, the very next call of StrategyGetBuffer will
cause that buffer to be recycled for some other purpose. The only way
the dead buffer will survive long enough for another
FlushRelationBuffers scan to find it again is if some other backend does
some other StrategyInvalidateBuffer calls very soon. Hmm ... actually,
just clustering a large relation might be enough, but I didn't try that.
With the small relations that the regression tests cluster, you don't see
the failure because the dead buffers have already been recycled to hold
some other page by the time the bug could occur.
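
(For reference, the stock code path in StrategyInvalidateBuffer is just the
head push shown in the "before" half of the patch below:

	buf->bufNext = StrategyControl->listFreeBuffers;
	StrategyControl->listFreeBuffers = buf->buf_id;

so a freshly invalidated buffer is the very first thing StrategyGetBuffer
hands back out.)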

What I did to make it reproducible was to insert the attached hack that
makes StrategyInvalidateBuffer put the dead buffer at the end of the
freelist not the start. Then, with a freshly started postmaster, it
blows up immediately:

regression=# CREATE TABLE clstr_1 (a INT PRIMARY KEY);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "clstr_1_pkey" for table "clstr_1"
CREATE TABLE
regression=# INSERT INTO clstr_1 VALUES (2);
INSERT 335205 1
regression=# INSERT INTO clstr_1 VALUES (1);
INSERT 335206 1
regression=# CLUSTER clstr_1_pkey ON clstr_1;
CLUSTER
regression=# select * from clstr_1;
a
---
1
2
(2 rows)

regression=# CLUSTER clstr_1;
ERROR: StrategyInvalidateBuffer() buffer 174 not in directory or freelist
CONTEXT: writing block 0 of relation 154625/335218

With this mod we don't see the infinite loop; the reason is that appending
a buffer to the tail of the freelist when it's already there doesn't make
the freelist circular (it may lose part of the freelist instead).
I'm not recommending the mod for anything except exhibiting this bug,
however.
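
To spell out the infinite-loop mechanism: with the stock head push, a
redundant StrategyInvalidateBuffer call on a buffer that is already
somewhere in the freelist leaves the list circular, because the buffer's
old predecessor still points at it while the buffer now points back at the
old head.  Here is a toy model of the index-linked freelist (not the real
bufmgr structures, just the same pointer dance):

#include <stdio.h>

static int	bufNext[8];			/* like BufferDesc.bufNext */
static int	freeHead = -1;		/* like StrategyControl->listFreeBuffers */

static void
push_head(int buf)
{
	/* same shape as the stock StrategyInvalidateBuffer code */
	bufNext[buf] = freeHead;
	freeHead = buf;
}

int
main(void)
{
	int			b, i;

	push_head(3);				/* freelist: 3 */
	push_head(5);				/* freelist: 5 -> 3 */
	push_head(3);				/* redundant call: 3 -> 5 -> 3 -> ... */

	/* walk the list; it never reaches -1 */
	for (b = freeHead, i = 0; b >= 0 && i < 10; b = bufNext[b], i++)
		printf("%d ", b);
	printf("...\n");			/* prints 3 5 3 5 ... */
	return 0;
}

Every full traversal after the redundant call spins forever.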

I had a breakpoint in StrategyInvalidateBuffer while doing the above,
and saw the following sequence of passed-in buffers:

Breakpoint 3, StrategyInvalidateBuffer (buf=0xc1fa4e00) at freelist.c:711

$1 = {bufNext = -1, data = 1502080, tag = {rnode = {tblNode = 154625,
relNode = 335212}, blockNum = 0}, buf_id = 153, flags = 14,
refcount = 0, io_in_progress_lock = 327, cntx_lock = 328,
cntxDirty = 0 '\000', wait_backend_id = 0}

$2 = {bufNext = -1, data = 1559424, tag = {rnode = {tblNode = 154625,
relNode = 335218}, blockNum = 0}, buf_id = 160, flags = 14,
refcount = 0, io_in_progress_lock = 341, cntx_lock = 342,
cntxDirty = 0 '\000', wait_backend_id = 0}

-- this call is redundant (it's from DropRelationBuffers after a FlushRB)
$3 = {bufNext = 160, data = 1502080, tag = {rnode = {tblNode = 154625,
relNode = 335212}, blockNum = 0}, buf_id = 153, flags = 14,
refcount = 0, io_in_progress_lock = 327, cntx_lock = 328,
cntxDirty = 0 '\000', wait_backend_id = 0}

$4 = {bufNext = -1, data = 1493888, tag = {rnode = {tblNode = 154625,
relNode = 335214}, blockNum = 0}, buf_id = 152, flags = 14,
refcount = 0, io_in_progress_lock = 325, cntx_lock = 326,
cntxDirty = 0 '\000', wait_backend_id = 0}

$5 = {bufNext = -1, data = 1510272, tag = {rnode = {tblNode = 154625,
relNode = 335214}, blockNum = 1}, buf_id = 154, flags = 14,
refcount = 0, io_in_progress_lock = 329, cntx_lock = 330,
cntxDirty = 0 '\000', wait_backend_id = 0}

$6 = {bufNext = -1, data = 1665920, tag = {rnode = {tblNode = 154625,
relNode = 335220}, blockNum = 0}, buf_id = 173, flags = 14,
refcount = 0, io_in_progress_lock = 367, cntx_lock = 368,
cntxDirty = 0 '\000', wait_backend_id = 0}

$7 = {bufNext = -1, data = 1682304, tag = {rnode = {tblNode = 154625,
relNode = 335220}, blockNum = 1}, buf_id = 175, flags = 14,
refcount = 0, io_in_progress_lock = 371, cntx_lock = 372,
cntxDirty = 0 '\000', wait_backend_id = 0}

-- this call is redundant
$8 = {bufNext = 154, data = 1493888, tag = {rnode = {tblNode = 154625,
relNode = 335214}, blockNum = 0}, buf_id = 152, flags = 14,
refcount = 0, io_in_progress_lock = 325, cntx_lock = 326,
cntxDirty = 0 '\000', wait_backend_id = 0}

-- this call is redundant
$9 = {bufNext = 173, data = 1510272, tag = {rnode = {tblNode = 154625,
relNode = 335214}, blockNum = 1}, buf_id = 154, flags = 14,
refcount = 0, io_in_progress_lock = 329, cntx_lock = 330,
cntxDirty = 0 '\000', wait_backend_id = 0}

-- this call is redundant
$10 = {bufNext = 154, data = 1493888, tag = {rnode = {tblNode = 154625,
relNode = 335214}, blockNum = 0}, buf_id = 152, flags = 14,
refcount = 0, io_in_progress_lock = 325, cntx_lock = 326,
cntxDirty = 0 '\000', wait_backend_id = 0}

-- this call is redundant
$11 = {bufNext = 173, data = 1510272, tag = {rnode = {tblNode = 154625,
relNode = 335214}, blockNum = 1}, buf_id = 154, flags = 14,
refcount = 0, io_in_progress_lock = 329, cntx_lock = 330,
cntxDirty = 0 '\000', wait_backend_id = 0}

-- this call is redundant
$12 = {bufNext = 160, data = 1502080, tag = {rnode = {tblNode = 154625,
relNode = 335212}, blockNum = 0}, buf_id = 153, flags = 14,
refcount = 0, io_in_progress_lock = 327, cntx_lock = 328,
cntxDirty = 0 '\000', wait_backend_id = 0}

second CLUSTER starts here:

-- this call is redundant
$13 = {bufNext = 152, data = 1559424, tag = {rnode = {tblNode = 154625,
relNode = 335218}, blockNum = 0}, buf_id = 160, flags = 14,
refcount = 0, io_in_progress_lock = 341, cntx_lock = 342,
cntxDirty = 0 '\000', wait_backend_id = 0}

$14 = {bufNext = -1, data = 1674112, tag = {rnode = {tblNode = 154625,
relNode = 335218}, blockNum = 0}, buf_id = 174, flags = 14,
refcount = 0, io_in_progress_lock = 369, cntx_lock = 370,
cntxDirty = 0 '\000', wait_backend_id = 0}

Breakpoint 1, errfinish (dummy=0) at elog.c:320
320 int elevel = edata->elevel;
(gdb) bt
#0 errfinish (dummy=0) at elog.c:320
#1 0x29675c in elog_finish (elevel=1074079512,
fmt=0xa5da4 "StrategyInvalidateBuffer() buffer %d not in directory or freeli
st") at elog.c:853
#2 0x1fedb0 in StrategyInvalidateBuffer (buf=0xc1fa50a0) at freelist.c:751
#3 0x1fc360 in FlushRelationBuffers (rel=0x400cb350, firstDelBlock=0)
at bufmgr.c:1596
#4 0x147c14 in swap_relfilenodes (r1=335212, r2=335221) at cluster.c:736
#5 0x147670 in rebuild_relation (OldHeap=0x51d75, indexOid=335214)
at cluster.c:455
#6 0x1475c8 in cluster_rel (rvtc=0x7b03b8b8, recheck=0 '\000')
at cluster.c:395
#7 0x14720c in cluster (stmt=0x400b8de0) at cluster.c:232
#8 0x21cc64 in ProcessUtility (parsetree=0x400b8de0, dest=0x400b8e20,
completionTag=0x7b03b5c8 "") at utility.c:1033

Note that buffer 160 gets cleared twice in this sequence. The second
time, the CDB for 174 gets found, leading to failure when 174 is cleared.

I believe the correct fix is to do CLEAR_BUFFERTAG on the buffer (and
maybe the CDB too?) when returning a buffer to the freelist in
StrategyInvalidateBuffer. It might also be worthwhile to add another
BM_FLAG bit that specifically indicates a buffer is on the freelist,
and set/clear/test that at appropriate spots.
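
Concretely, I'm thinking of something like this at the spot where the
buffer goes back onto the freelist (untested sketch; BM_FREE is just a
made-up name for the hypothetical flag bit):

	/*
	 * Wipe the stale tag so later linear scans over the buffer array
	 * can't match it anymore (roughly what CLEAR_BUFFERTAG does).
	 */
	buf->tag.rnode.tblNode = InvalidOid;
	buf->tag.rnode.relNode = InvalidOid;
	buf->tag.blockNum = InvalidBlockNumber;

	buf->flags |= BM_FREE;		/* hypothetical "I'm on the freelist" bit */

	buf->bufNext = StrategyControl->listFreeBuffers;
	StrategyControl->listFreeBuffers = buf->buf_id;

StrategyGetBuffer would clear the bit again when it hands the buffer out,
and the sanity checks could then test the bit instead of walking the
freelist.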

I am actually of the opinion that the assertion code at the bottom of
StrategyInvalidateBuffer is all wrong: it should *never* be possible to
call StrategyInvalidateBuffer on something that's already in the freelist,
and the only reason you put that in there is this mistaken failure to
clear the buffertag, which leads to wrong calls from linear searches of
the buffer array.  I think you should just elog(ERROR) any time you fail
to find a CDB in this routine.
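
That is, the tail of the routine would boil down to something like this
(sketch only; I'm glossing over how the directory lookup is actually coded
there):

	if (cdb_id < 0)				/* tag not found in the cache directory */
		elog(ERROR, "StrategyInvalidateBuffer: buffer %d not in directory",
			 buf->buf_id);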

regards, tom lane

*** src/backend/storage/buffer/freelist.c.orig Sat Nov 29 14:51:56 2003
--- src/backend/storage/buffer/freelist.c Wed Jan 7 12:44:33 2004
***************
*** 718,725 ****
cdb->next = StrategyControl->listUnusedCDB;
StrategyControl->listUnusedCDB = cdb_id;

! buf->bufNext = StrategyControl->listFreeBuffers;
! StrategyControl->listFreeBuffers = buf->buf_id;
return;
}

--- 718,737 ----
cdb->next = StrategyControl->listUnusedCDB;
StrategyControl->listUnusedCDB = cdb_id;

! if (StrategyControl->listFreeBuffers < 0)
! {
! buf->bufNext = StrategyControl->listFreeBuffers;
! StrategyControl->listFreeBuffers = buf->buf_id;
! }
! else
! {
! /* stash it at the bottom of the freelist */
! buf_id = StrategyControl->listFreeBuffers;
! while (BufferDescriptors[buf_id].bufNext >= 0)
! buf_id = BufferDescriptors[buf_id].bufNext;
! buf->bufNext = -1;
! BufferDescriptors[buf_id].bufNext = buf->buf_id;
! }
return;
}
