Bug in new buffer freelist code

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Bug in new buffer freelist code
Date: 2003-12-23 18:11:08
Message-ID: 21676.1072203068@sss.pgh.pa.us
Lists: pgsql-hackers

I just had the parallel regression tests hang because of what appears to
be a bug in the new ARC code. The CLUSTER test gets into an infinite
loop while executing "CLUSTER clstr_1;". The loop is in
StrategyInvalidateBuffer's check of whether the buffer is already in the
freelist: the buffer isn't there, and because the freelist is circular
the scan never terminates.

(gdb) bt
#0 0x1fe8a8 in StrategyInvalidateBuffer (buf=0xc3a56f60) at freelist.c:733
#1 0x1fbf08 in FlushRelationBuffers (rel=0x400fa298, firstDelBlock=0)
    at bufmgr.c:1596
#2 0x1479fc in swap_relfilenodes (r1=143786, r2=143915) at cluster.c:736
#3 0x147458 in rebuild_relation (OldHeap=0x2322b, indexOid=143788)
    at cluster.c:455
#4 0x1473b0 in cluster_rel (rvtc=0x7b03bed8, recheck=0 '\000')
    at cluster.c:395
#5 0x146ff4 in cluster (stmt=0x400b88a8) at cluster.c:232
#6 0x21c60c in ProcessUtility (parsetree=0x400b88a8, dest=0x400b88e8,
    completionTag=0x7b03bbe8 "") at utility.c:1033
... etc ...

(gdb) p *buf
$5 = {bufNext = -1, data = 7211904, tag = {rnode = {tblNode = 17142,
relNode = 143906}, blockNum = 0}, buf_id = 850, flags = 14,
refcount = 0, io_in_progress_lock = 1721, cntx_lock = 1722,
cntxDirty = 0 '\000', wait_backend_id = 0}
(gdb) p *StrategyControl
$1 = {target_T1_size = 423, listUnusedCDB = 249, listHead = {464, 967, 1692,
1227}, listTail = {968, 645, 1528, 1694}, listSize = {364, 413, 584, 636},
listFreeBuffers = 839, num_lookup = 546939, num_hit = {1378, 246896, 282639,
3935}, stat_report = 0, cdb = {{prev = 386, next = 23, list = 3,
buf_tag = {rnode = {tblNode = 17142, relNode = 19080}, blockNum = 30},
buf_id = -1, t1_xid = 3402}}}
(gdb) p BufferDescriptors[839]
$2 = {bufNext = 839, data = 7121792, tag = {rnode = {tblNode = 17142,
relNode = 143906}, blockNum = 0}, buf_id = 839, flags = 14,
refcount = 0, io_in_progress_lock = 1699, cntx_lock = 1700,
cntxDirty = 0 '\000', wait_backend_id = 0}

So we've got a couple of problems here: buffers 839 and 850 both claim
to contain block 0 of rel 143906 (which is clstr_1), and the freelist
is circular (buffer 839, which listFreeBuffers points to, has
bufNext = 839).
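
To illustrate why a membership scan over a singly linked freelist spins
forever once a node points back into the list, here is a minimal sketch
in C. It is not the code from freelist.c: the names freelist_walk,
WalkResult, and FREELIST_END are hypothetical, and the list is modeled as
a plain array of "next" indices in the spirit of the bufNext field shown
in the dump. The walk is bounded by the number of buffers, so a cycle is
reported as corruption instead of hanging.

```c
#include <stddef.h>

#define FREELIST_END (-1)   /* hypothetical end-of-list marker, like bufNext = -1 */

typedef enum
{
    WALK_NOT_FOUND,         /* reached the end of the list without seeing target */
    WALK_FOUND,             /* target is on the list */
    WALK_CORRUPT            /* bad index, or more links followed than buffers exist */
} WalkResult;

/*
 * Search the list starting at 'head' for 'target'.
 * next[i] is the index of the buffer after i, or FREELIST_END.
 * A well-formed list visits at most 'nbuffers' distinct entries, so
 * exceeding that count means the list contains a cycle.
 */
static WalkResult
freelist_walk(const int *next, int nbuffers, int head, int target)
{
    int steps = 0;
    int cur = head;

    while (cur != FREELIST_END)
    {
        if (cur < 0 || cur >= nbuffers)
            return WALK_CORRUPT;        /* link points outside the array */
        if (cur == target)
            return WALK_FOUND;
        if (++steps > nbuffers)
            return WALK_CORRUPT;        /* visited too many nodes: cycle */
        cur = next[cur];
    }
    return WALK_NOT_FOUND;
}
```

With the values from the dump (head 839 whose next link is 839, target
850), an unbounded loop of this shape would never exit; the bounded
version returns WALK_CORRUPT after following more links than there are
buffers.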

This doesn't seem to be super reproducible, but there's definitely a
problem in there somewhere.

regards, tom lane
