Re: WIP: SP-GiST, Space-Partitioned GiST

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: SP-GiST, Space-Partitioned GiST
Date: 2011-12-06 20:25:11
Message-ID: 14742.1323203111@sss.pgh.pa.us
Lists: pgsql-hackers

Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> writes:
> There is one annoying problem under Mac OS (Linux and FreeBSD have no problem) that
> we just can't figure out how to track down, since we are not familiar with Mac OS:
> the server fails to restart after 'kill -9' of a backend, but only if the sources were
> compiled with -O2 (no problem occurred with -O0). Since the failure does
> not happen every time, we use the following script to reproduce it. We ask
> a Mac OS guru to help us debug this problem.

I don't think it's Mac-specific at all; it looks to me like garden-variety
uninitialized data, specifically that there are paths through
doPickSplit that don't set xlrec.newPage. The crash I'm seeing is

TRAP: FailedAssertion("!(offset <= (((PageHeader) (page))->pd_lower <= (__builtin_offsetof (PageHeaderData, pd_linp)) ? 0 : ((((PageHeader) (page))->pd_lower - (__builtin_offsetof (PageHeaderData, pd_linp))) / sizeof(ItemIdData))) + 1)", File: "spgxlog.c", Line: 81)

#0 0x00007fff883f982a in __kill ()
#1 0x00007fff85bdda9c in abort ()
#2 0x0000000103165a71 in ExceptionalCondition (conditionName=<value temporarily unavailable, due to optimizations>, errorType=<value temporarily unavailable, due to optimizations>, fileName=<value temporarily unavailable, due to optimizations>, lineNumber=<value temporarily unavailable, due to optimizations>) at assert.c:57
#3 0x0000000102eeec73 in addOrReplaceTuple (page=0x74cc <Address 0x74cc out of bounds>, tuple=0x7faa1182d64c " ", size=88, offset=70) at spgxlog.c:81
#4 0x0000000102eed4bc in spgRedoPickSplit [inlined] () at /Users/tgl/pgsql/src/backend/access/spgist/spgxlog.c:504
#5 0x0000000102eed4bc in spg_redo (record=0x7fff62a5ccf0) at spgxlog.c:803
#6 0x0000000102ec4f48 in StartupXLOG () at xlog.c:6534
#7 0x0000000103054378 in StartupProcessMain () at startup.c:220
#8 0x0000000102ef4449 in AuxiliaryProcessMain (argc=2, argv=0x7fff62a60030) at bootstrap.c:414

The xlog record it's working on is

(gdb) p *(spgxlogPickSplit*)(0x7fcb20826600 + 32)
$6 = {
node = {
spcNode = 1663,
dbNode = 41578,
relNode = 204800
},
nTuples = 75,
nNodes = 4,
blknoSrc = 988,
nDelete = 74,
blknoInner = 929,
offnumInner = 70,
newPage = 1 '\001',
blknoParent = 929,
offnumParent = 13,
nodeI = 2,
stateSrc = {
attType_attlen = 16,
fakeTupleSize = 32,
isBuild = 1
}
}

Since newPage is set, addOrReplaceTuple gets called on a freshly
initialized page, and not surprisingly complains that offset 70 is
way out of range. Maybe there's something wrong with the replay
logic, but what I'm thinking is that newPage should not have been
true here, which means that doPickSplit failed to set it correctly,
which doesn't look at all improbable. I added a memset at the
top of doPickSplit to force the whole struct to zeroes, and so far
haven't seen the crash again.

regards, tom lane
