Re: PANIC in GIN code

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Heikki <hlinnaka(at)iki(dot)fi>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PANIC in GIN code
Date: 2015-06-26 19:53:14
Message-ID: CAMkU=1yoHa2tiyt_F_j6J5bqj66iY798PpBWvrGTOEaiY5doHA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jun 26, 2015 at 11:40 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
wrote:

> On 06/26/2015 08:02 PM, Jeff Janes wrote:
>
>> Under high load against f7bb7f0625771bc71869cda, I occasionally get:
>>
>> PANIC: XLogBeginInsert was not called
>>
>> It seems to only come from vacuuming.
>>
>> Here is an example back-trace:
>>
>> #0 0x0000003dcb632625 in raise () from /lib64/libc.so.6
>> #1 0x0000003dcb633e05 in abort () from /lib64/libc.so.6
>> #2 0x000000000079a39c in errfinish (dummy=<value optimized out>) at
>> elog.c:551
>> #3 0x000000000079b0e4 in elog_finish (elevel=<value optimized out>,
>> fmt=<value optimized out>) at elog.c:1368
>> #4 0x00000000004dd1fe in XLogInsert (rmid=13 '\r', info=48 '0') at
>> xloginsert.c:412
>> #5 0x0000000000478e13 in ginPlaceToPage (btree=0x7fff1fbb7f60,
>> stack=0x2878760, insertdata=<value optimized out>, updateblkno=<value
>> optimized out>,
>> childbuf=0, buildStats=<value optimized out>) at ginbtree.c:582
>> #6 0x0000000000479a33 in ginInsertValue (btree=0x7fff1fbb7f60,
>> stack=0x2878760, insertdata=0x7fff1fbb7fe0, buildStats=0x0) at
>> ginbtree.c:751
>> #7 0x000000000047364a in ginEntryInsert (ginstate=0x7fff1fbb8280,
>> attnum=54624, key=1, category=<value optimized out>, items=0x287d3c0,
>> nitem=1,
>> buildStats=0x0) at gininsert.c:234
>> #8 0x000000000047ef4b in ginInsertCleanup (ginstate=0x7fff1fbb8280,
>> vac_delay=<value optimized out>, stats=0x2868d90) at ginfast.c:843
>> #9 0x000000000047e024 in ginbulkdelete (fcinfo=<value optimized out>) at
>> ginvacuum.c:547
>>
>>
>> From the code, it seems obvious that XLogBeginInsert is getting called at
>> ginbtree.c line 373, so I think that some obscure code path of
>> btree->placeToPage must be either consuming or resetting that
>> XLogBeginInsert before it returns control to ginbtree.c
>>
>
> Here's a theory:
>
> First of all, you have checksums or wal_log_hints enabled.
>

Correct, checksums are on. I forgot all about that.

>
> The page is being split (that's evident from "info=48" above).
> ginPlaceToPage calls GinNewBuffer, which calls GetFreeIndexPage(). That
> finds a page that can be recycled, and marks it as used.
> RecordUsedIndexPage calls MarkBufferDirtyHint(), which in turn calls
> XLogSaveBufferForHint() to create a full-page image record of the page.
> That calls XLogBeginInsert() + XLogInsert(), and leaves the
> begininsert_called == false.
>
> If you had assertions enabled, you'd see the assertion in
> XLogBeginInsert() to fail.
>
> I'll look into that over the weekend..
>
>
Should the assertion in XLogBeginInsert() be a elog(FATAL,...) instead?
The performance impact should be negligible (touches process-local memory
which is already being touched a few instructions later) and it will
produce better bug reports, and backtraces, should there be more issues.

I thought it would takes days to reproduce this with a enable-cassert
build, but it actually reproduced much faster that way. Your theory looks
completely correct.

#0 0x0000003dcb632625 in raise () from /lib64/libc.so.6
#1 0x0000003dcb633e05 in abort () from /lib64/libc.so.6
#2 0x00000000007d18bd in ExceptionalCondition (conditionName=<value
optimized out>, errorType=<value optimized out>, fileName=<value optimized
out>,
lineNumber=<value optimized out>) at assert.c:54
#3 0x00000000004f8067 in XLogBeginInsert () at xloginsert.c:125
#4 0x00000000004f83d5 in XLogSaveBufferForHint (buffer=241,
buffer_std=<value optimized out>) at xloginsert.c:907
#5 0x00000000006b64af in MarkBufferDirtyHint (buffer=241,
buffer_std=<value optimized out>) at bufmgr.c:3085
#6 0x00000000006be859 in fsm_set_and_search (rel=<value optimized out>,
addr=..., slot=1160, newValue=0 '\000', minValue=0 '\000') at
freespace.c:621
#7 0x00000000006be8f6 in RecordPageWithFreeSpace (rel=0x7f01e7fe0560,
heapBlk=1160, spaceAvail=<value optimized out>) at freespace.c:188
#8 0x00000000006bf105 in GetFreeIndexPage (rel=0x7f01e7fe0560) at
indexfsm.c:43
#9 0x00000000004750b2 in GinNewBuffer (index=0x7f01e7fe0560) at
ginutil.c:218
#10 0x000000000047d5c4 in ginPlaceToPage (btree=0x7fffe3a55840,
stack=0x1307b78, insertdata=<value optimized out>, updateblkno=<value
optimized out>,
childbuf=0, buildStats=0x0) at ginbtree.c:442
#11 0x000000000047f2dd in ginInsertValue (btree=0x7fffe3a55840,
stack=0x1307b78, insertdata=0x7fffe3a558c0, buildStats=0x0) at
ginbtree.c:751
#12 0x0000000000475b7b in ginEntryInsert (ginstate=0x7fffe3a55b80,
attnum=58216, key=1, category=<value optimized out>, items=0x7f01e7f61120,
nitem=11,
buildStats=0x0) at gininsert.c:234
#13 0x0000000000485dfc in ginInsertCleanup (ginstate=0x7fffe3a55b80,
vac_delay=<value optimized out>, stats=0x12bbbc8) at ginfast.c:843
#14 0x0000000000484d83 in ginbulkdelete (fcinfo=<value optimized out>) at
ginvacuum.c:547

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2015-06-26 19:59:29 Re: checkpointer continuous flushing
Previous Message Peter Eisentraut 2015-06-26 19:49:28 Re: RFC: replace pg_stat_activity.waiting with something more descriptive