Re: PANIC in GIN code

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PANIC in GIN code
Date: 2015-06-28 19:31:11
Message-ID: 55904B7F.9030000@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 06/26/2015 10:53 PM, Jeff Janes wrote:
> On Fri, Jun 26, 2015 at 11:40 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
> wrote:
>
>> The page is being split (that's evident from "info=48" above).
>> ginPlaceToPage calls GinNewBuffer, which calls GetFreeIndexPage(). That
>> finds a page that can be recycled, and marks it as used.
>> RecordUsedIndexPage calls MarkBufferDirtyHint(), which in turn calls
>> XLogSaveBufferForHint() to create a full-page image record of the page.
>> That calls XLogBeginInsert() + XLogInsert(), and leaves the
>> begininsert_called == false.
>>
>> If you had assertions enabled, you'd see the assertion in
>> XLogBeginInsert() to fail.
>>
>> I'll look into that over the weekend..

Committed a fix, fortunately it was easy to rearrange that code so that
XLogBeginInsert() is called after the GinNewBuffer() calls. Can you
verify that it fixed your test case, please?

> Should the assertion in XLogBeginInsert() be a elog(FATAL,...) instead?
> The performance impact should be negligible (touches process-local memory
> which is already being touched a few instructions later) and it will
> produce better bug reports, and backtraces, should there be more issues.

Hmm, my original thinking was "no", so that if there is a spurious
XLogBeginInsert() call somewhere without a corresponding XLogInsert()
call, that wouldn't be unnecessarily promoted into an error in a
non-assertions-enabled build. But thinking a bit more, if any data has
already been registered as well, the WAL record will be badly corrupt,
which is much worse. Also, if we look at this GIN bug, if
XLogBeginInsert() had caught the error instead of XLogInsert(), it
would've been caught outside the critical section, which is again better.

So yeah, I turned that into an elog(ERROR). (I don't see why it would
need to be FATAL.)

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2015-06-28 19:55:16 Re: pg_rewind failure by file deletion in source server
Previous Message Peter Geoghegan 2015-06-28 19:29:37 Re: Solaris testers wanted for strxfrm() behavior