Re: GIN data corruption bug(s) in 9.6devel

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN data corruption bug(s) in 9.6devel
Date: 2015-11-05 22:44:24
Message-ID: CAMkU=1w5x8rY5EvWieJsfZWB3eNGmFGHmOBf9r5VLWDTW72b2g@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 5, 2015 at 2:18 PM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> Hi,
>
> while repeating some full-text benchmarks on master, I've discovered
> that there's a data corruption bug somewhere. What happens is that while
> loading data into a table with GIN indexes (using multiple parallel
> connections), I sometimes get this:
>
> TRAP: FailedAssertion("!(((PageHeader) (page))->pd_special >=
> (__builtin_offsetof (PageHeaderData, pd_linp)))", File: "ginfast.c",
> Line: 537)
> LOG: server process (PID 22982) was terminated by signal 6: Aborted
> DETAIL: Failed process was running: autovacuum: ANALYZE messages
>
> The details of the assert are always exactly the same - it's always
> autovacuum and it trips on exactly the same check. And the backtrace
> always looks like this (full backtrace attached):
>
> #0 0x00007f133b635045 in raise () from /lib64/libc.so.6
> #1 0x00007f133b6364ea in abort () from /lib64/libc.so.6
> #2 0x00000000007dc007 in ExceptionalCondition
> (conditionName=conditionName@entry=0x81a088 "!(((PageHeader)
> (page))->pd_special >= (__builtin_offsetof (PageHeaderData, pd_linp)))",
> errorType=errorType@entry=0x81998b "FailedAssertion",
> fileName=fileName@entry=0x83480a "ginfast.c",
> lineNumber=lineNumber@entry=537) at assert.c:54
> #3 0x00000000004894aa in shiftList (stats=0x0, fill_fsm=1 '\001',
> newHead=26357, metabuffer=130744, index=0x7f133c0f7518) at ginfast.c:537
> #4 ginInsertCleanup (ginstate=ginstate@entry=0x7ffd98ac9160,
> vac_delay=vac_delay@entry=1 '\001', fill_fsm=fill_fsm@entry=1 '\001',
> stats=stats@entry=0x0) at ginfast.c:908
> #5 0x00000000004874f7 in ginvacuumcleanup (fcinfo=<optimized out>) at
> ginvacuum.c:662
> ...

This is probably the same bug discussed here:

http://www.postgresql.org/message-id/CAMkU=1xALfLhUUohFP5v33RzedLVb5aknNUjcEuM9KNBKrB6-Q@mail.gmail.com

And here:

http://www.postgresql.org/message-id/56041B26.2040902@sigaev.ru

The bug theoretically exists in 9.5, but it wasn't until 9.6 (commit
e95680832854cf300e64c) that free pages were recycled aggressively
enough for it to become likely to be hit.

There are some proposed patches in those threads, but discussion on
them seems to have stalled out. Can you try one and see if it fixes
the problems you are seeing?

Cheers,

Jeff
