Re: LWLock deadlock and gdb advice

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Heikki <hlinnaka(at)iki(dot)fi>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Peter Geoghegan <pg(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: LWLock deadlock and gdb advice
Date: 2015-07-19 19:23:38
Message-ID: CAMkU=1zUc=h0oCZntaJaqqW7gxxVxCWsYq8DD2t7oHgsgVEsgA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 16, 2015 at 12:03 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
> wrote:
>
>>
>> Both. Here's the patch.
>>
>> Previously, LWLockAcquireWithVar set the variable associated with the
>> lock atomically with acquiring it. Before the lwlock-scalability changes,
>> that was straightforward because you held the spinlock anyway, but it's a
>> lot harder/expensive now. So I changed the way acquiring a lock with a
>> variable works. There is now a separate flag, LW_FLAG_VAR_SET, which
>> indicates that the current lock holder has updated the variable. The
>> LWLockAcquireWithVar function is gone - you now just use LWLockAcquire(),
>> which always clears the LW_FLAG_VAR_SET flag, and you can call
>> LWLockUpdateVar() after that if you want to set the variable immediately.
>> LWLockWaitForVar() always waits if the flag is not set, i.e. it will not
>> return regardless of the variable's value, if the current lock-holder has
>> not updated it yet.
>>
>>
> I ran this for a while without casserts and it seems to work. But with
> casserts, I get failures in the autovac process on the GIN index.
>
> I don't see how this is related to the LWLock issue, but I didn't see it
> without your patch. Perhaps the system just didn't survive long enough to
> uncover it without the patch (although it shows up pretty quickly). It
> could just be an overzealous Assert, since the build without casserts
> didn't show problems.
>

> bt and bt full are shown below.
>
> Cheers,
>
> Jeff
>
> #0 0x0000003dcb632625 in raise () from /lib64/libc.so.6
> #1 0x0000003dcb633e05 in abort () from /lib64/libc.so.6
> #2 0x0000000000930b7a in ExceptionalCondition (
> conditionName=0x9a1440 "!(((PageHeader) (page))->pd_special >=
> (__builtin_offsetof (PageHeaderData, pd_linp)))", errorType=0x9a12bc
> "FailedAssertion",
> fileName=0x9a12b0 "ginvacuum.c", lineNumber=713) at assert.c:54
> #3 0x00000000004947cf in ginvacuumcleanup (fcinfo=0x7fffee073a90) at
> ginvacuum.c:713
>

It now looks like this *is* unrelated to the LWLock issue. The assert that
it is tripping over was added only recently (302ac7f27197855afa8c), so I
had not been testing with it in place until now. It looks like it is
finding all-zero pages (index extended, but then a crash before the page
was initialized?) and it doesn't like them.

(gdb) f 3
(gdb) p *(char[8192]*)(page)
$11 = '\000' <repeats 8191 times>

Presumably before this assert, such pages would just be permanently
orphaned.
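
To make the failure mode concrete, here is a minimal standalone sketch of why
an all-zero page trips a check of the form pd_special >= offsetof(header,
pd_linp). It does not use the PostgreSQL headers: SketchPageHeader,
SKETCH_BLCKSZ and both function names are hypothetical stand-ins I made up
for illustration, and the struct is only a simplified approximation of a page
header, not the real PageHeaderData.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed page size; matches the char[8192] cast in the gdb session. */
#define SKETCH_BLCKSZ 8192

/*
 * Hypothetical, simplified page header (NOT the real PageHeaderData).
 * Only the layout idea matters: fixed fields, then the line pointer array.
 */
typedef struct SketchPageHeader
{
    uint64_t pd_lsn;
    uint16_t pd_checksum;
    uint16_t pd_flags;
    uint16_t pd_lower;
    uint16_t pd_upper;
    uint16_t pd_special;          /* offset of the special space */
    uint16_t pd_pagesize_version;
    uint32_t pd_prune_xid;
    uint32_t pd_linp[1];          /* stand-in for the line pointer array */
} SketchPageHeader;

/* Returns true if every byte in the buffer is zero. */
bool
sketch_page_is_all_zero(const unsigned char *page, size_t len)
{
    assert(page != NULL);
    for (size_t i = 0; i < len; i++)
    {
        if (page[i] != 0)
            return false;
    }
    return true;
}

/*
 * Same shape as the failing condition in the backtrace: the special space
 * must start at or after the end of the fixed header. On a zeroed page
 * pd_special is 0, so this returns false.
 */
bool
sketch_special_pointer_is_sane(const unsigned char *page)
{
    SketchPageHeader hdr;

    assert(page != NULL);
    /* Copy out the header rather than casting, to avoid alignment issues. */
    memcpy(&hdr, page, sizeof(hdr));
    return hdr.pd_special >= offsetof(SketchPageHeader, pd_linp);
}
```

In this sketch a caller could test sketch_page_is_all_zero() first and skip
such pages before looking at the header at all. Whether that is the right
fix for ginvacuumcleanup is a separate question; this only illustrates why a
never-initialized page cannot satisfy the condition.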

Cheers,

Jeff
