Re: LWLock deadlock and gdb advice

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Heikki <hlinnaka(at)iki(dot)fi>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Peter Geoghegan <pg(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: LWLock deadlock and gdb advice
Date: 2015-07-27 05:31:39
Message-ID: 55B5C23B.40906@lab.ntt.co.jp
Lists: pgsql-hackers

On 2015-07-16 PM 04:03, Jeff Janes wrote:
> On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
>>
>> Both. Here's the patch.
>>
>> Previously, LWLockAcquireWithVar set the variable associated with the lock
>> atomically with acquiring it. Before the lwlock-scalability changes, that
>> was straightforward because you held the spinlock anyway, but it's a lot
>> harder/expensive now. So I changed the way acquiring a lock with a variable
>> works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that
>> the current lock holder has updated the variable. The LWLockAcquireWithVar
>> function is gone - you now just use LWLockAcquire(), which always clears
>> the LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if
>> you want to set the variable immediately. LWLockWaitForVar() always waits
>> if the flag is not set, i.e. it will not return regardless of the
>> variable's value, if the current lock-holder has not updated it yet.
>>
>>
> I ran this for a while without casserts and it seems to work. But with
> casserts, I get failures in the autovac process on the GIN index.
>
> I don't see how this is related to the LWLock issue, but I didn't see it
> without your patch. Perhaps the system just didn't survive long enough to
> uncover it without the patch (although it shows up pretty quickly). It
> could just be an overzealous Assert, since the casserts off didn't show
> problems.
>
> bt and bt full are shown below.
>

I got a similar assertion failure, but on a btree index
(pg_attribute_relid_attnum_index). The backtrace looks like Jeff's:

(gdb) bt
#0  0x0000003969632625 in raise () from /lib64/libc.so.6
#1  0x0000003969633e05 in abort () from /lib64/libc.so.6
#2  0x000000000092eb9e in ExceptionalCondition (conditionName=0x9c2220
    "!(((PageHeader) (page))->pd_special >= (__builtin_offsetof (PageHeaderData, pd_linp)))",
    errorType=0x9c0c41 "FailedAssertion", fileName=0x9c0c10 "nbtree.c",
    lineNumber=903) at assert.c:54
#3  0x00000000004e02d8 in btvacuumpage (vstate=0x7fff2c7655f0, blkno=9,
    orig_blkno=9) at nbtree.c:903
#4  0x00000000004e0067 in btvacuumscan (info=0x7fff2c765cd0,
    stats=0x279f7d0, callback=0x668f6d <lazy_tid_reaped>,
    callback_state=0x279e338, cycleid=49190) at nbtree.c:821
#5  0x00000000004dfdde in btbulkdelete (fcinfo=0x7fff2c7657d0) at nbtree.c:676
#6  0x0000000000939769 in FunctionCall4Coll (flinfo=0x7fff2c765bb0,
    collation=0, arg1=140733939342544, arg2=0, arg3=6721389, arg4=41542456)
    at fmgr.c:1375
#7  0x00000000004d2a01 in index_bulk_delete (info=0x7fff2c765cd0,
    stats=0x0, callback=0x668f6d <lazy_tid_reaped>, callback_state=0x279e338)
    at indexam.c:690
#8  0x000000000066861d in lazy_vacuum_index (indrel=0x7fd40ab846f0,
    stats=0x279e770, vacrelstats=0x279e338) at vacuumlazy.c:1367
#9  0x00000000006678a8 in lazy_scan_heap (onerel=0x274ec90,
    vacrelstats=0x279e338, Irel=0x279e790, nindexes=2, scan_all=0 '\000')
    at vacuumlazy.c:1098
#10 0x00000000006660f7 in lazy_vacuum_rel (onerel=0x274ec90, options=99,
    params=0x27bdc88, bstrategy=0x27bdd18) at vacuumlazy.c:244
#11 0x0000000000665c1a in vacuum_rel (relid=1249, relation=0x7fff2c7662a0,
    options=99, params=0x27bdc88) at vacuum.c:1388
#12 0x00000000006643ce in vacuum (options=99, relation=0x7fff2c7662a0,
    relid=1249, params=0x27bdc88, va_cols=0x0, bstrategy=0x27bdd18,
    isTopLevel=1 '\001') at vacuum.c:293
#13 0x000000000075d23c in autovacuum_do_vac_analyze (tab=0x27bdc80,
    bstrategy=0x27bdd18) at autovacuum.c:2807
#14 0x000000000075c632 in do_autovacuum () at autovacuum.c:2328
#15 0x000000000075b457 in AutoVacWorkerMain (argc=0, argv=0x0)
    at autovacuum.c:1647
#16 0x000000000075b0a5 in StartAutoVacWorker () at autovacuum.c:1454
#17 0x000000000076f9cc in StartAutovacuumWorker () at postmaster.c:5261
#18 0x000000000076f28a in sigusr1_handler (postgres_signal_arg=10)
    at postmaster.c:4918
#19 <signal handler called>
#20 0x00000039696e1353 in __select_nocancel () from /lib64/libc.so.6
#21 0x000000000076ace7 in ServerLoop () at postmaster.c:1582
#22 0x000000000076a538 in PostmasterMain (argc=3, argv=0x26f9330)
    at postmaster.c:1263
#23 0x00000000006c1c2e in main (argc=3, argv=0x26f9330) at main.c:223
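
For readers decoding the condition string in frame #2: it compares the page's
pd_special offset against offsetof(PageHeaderData, pd_linp), i.e. the size of
the fixed page header. The following is only a minimal standalone sketch of
that kind of check, not PostgreSQL code. DemoPageHeader, DEMO_BLCKSZ and
demo_special_offset_ok are hypothetical names, the struct is a simplified
stand-in for the real header, and the upper-bound test is an addition for the
demo. It illustrates that an all-zero page buffer would fail such a check;
the backtrace alone does not establish that this is what happened here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BLCKSZ 8192   /* assumed page size for this demo */

/* Simplified, hypothetical stand-in for a page header (illustration only). */
typedef struct DemoPageHeader
{
    uint64_t pd_lsn;
    uint16_t pd_checksum;
    uint16_t pd_flags;
    uint16_t pd_lower;
    uint16_t pd_upper;
    uint16_t pd_special;          /* offset to the start of the special space */
    uint16_t pd_pagesize_version;
    uint32_t pd_prune_xid;
    uint32_t pd_linp[1];          /* line pointer array starts here */
} DemoPageHeader;

/*
 * Returns 1 if pd_special points at or past the end of the fixed header
 * and not past the end of the page, 0 otherwise.
 */
static int
demo_special_offset_ok(const void *page)
{
    DemoPageHeader hdr;

    assert(page != NULL);

    /* Copy only the fixed header part, which avoids alignment assumptions. */
    memset(&hdr, 0, sizeof(hdr));
    memcpy(&hdr, page, offsetof(DemoPageHeader, pd_linp));

    return hdr.pd_special >= offsetof(DemoPageHeader, pd_linp) &&
           hdr.pd_special <= DEMO_BLCKSZ;
}
```

Calling demo_special_offset_ok() on a zero-filled DEMO_BLCKSZ buffer returns 0,
because pd_special is then 0 and lies inside the header; a buffer whose
pd_special points near the end of the page returns 1.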

Thanks,
Amit
