Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: chjischj(at)163(dot)com, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock
Date: 2018-11-08 01:46:52
Message-ID: CAH2-WzkPKbY3+R+uXwhMYRm7DY2OGBjtN7UJo+EOF0y7uQH13g@mail.gmail.com
Lists: pgsql-hackers

On Mon, Oct 29, 2018 at 12:04 PM chenhj <chjischj(at)163(dot)com> wrote:
> ## stack of autovacuum:Acquire lock 0x2aaab670dfe4 and hold 0x2aaab4009564
> --------------------------------------
> (gdb) bt
> #0 0x00007fe11552379b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
> #1 0x00007fe11552382f in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
> #2 0x00007fe1155238cb in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
> #3 0x000000000069d362 in PGSemaphoreLock (sema=0x2aaaaac07eb8) at pg_sema.c:310
> #4 0x00000000007095ac in LWLockAcquire (lock=0x2aaab670dfe4, mode=LW_EXCLUSIVE) at lwlock.c:1233
> #5 0x00000000004947ef in ginScanToDelete (gvs=gvs@entry=0x7ffd81e4d0f0, blkno=701, isRoot=isRoot@entry=0 '\000', parent=parent@entry=0x7ffd81e4c790, myoff=myoff@entry=37) at ginvacuum.c:262
> #6 0x0000000000494874 in ginScanToDelete (gvs=gvs@entry=0x7ffd81e4d0f0, blkno=blkno@entry=9954, isRoot=isRoot@entry=1 '\001', parent=parent@entry=0x7ffd81e4c790, myoff=myoff@entry=0)
> at ginvacuum.c:277
> #7 0x0000000000494ed1 in ginVacuumPostingTreeLeaves (gvs=gvs@entry=0x7ffd81e4d0f0, blkno=9954, isRoot=isRoot@entry=0 '\000') at ginvacuum.c:404
> #8 0x0000000000494e21 in ginVacuumPostingTreeLeaves (gvs=gvs@entry=0x7ffd81e4d0f0, blkno=644, isRoot=isRoot@entry=1 '\001') at ginvacuum.c:372
> #9 0x0000000000495540 in ginVacuumPostingTree (rootBlkno=<optimized out>, gvs=0x7ffd81e4d0f0) at ginvacuum.c:426
> #10 ginbulkdelete (info=0x7ffd81e4f720, stats=<optimized out>, callback=<optimized out>, callback_state=<optimized out>) at ginvacuum.c:649
> #11 0x00000000005e1194 in lazy_vacuum_index (indrel=0x3146e28, stats=stats@entry=0x28ec200, vacrelstats=vacrelstats@entry=0x28ebc28) at vacuumlazy.c:1621
> #12 0x00000000005e214d in lazy_scan_heap (aggressive=<optimized out>, nindexes=<optimized out>, Irel=<optimized out>, vacrelstats=<optimized out>, options=16, onerel=0x28ec1f8)
> at vacuumlazy.c:1311
> #13 lazy_vacuum_rel (onerel=onerel@entry=0x3144fa8, options=options@entry=99, params=params@entry=0x289f270, bstrategy=<optimized out>) at vacuumlazy.c:258

Actually, the bigger problem is on this side of the deadlock, within
VACUUM. ginInsertCleanup() (the other side of the deadlock) may have
problems too, but this seems worse. Commit 218f51584d5 appears to be
at fault here.

First things first: ginScanToDelete() *maintains* buffer locks on
multiple levels of a posting tree, meaning that quite a few exclusive
buffer locks may be held all at once (one per level).
MAX_SIMUL_LWLOCKS is 200 these days, and in practice a B-Tree can
never get that tall, but having the number of buffer locks acquired be
determined by the height of the tree is not okay, on general
principle. The nbtree code's page deletion goes to a lot of effort to
keep the number of buffer locks fixed, but nothing like that is
attempted for GIN posting trees.
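
To make the "one lock per level" concern concrete, here is a toy model
in Python. It is not PostgreSQL code and not nbtree's actual
algorithm; the function name peak_locks_held() and the page names are
made up for illustration. It only counts how many locks are held at
once while walking down a single root-to-leaf path, either keeping
every ancestor locked or releasing the parent as soon as the child is
locked.

```python
def peak_locks_held(path, keep_ancestors):
    """Return the largest number of locks held at once while descending `path`.

    path           -- page names from root to leaf (hypothetical)
    keep_ancestors -- True:  every ancestor stays locked (one lock per level)
                      False: the parent is released once the child is locked
    """
    held = []
    peak = 0
    for page in path:
        held.append(page)              # acquire the lock on the next level
        peak = max(peak, len(held))
        if not keep_ancestors and len(held) > 1:
            held.pop(0)                # release the parent, keep only the child
    return peak


path = ["root", "internal-1", "internal-2", "leaf"]
print(peak_locks_held(path, keep_ancestors=True))   # grows with tree height
print(peak_locks_held(path, keep_ancestors=False))  # bounded by a constant
```

In this model the first variant peaks at the height of the tree, while
the second never holds more than two locks, whatever the height.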

Chen's original analysis of the problem seems to be more or less
accurate: the order that ginScanToDelete() acquires buffer locks as it
descends the tree (following commit 218f51584d5) is not compatible
with the order within ginFinishSplit(). The faulty code within
ginScanToDelete() crabs/couples buffer locks while descending the
tree, whereas the code within ginFinishSplit() crabs them as it
ascends the same tree.
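
The incompatibility can be sketched as a lock-order graph. This is a
simplified Python illustration, not the GIN code: the helper names
coupling_edges() and has_cycle() and the three page names are
hypothetical. Each time a backend requests lock B while still holding
lock A, the model records an edge A -> B. A cycle in the combined
graph means some interleaving of the two backends can deadlock.

```python
from collections import defaultdict


def coupling_edges(acquire_order):
    """Edges (held, wanted) for a backend that couples locks along a path:
    each lock is requested while the previous one is still held."""
    return list(zip(acquire_order, acquire_order[1:]))


def has_cycle(edges):
    """Depth-first search for a cycle in the directed lock-order graph."""
    graph = defaultdict(list)
    for held, wanted in edges:
        graph[held].append(wanted)

    WHITE, GREY, BLACK = 0, 1, 2       # unvisited, on the DFS stack, finished
    color = defaultdict(int)

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:     # back edge: lock orders conflict
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(graph))


# Assumed three-level tree; names are illustrative only.
descending = coupling_edges(["root", "internal", "leaf"])   # vacuum-like order
ascending = coupling_edges(["leaf", "internal", "root"])    # split-like order

print(has_cycle(descending))               # one consistent order on its own
print(has_cycle(ascending))                # likewise
print(has_cycle(descending + ascending))   # the two orders combined
```

Either order is cycle-free in isolation; only the combination of a
descending and an ascending coupler over the same pages produces a
cycle in this model.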

Teodor: Do you think that the issue is fixable? It looks to me like
there are serious issues with the design of 218f51584d5. I don't think
the general "there can't be any inserters at this subtree" thing works
given that we have to couple buffer locks when moving right for other
reasons. We call ginStepRight() within ginFinishSplit(), for reasons
that date back to bug fix commit ac4ab97e from 2013 -- that detail is
probably important, because it seems to be what breaks the subtree
design (we don't delete in two phases or anything in GIN).

I think that you have to be doing a multi-level delete for a deadlock
to take place, which isn't particularly likely to coincide with a
concurrent insertion in general. That may explain why it's taken a
year to get a high-quality bug report.

--
Peter Geoghegan
