Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock(PG10.7)

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: chenhj <chjischj(at)163(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock(PG10.7)
Date: 2019-09-29 16:27:28
Message-ID: CAPpHfdtqWjiF5W7AE-qdn1=9aGSL6kB4UfZVOjqTkCsZi=d6Ew@mail.gmail.com
Lists: pgsql-hackers

On Sun, Sep 29, 2019 at 6:12 PM Alexander Korotkov
<a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> On Sun, Sep 29, 2019 at 5:38 PM Alexander Korotkov
> <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> > On Sun, Sep 29, 2019 at 11:17 AM chenhj <chjischj(at)163(dot)com> wrote:
> > > Is the locking order of the autovacuum process (root->right->left) correct? The insert process locks GIN buffers in bottom->top and left->right order.
> > >
> > > 1. vacuum(root->right->left):
> >
> > Starting from the root seems OK to me, because vacuum blocks all
> > concurrent inserts before doing this. But this needs to be properly
> > documented in the README.
> >
> > Locking from right to left is clearly wrong. It could deadlock with
> > a concurrent ginStepRight(), which locks from left to right. I expect
> > this is what happened in your case. I'm going to reproduce this and
> > fix it.
>
> I just managed to reproduce this using two sessions on the master branch.
>
> session 1:
> # create table test with (autovacuum_enabled = false) as (select
> array[1] ar from generate_series(1,20000) i);
> # create index test_ar_idx on test using gin (ar);
> # vacuum analyze test;
> # delete from test;
>
> session 2:
> # set enable_seqscan = off;
> gdb> b ginbtree.c:150
> # select * from test where ar @> '{1}'::integer[];
> Step in gdb just before ReadBuffer() in ReleaseAndReadBuffer().
>
> session 1:
> gdb> b ginvacuum.c:155
> # vacuum test;
>
> both sessions:
> gdb> continue
> gdb> continue

A patch with the fix is attached. The idea is simple: ginScanToDelete()
now keeps the exclusive lock on the left page, eliminating the need to
relock it. So we preserve the left-to-right locking order and can't
deadlock with ginStepRight().
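
To make the lock-ordering argument concrete, here is a toy model in
Python. It is not PostgreSQL code and not part of the patch: the page
numbers, the process names and the find_deadlock() helper are all
hypothetical, chosen only for illustration. It builds a wait-for graph
from "who holds which page" and "who is blocked on which page" and
looks for a cycle.

```python
def find_deadlock(holds, wants):
    """Return the processes forming a wait-for cycle, or None.

    holds: dict mapping process name -> set of page numbers it has locked
    wants: dict mapping process name -> page number it is blocked on
    Only exclusive locks are modelled, so each page has at most one owner.
    """
    # Map every locked page to the process that owns it.
    owner = {page: proc for proc, pages in holds.items() for page in pages}

    for start in wants:
        seen = []
        proc = start
        # Follow "proc waits for a page owned by X" edges until they end
        # or until we come back to a process we have already visited.
        while proc in wants and wants[proc] in owner:
            if proc in seen:
                return seen[seen.index(proc):]
            seen.append(proc)
            proc = owner[wants[proc]]
    return None


# Hypothetical layout: page 1 is the left sibling of page 2.

# Right-to-left relocking: vacuum holds the page being deleted (2) and
# waits for its left sibling (1); the scan holds 1 and steps right to 2.
right_to_left = find_deadlock(
    holds={"vacuum": {2}, "scan": {1}},
    wants={"vacuum": 1, "scan": 2},
)

# Left page kept locked: vacuum already holds 1 and 2, so the scan,
# which only ever waits for a page to the right of what it holds,
# simply queues behind it.
left_to_right = find_deadlock(
    holds={"vacuum": {1, 2}, "scan": {0}},
    wants={"scan": 1},
)
```

In the first scenario the helper reports a cycle between the two
processes; in the second it finds none, because no process waits for a
page to the left of one it already holds.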

Also, we need to adjust the Concurrency section in the GIN README. To
me, the description looks vague and inconsistent even with the current
behavior. I'm going to post this later.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment                                           Content-Type              Size
gin_ginDeletePage_ginStepRight_deadlock_fix-1.patch  application/octet-stream  4.7 KB
