Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock(PG10.7)

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: chenhj <chjischj(at)163(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock(PG10.7)
Date: 2019-10-03 21:05:43
Message-ID: CAPpHfdvMvsw-NcE5bRS7R1BbvA4BxoDnVVjkXC5W0Czvy9LVrg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Oct 1, 2019 at 5:55 AM Alexander Korotkov
<a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> On Mon, Sep 30, 2019 at 10:54 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> >
> > On Sun, Sep 29, 2019 at 8:12 AM Alexander Korotkov
> > <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> > > I just managed to reproduce this using two sessions on master branch.
> > >
> > > session 1
> > > session 2
> >
> > Was the involvement of the pending list stuff in Chen's example just a
> > coincidence? Can you recreate the problem while eliminating that
> > factor (i.e. while setting fastupdate to off)?
> >
> > Chen's example involved an INSERT that deadlocked against VACUUM --
> > not a SELECT. Is this just a coincidence?
>
> Chen wrote:
>
> > Unfortunately the insert process(run by gcore) held no lwlock, it should be another process(we did not fetch core file) that hold the lwlock needed for autovacuum process.
>
> So, he caught a backtrace for the INSERT and posted it for information.
> But since the INSERT held no lwlocks, it couldn't participate in the
> deadlock. It was just a side waiter.
>
> I've rerun my reproduction case and it still deadlocks. The steps are
> the same, just with the GIN index created with (fastupdate = off).

BTW, while trying to revise the README I found another bug. It appears
to be possible to reach a deleted page via a downlink. The reproduction
case is the following.

session 1
session 2

# create table tmp (ar int[]) with (autovacuum_enabled = false);
# insert into tmp (select '{1}' from generate_series(1,10000000) i);
# insert into tmp values ('{1,2}');
# insert into tmp (select '{1}' from generate_series(1,10000000) i);
# create index tmp_idx on tmp using gin(ar);

# delete from tmp;

# set max_parallel_workers_per_gather = 0;
/* Breakpoint where entryLoadMoreItems() calls ginFindLeafPage() to
search the GIN posting tree */
gdb> b ginget.c:682
# select * from tmp where ar @> '{1,2}';
gdb> /* step till ReleaseAndReadBuffer() releases a buffer */

# vacuum tmp;

gdb> continue

It also appears that the previous version of the deadlock fix didn't
supply a left sibling to the leftmost child of any page. As a result,
internal pages were never deleted. The first attached patch is the
revised fix.

The second patch fixes traversal to a deleted page via a downlink.
Similarly to nbtree, we just always move right if we land on a deleted
page. Also, it appears that we clear all other flags while marking a
page as deleted, which causes an assert to fire. With the patch, we
just add the deleted flag without erasing the others. I also had to
remove the assert that ginStepRight() never steps to a deleted page:
if we landed on a deleted page from a downlink, then following its
rightlink can lead to another deleted page.

I'm planning to continue work on the README, comments, and commit messages.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
0001-gin_ginDeletePage_ginStepRight_deadlock_fix-2.patch application/octet-stream 10.5 KB
0002-gin-fix-traversing-to-deleted-page-by-downlink-2.patch application/octet-stream 2.4 KB
