Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock(PG10.7)

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: chenhj <chjischj(at)163(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock(PG10.7)
Date: 2019-11-10 23:42:42
Message-ID: CAPpHfdvLVvO8FSCenRG3JVSLrMpaMpWTvJD3ueuC3EUdfE4mXA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

I'm sorry for late reply. I was busy with various things. Also
digging into these details took some time. Please find my explanation
below.

On Wed, Oct 30, 2019 at 2:34 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> In general, it seems very important to be clear about exactly how the
> key space works. The nbtree work for v12 greatly benefitted from
> defining comparisons in a way that didn't really change how nbtree
> worked, while at the same time minimizing I/O and making nbtree
> faithful to Lehman & Yao's original design. It isn't obvious how
> valuable it is to really carefully define how invariants and key
> comparisons work, but it seems possible to solve a lot of problems
> that way.

gin packs downlinks and pivot key into tuples in the different way
than nbtree does. Let's see the logical structure of internal B-tree
page. We can represent it as following.

downlink_1, key_1_2, downlink_2, key_2_3, .... , downlink_n, highkey

key_{i-1}_i is pivot key, which splits key space between
downlink_{i-1} and downlink_i. So, every key under downlink_{i-1} is
<= key_{i-1}_i. Every key under downlink_i is > key_{i-1}_i. And all
underlying keys are <= highkey.

nbtree packs that into tuples as followings (tuples are shown in parentheses).

(highkey), (-inf, downlink_1), (key_1_2, downlink_2), ...
(key_{n-1}_n, downlink_n)

So, we store highkey separately. key_{i-1}_i and downlink_i form a
tuple together. downlink_1 is coupled with -inf key.

gin packs tuples in a different way.

(downlink_1, key_1_2), (downlink_2, key_2_3), ... , (downlink_n, highkey)

So, in GIN key_{i-1}_i and downlink_{i-1} form a tuple. Highkey is
coupled with downlink_n. And -inf key is not needed here.

But there is couple notes about highkey:
1) In entry tree rightmost page, a key coupled with downlink_n doesn't
really matter. Highkey is assumed to be infinity.
2) In posting tree, a key coupled with downlink_n always doesn't
matter. Highkey for non-rightmost pages is stored separately and
accessed via GinDataPageGetRightBound().

I think this explains following:
1) The invariants in gin btree are same as they are in nbtree. Just
page layout is different.
2) The way entryLocateEntry() works. In particular why it's OK to go
the mid downlink if corresponding key equals.
3) There is no "negative infinity" item in internal pages.

Does the explanation of above looks clear for you? If so, I can put
it into the README.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-11-11 00:02:01 Re: [bug fix] Produce a crash dump before main() on Windows
Previous Message Tom Lane 2019-11-10 23:06:17 Re: pg_upgrade fails to detect unsupported arrays and ranges