Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, chenhj <chjischj(at)163(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock
Date: 2018-12-08 09:48:06
Message-ID: 5849302B-3D98-4C09-BD43-277D0E051D2B@yandex-team.ru
Lists: pgsql-hackers

Hi!

Thanks, Alexander!
> On 8 Dec 2018, at 6:54, Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>
> Yep, please find attached draft patch.

The patch seems good to me; I'll check it in more detail.
The patch gets the posting item at FirstOffsetNumber instead of calling btree->getLeftMostChild(). This seems OK, since dataGetLeftMostPage() does just the same, but with a few Assert()s.

> BTW, it seems that I've another bug in GIN. README says that
>
> "However, posting trees are only
> fully searched from left to right, starting from the leftmost leaf. (The
> tree-structure is only needed by insertions, to quickly find the correct
> insert location). So as long as we don't delete the leftmost page on each
> level, a search can never follow a downlink to page that's about to be
> deleted."
>
> But that's not really true since we taught GIN to skip parts of posting
> trees in PostgreSQL 9.4. So, there might be a risk of visiting a page
> that is about to be deleted via a downlink. But to get a real problem,
> vacuum would have to get past the cleanup stage and someone else would
> have to reuse this page before we traverse the downlink. Thus, the risk
> of a real problem is very low. But it still should be addressed.

There's a patch earlier in this thread, 0001-Use-correct-locking-protocol-during-GIN-posting-tree.patch, where I propose stamping every deleted page with GinPageSetDeleteXid(page, ReadNewTransactionId()); and not reusing the page until TransactionIdPrecedes(GinPageGetDeleteXid(page), RecentGlobalDataXmin) holds.
Should we leave this bug alone for a future fix, to keep the current fix noninvasive?
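
For illustration only, here is a standalone toy model of that stamp-and-check idea. It is not the patch itself: ToyPage, toy_xid_precedes(), toy_page_mark_deleted() and toy_page_is_recyclable() are made-up names, the xid comparison is a simplified wraparound-aware check that ignores special xids, and next_xid / global_xmin are plain parameters standing in for ReadNewTransactionId() and RecentGlobalDataXmin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical stand-in for a GIN page: only the fields this sketch needs. */
typedef struct ToyPage
{
    bool          deleted;
    TransactionId delete_xid;   /* xid stamped when the page was deleted */
} ToyPage;

/*
 * Simplified wraparound-aware "a is older than b" for normal xids.
 * Assumption: special xids are not handled here.
 */
static bool
toy_xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* On deletion, stamp the page with the next xid to be assigned. */
static void
toy_page_mark_deleted(ToyPage *page, TransactionId next_xid)
{
    assert(!page->deleted);
    page->deleted = true;
    page->delete_xid = next_xid;
}

/*
 * A deleted page may be reused only once its stamp is older than the
 * global xmin horizon, i.e. no scan that could still follow a stale
 * downlink to it is running.
 */
static bool
toy_page_is_recyclable(const ToyPage *page, TransactionId global_xmin)
{
    return page->deleted && toy_xid_precedes(page->delete_xid, global_xmin);
}
```

With this model, a page stamped with xid 100 is not recyclable while the horizon is still at 100 and becomes recyclable once the horizon has advanced past the stamp.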

Best regards, Andrey Borodin.
