Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, chenhj <chjischj(at)163(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock
Date: 2018-12-07 15:34:44
Message-ID: CAPpHfdtVAGba0s_kyB1dbpLgXNRVRQA1wUfm8iP6jNn81OENdw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 7, 2018 at 12:14 PM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> > 7 дек. 2018 г., в 2:50, Peter Geoghegan <pg(at)bowt(dot)ie> написал(а):
> > On Thu, Dec 6, 2018 at 12:51 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> >> However, I'd like to note that 218f51584d5 introduces two changes:
> >> 1) Cleanup locking only if there pages to delete
> >> 2) Cleanup locking only subtree root
> >> The 2nd one is broken. But the 1st one seems still good for me and
> >> useful, because in vast majority of cases vacuum doesn't delete any
> >> index pages. So, I propose to revert 218f51584d5, but leave there
> >> logic, which locks root for cleanup only once there are pages to
> >> delete. Any thoughts?
> >
> > Can you post a patch that just removes the 2nd part, leaving the
> > still-correct first part?
>
> I like the idea of keeping cleanup lock only if there are pages to delete. It will still solve the original problem that caused proposals for GIN VACUUM enhancements.
>
> But I must note that there is one more problem: ginVacuumPostingTreeLeaves() do not ensure that all splitted pages are visited. It copies downlink block numbers to a temp array and then unlocks parent. It is not correct way to scan posting tree for cleanup.

Hmm... In ginVacuumPostingTreeLeaves() we visit pages from parent to
children. This is why splitted pages might be unvisited. Should we
visit leafpages by rightlinks instead? Then, if we have candidates
for delete, ginScanToDelete() can work the same as before 218f51584d5.

> PFA diff with following changes:
> 1. Always take root cleanup lock before deleting pages
> 2. Check for concurrent splits after scanning page
>
> Please note, that neither applying this diff nor reverting 218f51584d5 will solve bug of page delete redo lock on standby.

ginRedoDeletePage() visits pages in following order: right(to be
deleted), parent, left. I didn't find any description of why it has
been done so. Any guesses don't we visit parent page first here?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2018-12-07 15:48:23 Re: [HACKERS] REINDEX CONCURRENTLY 2.0
Previous Message John Naylor 2018-12-07 13:55:57 Re: WIP: Avoid creation of the free space map for small tables