Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Date: 2022-08-17 12:25:06
Message-ID: CA+TgmoYMW0LgwfQG-t8XbWwN1gnFKCMqV-TJ2T0exfH=R2wiQA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 16, 2022 at 8:38 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> I don't think we can defend against lwlock deadlocks where somebody doesn't
> follow the AM's deadlock avoidance strategy.

That's a good way of putting it. Tom seems to be postulating that
maybe someone can use random tools that exist to take buffer locks and
pins in arbitrary order, and if that is true then you can make any AM
deadlock. I think it isn't true, though, and I think if it were true
the right fix would be to remove the tools that are letting people do
that.

There's also zero evidence that this was ever intended as a deadlock
avoidance maneuver. I think that we are only hypothesizing that it was
intended that way because the code looks weird. But I think the email
discussion shows that I thought it was wrong at the time it was
committed, and just missed the fact that the final version of the
patch hadn't fixed it. And if it *were* a deadlock avoidance maneuver
it would still be pretty broken, because it would make the startup
process error out and the whole system go down.

Regarding the question of whether we need a cleanup lock on the new
bucket, I am not really seeing the advantage of going down that path.
Simply fixing this code to take a cleanup lock instead of hoping that
it always gets one by accident is low risk and should fix the observed
problem. Getting rid of the cleanup lock will be more invasive and I'd
like to see some evidence that it's a necessary step before we take
the risk of breaking things.
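
To make the distinction concrete, here is a toy sketch of "check whether we
happen to have a cleanup lock" versus "wait until we have one". It is not
the PostgreSQL code: ToyBuffer and every toy_* function are made up for
illustration, and are only loosely analogous to IsBufferCleanupOK() and
LockBufferForCleanup(). The model assumes a cleanup lock means holding the
buffer's lock while ours is the only pin.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Toy stand-in for a shared buffer; NOT PostgreSQL's real buffer header. */
typedef struct ToyBuffer
{
    pthread_mutex_t mutex;      /* protects pins; doubles as the content lock */
    pthread_cond_t  unpinned;   /* broadcast whenever a pin is dropped */
    int             pins;       /* number of holders pinning the page */
} ToyBuffer;

void toy_init(ToyBuffer *buf)
{
    pthread_mutex_init(&buf->mutex, NULL);
    pthread_cond_init(&buf->unpinned, NULL);
    buf->pins = 0;
}

void toy_pin(ToyBuffer *buf)
{
    pthread_mutex_lock(&buf->mutex);
    buf->pins++;
    pthread_mutex_unlock(&buf->mutex);
}

void toy_unpin(ToyBuffer *buf)
{
    pthread_mutex_lock(&buf->mutex);
    assert(buf->pins > 0);
    buf->pins--;
    pthread_cond_broadcast(&buf->unpinned);  /* wake anyone waiting for cleanup */
    pthread_mutex_unlock(&buf->mutex);
}

/*
 * Check-only variant: reports whether the caller's pin is the sole pin
 * right now. A caller that treats "false" as fatal is relying on luck.
 */
bool toy_cleanup_ok_now(ToyBuffer *buf)
{
    pthread_mutex_lock(&buf->mutex);
    bool ok = (buf->pins == 1);
    pthread_mutex_unlock(&buf->mutex);
    return ok;
}

/*
 * Waiting variant: blocks until the caller's pin is the only one, then
 * returns with the mutex still held. Release it with toy_unlock().
 */
void toy_lock_for_cleanup(ToyBuffer *buf)
{
    pthread_mutex_lock(&buf->mutex);
    while (buf->pins != 1)
        pthread_cond_wait(&buf->unpinned, &buf->mutex);
}

void toy_unlock(ToyBuffer *buf)
{
    pthread_mutex_unlock(&buf->mutex);
}
```

If some other holder has the page pinned for a moment, toy_cleanup_ok_now()
returns false and a caller that errors out on that result fails, whereas
toy_lock_for_cleanup() just sleeps until toy_unpin() drops the extra pin.
The toy ignores details that matter in the real buffer manager, such as
pins being taken without the content lock.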

--
Robert Haas
EDB: http://www.enterprisedb.com
