From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Subject: | Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock |
Date: | 2022-10-17 17:02:10 |
Message-ID: | 20221017170210.tdbglb274y4rsiv2@awork3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2022-10-17 10:43:16 -0400, Robert Haas wrote:
> On Fri, Oct 14, 2022 at 2:21 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2022-10-14 10:40:11 +0530, Amit Kapila wrote:
> > > On Fri, Oct 14, 2022 at 2:25 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > Fair point. How about something like: "XXX Do we really need to check
> > > for cleanup lock on the new bucket? Here, we initialize the page, so
> > > ideally we don't need to perform any operation that requires such a
> > > check."?.
> >
> > This still seems to omit that the code is quite broken.
>
> I don't think it's the job of a commit which is trying to fix a
> certain bug to document all the other bugs it isn't fixing.
That's true in general, but the case of fixing a bug in one place but not in
another nearby is a different story.
> If that were the standard, we'd never be able to commit any bug fixes
<eyeroll/>
> which is exactly what's happening on this thread.
What's been happening from my POV is that Amit and you didn't even acknowledge
the broken cleanup lock logic for weeks. I don't mind a reasoned decision to
not care about the non-recovery case. But until now I've not seen that, but I
also have a hard time keeping up with email, so I might have missed it.
> > > Feel free to suggest something better.
> >
> > How about something like:
> >
> > XXX: This code is wrong, we're overwriting the buffer before "acquiring" the
> > cleanup lock. Currently this is not known to have bad consequences because
> > XYZ and the fix seems a bit too risky for the backbranches.
>
> I think here you're talking about the code that runs in
> normal-running, not recovery, in _hash_expandtable, where we call
> _hash_getnewbuf and then IsBufferCleanupOK.
Yes.
> That code indeed seems stupid, because as you say, there's no point in
> calling _hash_getnewbuf() and thus overwriting the buffer and then only
> afterwards checking IsBufferCleanupOK. By then the die is cast. But at the
> same time, I think that it's not wrong in any way that matters to the best
> of our current knowledge. That new buffer that we just went and got might be
> pinned by somebody else, but they can't be doing anything interesting with
> it, because we wouldn't be allocating it as a new page if it were already in
> use for anything, and only one process is allowed to be doing such an
> allocation at a time. That both means that we can likely remove the cleanup
> lock acquisition, but it also means that if we don't, there is no
> correctness problem here, strictly speaking.
If that's the case cool - I just don't know the locking protocol of hash
indexes well enough to judge this.
> So I would suggest that if we feel we absolutely must put a comment
> here, we could make it say something like "XXX. It doesn't make sense
> to call _hash_getnewbuf() first, zeroing the buffer, and then only
> afterwards check whether we have a cleanup lock. However, since no
> scan can be accessing the new buffer yet, any concurrent accesses will
> just be from processes like the bgwriter or checkpointer which don't
> care about its contents, so it doesn't really matter."
WFM. I'd probably lean to just fixing in the backbranches instead, but as long
as we make a conscious decision...
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2022-10-17 17:34:02 | Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock |
Previous Message | Andrew Dunstan | 2022-10-17 14:59:32 | Re: ssl tests aren't concurrency safe due to get_free_port() |