Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From: Andres Freund <andres(at)anarazel(dot)de>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
Date: 2022-08-17 18:36:23
Message-ID: 20220817183623.w3fsoerpaunt7exe@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2022-08-17 10:18:14 +0530, Amit Kapila wrote:
> > Looking at the non-recovery code makes me even more suspicious:
> >
> >     /*
> >      * Physically allocate the new bucket's primary page. We want to do this
> >      * before changing the metapage's mapping info, in case we can't get the
> >      * disk space. Ideally, we don't need to check for cleanup lock on new
> >      * bucket as no other backend could find this bucket unless meta page is
> >      * updated. However, it is good to be consistent with old bucket locking.
> >      */
> >     buf_nblkno = _hash_getnewbuf(rel, start_nblkno, MAIN_FORKNUM);
> >     if (!IsBufferCleanupOK(buf_nblkno))
> >     {
> >         _hash_relbuf(rel, buf_oblkno);
> >         _hash_relbuf(rel, buf_nblkno);
> >         goto fail;
> >     }
> >
> >
> > _hash_getnewbuf() calls _hash_pageinit() which calls PageInit(), which
> > memset(0)s the whole page. What does it even mean to check whether you
> > effectively have a cleanup lock after you zeroed out the page?
> >
> > Reading the README and the comment above makes me wonder if this whole cleanup
> > lock business here is just cargo culting and could be dropped?
> >
>
> I think it is okay to not acquire a cleanup lock on the new bucket
> page in both the recovery and non-recovery paths. It is primarily
> required on the old bucket page to avoid concurrent scans/inserts. As
> mentioned in the comments, and as far as my memory serves, it is
> mainly there to keep the new bucket consistent with the old bucket
> locking.

Zeroing out the page and only then checking for a cleanup lock is not
"keeping it consistent with old bucket locking", hopefully at least. This
code is just broken on multiple fronts, and consistency isn't a defense.
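
To make the ordering concrete: _hash_getnewbuf() has already re-initialized
(and therefore zeroed) the page by the time IsBufferCleanupOK() runs, so the
check cannot guard anything. Below is a rough sketch of what dropping the
check could look like in _hash_expandtable(); this is illustration only, not
a patch, and it assumes the premise above holds (no other backend can find
the new bucket before the metapage is updated):

    /*
     * Hypothetical sketch: rely on the exclusive lock that
     * _hash_getnewbuf() already takes, and skip the after-the-fact
     * cleanup-lock check on the brand-new bucket page.
     */
    buf_nblkno = _hash_getnewbuf(rel, start_nblkno, MAIN_FORKNUM);

    /*
     * No IsBufferCleanupOK(buf_nblkno) check here: until the metapage's
     * mapping information is updated later in this function, no other
     * backend can find this bucket, and checking only after
     * _hash_pageinit() has already zeroed the page would not have
     * protected anything anyway.
     */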

Greetings,

Andres Freund
