Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
Date: 2022-10-12 10:46:19
Message-ID: CALDaNm3aL=CYtP+H+4Xka9TDmALjTbpTXTV7CSbH8HU_zGfUQA@mail.gmail.com
Lists: pgsql-hackers

On Thu, 6 Oct 2022 at 12:44, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Sat, Oct 1, 2022 at 12:35 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > This issue does occasionally happen in CI, as e.g. noted in this thread:
> > https://www.postgresql.org/message-id/20220930185345.GD6256%40telsasoft.com
> >
> > On 2022-08-18 15:17:47 +0530, Amit Kapila wrote:
> > > I agree with you that getting rid of the clean-up lock on the new
> > > bucket is a more invasive patch and should be done separately if
> > > required. Yesterday, I have done a brief analysis and I think that is
> > > possible but it doesn't seem to be a good idea to backpatch it.
> >
> > My problem with this approach is that the whole cleanup lock is hugely
> > misleading as-is. As I noted in
> > https://www.postgresql.org/message-id/20220817193032.z35vdjhpzkgldrd3%40awork3.anarazel.de
> > we take the cleanup lock *after* re-initializing the page. Thereby
> > completely breaking the properties that a cleanup lock normally tries to
> > guarantee.
> >
> > Even if that were to achieve something useful (doubtful in this case),
> > it'd need a huge comment explaining what's going on.
> >
>
> Attached are two patches. The first patch is what Robert has proposed
> with some changes in comments to emphasize the fact that cleanup lock
> on the new bucket is just to be consistent with the old bucket page
> locking as we are initializing it just before checking for cleanup
> lock. In the second patch, I removed the acquisition of cleanup lock
> on the new bucket page and changed the comments/README accordingly.
>
> I think we can backpatch the first patch and the second patch can be
> just a HEAD-only patch. Does that sound reasonable to you?

Thanks for the patches.
I have verified with a manual test that the issue is fixed on branches
back to REL_10_STABLE, and it works fine.

I added code to print the old buffer and new buffer values at the point
where both buffers get dirtied. Then I executed the following test and
noted down the old and new buffer values from the log file:
CREATE TABLE pvactst (i INT, a INT[], p POINT) with (autovacuum_enabled = off);
CREATE INDEX hash_pvactst ON pvactst USING hash (i);
create table t1(c1 int);
INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM
generate_series(1,1000) i;
INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM
generate_series(1,1000) i;
INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM
generate_series(1,1000) i;

In my environment, the issue occurs when oldbuf is 38 and newbuf is 60.
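
For reference, the diagnostic I added was along these lines; this is only
a local sketch placed in hash_xlog_split_allocate_page() of hash_xlog.c
(not part of the proposed patches), and it assumes that function's
oldbuf/newbuf Buffer variables, with a shared buffer's buf_id being
Buffer - 1:

    /*
     * Local debugging aid only: log the shared-buffer ids of the old and
     * new bucket pages during replay so that they can be matched against
     * the checkpointer side later.  buf_id of a shared buffer is Buffer - 1.
     */
    elog(LOG, "hash split replay: oldbuf buf_id = %d, newbuf buf_id = %d",
         oldbuf - 1, newbuf - 1);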

Once we know the old buffer and new buffer values, we have to debug the
checkpointer and recovery processes to simulate the scenario.
I used the following steps to reproduce the issue in my environment:
1) Create streaming replication setup with the following configurations:
wal_consistency_checking = all
shared_buffers = 128MB # min 128kB
bgwriter_lru_maxpages = 0 # max buffers written/round, 0 disables
checkpoint_timeout = 30s # range 30s-1d
2) Execute the following in the master node:
CREATE TABLE pvactst (i INT, a INT[], p POINT) with (autovacuum_enabled = off);
CREATE INDEX hash_pvactst ON pvactst USING hash (i);
3) Hold the checkpointer process of the standby instance at BufferSync using a debugger.
4) Execute the following in the master node:
create table t1(c1 int); -- This is required so that the old buffer is
not dirty when the checkpointer processes it. (If the old buffer is
dirty, we will not be able to sync the new buffer, because the
checkpointer will wait while trying to acquire the lock on the old buffer.)
5) Let the checkpointer process scan the buffers up to old buffer + 1;
in our case it should go past 38.
6) Hold the recovery process at
hash_xlog_split_allocate_page->IsBufferCleanupOK (approximately
hash_xlog.c:357; see the code sketch after these steps) while executing
the following in the master node; the breakpoint is hit during the last
of these inserts:
INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM
generate_series(1,1000) i;
INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM
generate_series(1,1000) i;
INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM
generate_series(1,1000) i;
7) Continue the checkpointer process, make it proceed to SyncOneBuffer
with buf_id = 60 (the newbuf value noted from the earlier run), and let
it proceed up to PinBuffer_Locked(bufHdr);
8) Continuing the recovery process will now reproduce the PANIC scenario.
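
For context, the check that fails in step 6 looks roughly like the
following in hash_xlog_split_allocate_page() (a paraphrase of the code
around hash_xlog.c:357, not an exact copy of any branch): the new bucket
page is initialized first and the cleanup lock is only checked
afterwards, so once the checkpointer has pinned the new buffer (step 7),
the check fails and replay PANICs.

    /* replay the record for the new bucket (paraphrased) */
    newbuf = XLogInitBufferForRedo(record, 1);
    _hash_initbuf(newbuf, xlrec->new_bucket, xlrec->new_bucket,
                  xlrec->new_bucket_flag, true);
    if (!IsBufferCleanupOK(newbuf))
        elog(PANIC, "hash_xlog_split_allocate_page: failed to acquire cleanup lock");
    MarkBufferDirty(newbuf);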

Regards,
Vignesh
