Re: Concurrency bug in amcheck

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Concurrency bug in amcheck
Date: 2020-05-11 12:56:03
Message-ID: CAPpHfdv2qLS7ApVKTzshnrZi5qULpPp2a1f4PWkEZMUCiVf5Ow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 28, 2020 at 4:05 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Mon, Apr 27, 2020 at 4:17 AM Alexander Korotkov
> <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> > > Assuming it doesn't seem we actually need any items on deleted pages,
> > > I can propose to delete them on primary as well. That would make
> > > contents of primary and standby more consistent. What do you think?
> >
> > So, my proposal is following. Backpatch my fix upthread to 11. In
> > master additionally make _bt_unlink_halfdead_page() remove page items
> > on primary as well. Do you have any objections?
>
> What I meant was that we might as well review the behavior of
> _bt_unlink_halfdead_page() here, since we have to change it anyway.
> But I think you're right: No matter what happens or doesn't happen to
> _bt_unlink_halfdead_page(), the fact is that deleted pages may or may
> not have a single remaining item (which was originally the "top
> parent" item from the page at offset number P_HIKEY), now and forever.
> We have to conservatively assume that it could be either state, now
> and forever. That means that we definitely have to give up on the
> check, per your patch. So, +1 for backpatching that back to 11.

Thank you. I've worked a bit on comments and commit message. I would
appreciate you review.

> (BTW, I don't think that this is a concurrency issue, except in the
> sense that a test case that recreates the false positive is sensitive
> to concurrency -- I imagine you agree with this.)

Yes, I agree it's related to concurrency, but not exactly concurrency issue.

> I like your idea of making the primary consistent with the REDO
> routine on the master branch only. I wonder if that will make it
> possible to change btree_mask() so that wal_consistency_checking can
> check deleted pages as well. The contents of a deleted page's special
> area do matter, and yet we don't currently verify that it matches (we
> use mask_page_content() within btree_mask() for deleted pages, which
> seems inappropriately broad). In particular, the left and right
> sibling links should be consistent with the primary on a deleted page.

Thank you. 2nd patch is proposed for master and makes btree page
unlink remove all the items from the page being deleted.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
0001-Fix-amcheck-for-page-checks-concurrent-to-replay-of-.patch application/octet-stream 2.7 KB
0002-Remove-btree-page-items-after-page-unlink.patch application/octet-stream 2.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2020-05-11 12:58:15 Re: making update/delete of inheritance trees scale better
Previous Message Julien Rouhaud 2020-05-11 12:53:59 Re: Add "-Wimplicit-fallthrough" to default flags (was Re: pgsql: Support FETCH FIRST WITH TIES)