Re: Fixes for two separate bugs in nbtree VACUUM's page deletion

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: Fixes for two separate bugs in nbtree VACUUM's page deletion
Date: 2020-04-30 06:38:04
Message-ID: CA+fd4k5Z7-JToLXMLMQpx6UBaC_XkTM729JBF0BS5Fequf0rwA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 29 Apr 2020 at 01:17, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Tue, Apr 28, 2020 at 12:21 AM Masahiko Sawada
> <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> > I agree with both patches.
>
> Thanks for the review.
>
> > For the first fix it seems better to push down the logic to the page
> > deletion code as your 0001 patch does so. The following change changes
> > the page deletion code so that it emits LOG message indicating the
> > index corruption when a deleted page is passed. But we used to ignore
> > in this case.
>
> > I agree with this change but since I'm not sure this change directly
> > relates to this bug I wonder if it can be a separate patch.
>
> I am not surprised that you are concerned about this. That was the
> hardest decision I had to make when writing the patch.
>
> The central idea in the 0001-* patch (as I say at the end of the
> commit message) is that we have a strict rule about where we handle
> maintaining the oldest btpo.xact for already-deleted pages, and where
> we handle it for pages that become deleted. The rules is this:
> btvacuumpage() is strictly responsible for the former category (as
> well as totally live pages), while _bt_pagedel() is responsible for
> the latter category (this includes pages that started out as half-dead
> but become deleted).
>
> In order for this to work, _bt_pagedel() must not ever see a deleted
> page that it did not delete itself. If I didn't explicitly take a
> strict position on this, then I suppose that I'd have to have similar
> handling for maintaining the oldest btpo.xact for existing deleted
> pages encountered within _bt_pagedel(). But that doesn't make any
> sense, even with corruption, so I took a strict position. Even with
> corruption, how could we encounter an *existing* deleted page in
> _bt_pagedel() but not encounter the same page within some call to
> btvacuumpage() made from btvacuumscan()? In other words, how can we
> fail to maintain the oldest btpo.xact for deleted pages even if we
> assume that the index has a corrupt page with sibling links that point
> to a fully deleted page? It seems impossible, even in this extreme,
> contrived scenario.
>
> Then there is the separate question of the new log message about
> corruption. It's not too hard to see why _bt_pagedel() cannot ever
> access an existing deleted page unless there is corruption:
>
> * _bt_pagedel()'s only caller (the btvacuumpage() caller) will simply
> never pass a deleted page -- it handles those directly.
>
> * _bt_pagedel() can only access additional leaf pages to consider
> deleting them by following the right sibling link of the
> btvacuumpage() caller's leaf page (or possibly some page even further
> to the right if it ends up deleting many leaf pages in one go).
> Following right links and finding a deleted page can only happen when
> there is a concurrent page deletion by VACUUM, since the sibling links
> to the deleted page are changed by the second stage of deletion (i.e.
> by the atomic actionat where the page is actually marked deleted,
> within _bt_unlink_halfdead_page()).
>
> * There cannot be a concurrent VACUUM, though, since _bt_pagedel()
> runs in VACUUM. (Note that we already depend on this for correctness
> in _bt_unlink_halfdead_page(), which has a comment that says "a
> half-dead page cannot resurrect because there can be only one vacuum
> process running at a time".)
>
> The strict position seems justified. It really isn't that strict. I'm
> not concerned about the new LOG message concerning corruption being
> annoying or wrong

Thank you for the explanation. This makes sense to me. We've ever been
able to regard it as an index corruption that an existing deleted page
is detected within _bt_pagedel(). But with changes required for fixing
this bug, the responsibilities will change so that _bt_pagedel() must
not see a deleted page. Therefore we can take a more strict position.
It's not that since this bug could lead an index corruption we will
start warning it.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2020-04-30 06:56:00 Re: Autovacuum on partitioned table (autoanalyze)
Previous Message David Zhang 2020-04-30 06:26:16 Re: WIP/PoC for parallel backup