Re: xid wraparound danger due to INDEX_CLEANUP false

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: xid wraparound danger due to INDEX_CLEANUP false
Date: 2020-06-25 13:59:08
Message-ID: CA+fd4k6Sau=izQgXzNPdUX=HD6VBeAsevAf7vYPAys689AA2DA@mail.gmail.com
Lists: pgsql-hackers

On Thu, 25 Jun 2020 at 05:02, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Wed, Jun 24, 2020 at 10:21 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > Sorry, I'm so far behind on my email. Argh.
>
> That's okay.
>
> > I think, especially on the blog post you linked, that we should aim to
> > have INDEX_CLEANUP OFF mode do the minimum possible amount of work
> > while still keeping us safe against transaction ID wraparound. So if,
> > for example, it's desirable but not imperative for BRIN to
> > resummarize, then it's OK normally but should be skipped with
> > INDEX_CLEANUP OFF.
>
> I agree that that's very important.
>
> > Apparently, what we're really doing here is that even when
> > INDEX_CLEANUP is OFF, we're still going to keep all the dead tuples.
> > That seems sad, but if it's what we have to do then it at least needs
> > comments explaining it.
>
> +1. Though I think that AMs should technically have the right to
> consider it advisory.
>
> > As for the btree portion of the change, I expect you understand this
> > better than I do, so I'm reluctant to stick my neck out, but it seems
> > that what the patch does is force index cleanup to happen even when
> > INDEX_CLEANUP is OFF provided that the vacuum is for wraparound and
> > the btree index has at least 1 recyclable page. My first reaction is
> > to wonder whether that doesn't nerf this feature into oblivion. Isn't
> > it likely that an index that is being vacuumed for wraparound will
> > have a recyclable page someplace? If the presence of that 1 recyclable
> > page causes the "help, I'm about to run out of XIDs, please do the
> > least possible work" flag to become a no-op, I don't think users are
> > going to be too happy with that. Maybe I am misunderstanding.
>
> I have mixed feelings about it myself. These are valid concerns.
>
> This is a problem to the extent that the user has a table that they'd
> like to use INDEX_CLEANUP with, that has indexes that regularly
> require cleanup due to page deletion. ISTM, then, that the really
> relevant high level design questions for this patch are:
>
> 1. How often is that likely to happen in The Real World™?
>
> 2. If we fail to do cleanup and leak already-deleted pages, how bad is
> that? (Both in general, and in the worst case.)
>
> I'll hazard a guess for 1: I think that it might not come up that
> often. Page deletion is often something that we hardly ever need. And,
> unlike some DB systems, we only do it when pages are fully empty
> (which, as it turns out, isn't necessarily better than our simple
> approach [1]). I tend to think it's unlikely to happen in cases where
> INDEX_CLEANUP is used, because those are cases that also must not have
> that much index churn to begin with.
>
> Then there's question 2. The intuition behind the approach from
> Sawada-san's patch was that allowing wraparound here feels wrong -- it
> should be in the AM's hands. However, it's not like I can point to
> some ironclad guarantee about not leaking deleted pages that existed
> before the INDEX_CLEANUP feature. We know that the FSM is not crash
> safe, and that's that. Is this really all that different? Maybe it is,
> but it seems like a quantitative difference to me.

I think that with the approach implemented in my patch, the problem for
users is that they cannot easily know in advance whether a vacuum with
INDEX_CLEANUP false will perform index cleanup, even if page deletion
doesn’t happen in most cases. They can check whether the vacuum will be
an anti-wraparound vacuum, but it’s relatively hard for them to check in
advance whether there are recyclable pages in the btree index. This will
be a restriction for users, especially those who want to use the
INDEX_CLEANUP feature to speed up an impending XID wraparound vacuum, as
described in the blog post that Peter shared.
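For what it’s worth, a partial workaround today is the pageinspect
extension, which exposes the btree metapage. This is only a sketch: the
index name is a placeholder, bt_metap() requires superuser, and the
column set varies by server version.

```sql
-- Probe a btree for deleted-but-not-yet-recycled pages.
-- 'some_index' is a placeholder index name.
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- On versions whose bt_metap() output includes oldest_xact
-- (btm_oldest_btpo_xact), a nonzero value suggests there are
-- deleted pages still awaiting recycling.
SELECT oldest_xact FROM bt_metap('some_index');
```

Of course, asking users to run this per-index before an emergency vacuum
is exactly the kind of burden the complaint above is about.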

I haven’t come up with a good solution to keep us safe against XID
wraparound yet, but it seems to me that it’s better to have an option
that forces index cleanup not to happen.
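For reference, the user-facing pattern under discussion looks like this
(a sketch; the table name is a placeholder):

```sql
-- Find the tables closest to xid wraparound.
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY xid_age DESC
LIMIT 10;

-- Freeze tuples as fast as possible by skipping index cleanup;
-- dead tuples are retained, but relfrozenxid can still advance.
VACUUM (FREEZE, INDEX_CLEANUP FALSE, VERBOSE) some_big_table;
```

The question in this thread is what guarantees the second command should
make when the index AM would otherwise want to do cleanup work.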

>
> I'm kind of arguing against myself even as I try to advance my
> original argument. If workloads that use INDEX_CLEANUP don't need to
> delete and recycle pages in any case, then why should we care that
> those same workloads might leak pages on account of the wraparound
> hazard?

I had the same impression at first.

> The real reason that I want to push the mechanism down into index
> access methods is because that design is clearly better overall; it
> just so happens that the specific way in which we currently defer
> recycling in nbtree makes very little sense, so it's harder to see the
> big picture. The xid-cleanup design that we have was approximately the
> easiest way to do it, so that's what we got. We should figure out a
> way to recycle the pages at something close to the earliest possible
> opportunity, without having to perform a full scan on the index
> relation within btvacuumscan().

+1

> Maybe we can use the autovacuum work
> item mechanism for that. For indexes that get VACUUMed once a week on
> average, it makes zero sense to wait another week to recycle the pages
> that get deleted, in a staggered fashion. It should be possible to
> recycle the pages a minute or two after VACUUM proper finishes, with
> extra work that's proportionate to the number of deleted pages. This
> is still conservative. I am currently very busy with an unrelated
> B-Tree prototype, so I might not get around to it this year. Maybe
> Sawada-san can think about this?

I thought that btbulkdelete and/or btvacuumcleanup could register an
autovacuum work item to recycle the pages that get deleted, but that
might not recycle enough of them, because the autovacuum work items
could be lost just after the vacuum. And if page deletion is relatively
rare in practice, we might be able to take an optimistic approach:
vacuum registers deleted pages in the FSM at deletion time, and a
process that takes a free page checks whether the page is really
recyclable. Anyway, I’ll try to think more about this.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
