Re: xid wraparound danger due to INDEX_CLEANUP false

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: xid wraparound danger due to INDEX_CLEANUP false
Date: 2020-06-24 20:02:20
Message-ID: CAH2-Wz=L4XFk4xzFKx4UNMAZR3meBgaEzX33sdGpX--G0LxvDA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jun 24, 2020 at 10:21 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Sorry, I'm so far behind on my email. Argh.

That's okay.

> I think, especially on the blog post you linked, that we should aim to
> have INDEX_CLEANUP OFF mode do the minimum possible amount of work
> while still keeping us safe against transaction ID wraparound. So if,
> for example, it's desirable but not imperative for BRIN to
> resummarize, then it's OK normally but should be skipped with
> INDEX_CLEANUP OFF.

I agree that that's very important.

> Apparently, what we're really doing here is that even when
> INDEX_CLEANUP is OFF, we're still going to keep all the dead tuples.
> That seems sad, but if it's what we have to do then it at least needs
> comments explaining it.

+1. Though I think that AMs should technically have the right to
consider it advisory.

> As for the btree portion of the change, I expect you understand this
> better than I do, so I'm reluctant to stick my neck out, but it seems
> that what the patch does is force index cleanup to happen even when
> INDEX_CLEANUP is OFF provided that the vacuum is for wraparound and
> the btree index has at least 1 recyclable page. My first reaction is
> to wonder whether that doesn't nerf this feature into oblivion. Isn't
> it likely that an index that is being vacuumed for wraparound will
> have a recyclable page someplace? If the presence of that 1 recyclable
> page causes the "help, I'm about to run out of XIDs, please do the
> least possible work" flag to become a no-op, I don't think users are
> going to be too happy with that. Maybe I am misunderstanding.

I have mixed feelings about it myself. These are valid concerns.

This is a problem to the extent that the user has a table that they'd
like to use INDEX_CLEANUP with, that has indexes that regularly
require cleanup due to page deletion. ISTM, then, that the really
relevant high level design questions for this patch are:

1. How often is that likely to happen in The Real World™?

2. If we fail to do cleanup and leak already-deleted pages, how bad is
that? ( Both in general, and in the worst case.)

I'll hazard a guess for 1: I think that it might not come up that
often. Page deletion is often something that we hardly ever need. And,
unlike some DB systems, we only do it when pages are fully empty
(which, as it turns out, isn't necessarily better than our simple
approach [1]). I tend to think it's unlikely to happen in cases where
INDEX_CLEANUP is used, because those are cases that also must not have
that much index churn to begin with.

Then there's question 2. The intuition behind the approach from
Sawada-san's patch was that allowing wraparound here feels wrong -- it
should be in the AM's hands. However, it's not like I can point to
some ironclad guarantee about not leaking deleted pages that existed
before the INDEX_CLEANUP feature. We know that the FSM is not crash
safe, and that's that. Is this really all that different? Maybe it is,
but it seems like a quantitative difference to me.

I'm kind of arguing against myself even as I try to advance my
original argument. If workloads that use INDEX_CLEANUP don't need to
delete and recycle pages in any case, then why should we care that
those same workloads might leak pages on account of the wraparound
hazard? There's nothing to leak! Maybe some compromise is possible, at
least for the backbranches. Perhaps nbtree is told about the
requirements in a bit more detail than we'd imagined. It's not just an
INDEX_CLEANUP boolean. It could be an enum or something. Not sure,
though.

The real reason that I want to push the mechanism down into index
access methods is because that design is clearly better overall; it
just so happens that the specific way in which we currently defer
recycling in nbtree makes very little sense, so it's harder to see the
big picture. The xid-cleanup design that we have was approximately the
easiest way to do it, so that's what we got. We should figure out a
way to recycle the pages at something close to the earliest possible
opportunity, without having to perform a full scan on the index
relation within btvacuumscan(). Maybe we can use the autovacuum work
item mechanism for that. For indexes that get VACUUMed once a week on
average, it makes zero sense to wait another week to recycle the pages
that get deleted, in a staggered fashion. It should be possible to
recycle the pages a minute or two after VACUUM proper finishes, with
extra work that's proportionate to the number of deleted pages. This
is still conservative. I am currently very busy with an unrelated
B-Tree prototype, so I might not get around to it this year. Maybe
Sawada-san can think about this?

Also, handling of the vacuum_cleanup_index_scale_factor stuff (added
to Postgres 11 by commit 857f9c36) should live next to code for the
confusingly-similar INDEX_CLEANUP stuff (added to Postgres 12 by
commit a96c41feec6), on general principle. I think that that
organization is a lot easier to follow.

[1] https://www.sciencedirect.com/science/article/pii/002200009390020W
--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2020-06-24 21:02:10 Re: Default setting for enable_hashagg_disk
Previous Message Tom Lane 2020-06-24 19:54:01 Re: Why forbid "INSERT INTO t () VALUES ();"