Re: New IndexAM API controlling index vacuum strategies

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New IndexAM API controlling index vacuum strategies
Date: 2021-02-05 20:02:34
Message-ID: CAH2-WznW6v-j9bTP1r6ociFZvkAzU_2uegV+iEXwwZvkZ3o0Mg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 3, 2021 at 8:18 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > I'm starting to think that the right short term goal should not
> > directly involve bottom-up index deletion. We should instead return to
> > the idea of "unifying" the vacuum_cleanup_index_scale_factor feature
> > with the INDEX_CLEANUP feature, which is kind of where this whole idea
> > started out at. This short term goal is much more than mere
> > refactoring. It is still a whole new user-visible feature. The patch
> > would teach VACUUM to skip doing any real index work within both
> > ambulkdelete() and amvacuumcleanup() in many important cases.
> >
>
> I agree to cut the scope. I've also been thinking about the impact of
> this patch on users.

It's probably also true that on balance users care more about the
"~99.9% append-only table" case (as well as the HOT updates workload I
brought up in response to Victor on February 2) than making VACUUM
very sensitive to how well bottom-up deletion is working. Yes, it's
annoying that VACUUM still wastes effort on indexes where bottom-up
deletion alone can do all required garbage collection. But that's not
going to be a huge surprise to users. Whereas the "~99.9% append-only
table" case causes huge surprises to users -- users hate this kind of
thing.

> If vacuum skips both ambulkdelete and amvacuumcleanup in that case,
> I'm concerned that this could increase users who are affected by the
> known issue of leaking deleted pages. Currently, users who are
> affected by that is only users who use INDEX_CLEANUP off. But if we
> enable this feature by default, all users potentially are affected by
> that issue.

FWIW I think that it's unfair to blame INDEX_CLEANUP for any problems
in this area. The truth is that the design of the
deleted-page-recycling stuff has always caused leaked pages, even with
workloads that should not be challenging to the implementation in any
way. See my later remarks.

> To improve index tuple deletion in that case, skipping bulkdelete is
> also a good idea whereas the retail index deletion is also a good
> solution. I have thought the retail index deletion would be
> appropriate to this case but since some index AM cannot support it
> skipping index scans is a good solution anyway.

The big problem with retail index tuple deletion is that it is not
possible once heap pruning takes place (opportunistic pruning, or
pruning performed by VACUUM). Pruning will destroy the information
that retail deletion needs to find the index tuple (the column
values).

I think that we probably will end up using retail index tuple
deletion, but it will only be one strategy among several. We'll never
be able to rely on it, even within nbtree. My personal opinion is that
completely eliminating VACUUM is not a useful long term goal.

> > Here is how the triggering criteria could work: maybe skipping
> > accessing all indexes during VACUUM happens when less than 1% or
> > 10,000 of the items from the table are to be removed by VACUUM --
> > whichever is greater. Of course this is just the first thing I thought
> > of. It's a starting point for further discussion.
>
> I also prefer your second idea :)

Cool. Yeah, I always like it when the potential downside of a design
is obviously low, and the potential upside is obviously very high. I
am much less concerned about any uncertainty around when and how users
will get the big upside. I like it when my problems are "luxury
problems". :-)

> As I mentioned above, I'm still concerned that the extent of users who
> are affected by the issue of leaking deleted pages could get expanded.
> Currently, we don't have a way to detect how many index pages are
> leaked. If there are potential cases where many deleted pages are
> leaked this feature would make things worse.

The existing problems here were significant even before you added
INDEX_CLEANUP. For example, let's say VACUUM deletes a page, and then
later recycles it in the normal/correct way -- this is the simplest
possible case for page deletion. The page is now in the FSM, ready to
be recycled during the next page split. Or is it? Even in this case
there are no guarantees! This is because _bt_getbuf() does not fully
trust the FSM to give it a 100% recycle-safe page for its page split
caller -- _bt_getbuf() checks the page using _bt_page_recyclable()
(which is the same check that VACUUM does to decide a deleted page is
now recyclable). Obviously this means that the FSM can "leak" a page,
just because there happened to be no page splits before wraparound
occurred (and so now _bt_page_recyclable() thinks the very old page is
very new/in the future).

In general the recycling stuff feels ridiculously over engineered to
me. It is true that page deletion is intrinsically complicated, and is
worth having -- that makes sense to me. But the complexity of the
recycling stuff seems ridiculous.

There is only one sensible solution: follow the example of commit
6655a7299d8 in nbtree. This commit fully fixed exactly the same
problem in GiST by storing an epoch alongside the XID. This nbtree fix
is even anticipated by the commit message of 6655a7299d8. I can take
care of this myself for Postgres 14.

> I agreed to cut the scope for PG14. It would be good if we could
> improve index vacuum while cutting cut the scope for PG14 and not
> expanding the extent of the impact of this issue.

Great! Well, if I take care of the _bt_page_recyclable()
wraparound/epoch issue in a general kind of way then AFAICT there is
no added risk.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2021-02-05 20:06:46 Re: Fuzz testing COPY FROM parsing
Previous Message Heikki Linnakangas 2021-02-05 19:50:40 Re: Fuzz testing COPY FROM parsing