Re: Partial index "microvacuum"

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Marko Tiikkaja <marko(at)joh(dot)to>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Partial index "microvacuum"
Date: 2021-09-16 16:19:17
Message-ID: CAH2-Wznn3kmM3m6CWQQR-jmzvuSeLpuyEbvNtE4ZyWYs92H-rQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 16, 2021 at 4:45 AM Marko Tiikkaja <marko(at)joh(dot)to> wrote:
> Huh. Interesting. I'm sorry, I wasn't aware of this work and didn't
> have version 14 at hand. But it looks like both the partial index as
> well as the secondary index on (id::text) get cleaned up nicely there.

That's great.

I understand why other hackers see partial indexes as a special case,
but I don't really see them that way. The only substantive difference
is the considerations for HOT safety in your scenario, versus a
scenario with an equivalent non-partial index. By equivalent I mean an
index that is the same in every way, but doesn't have a predicate. And
with the same workload. In other words, an index that really should
have been partial (because the "extra" index tuples are useless in
practice), but for whatever reason wasn't defined that way.

If you look at what's going on at the level of the constantly modified
leaf pages in each scenario, then you'll see no differences -- none at
all. The problem of VACUUM running infrequently is really no worse
with the partial index. VACUUM runs infrequently relative to the small
useful working set in *either* scenario. The useless extra index
tuples in the non-partial-index scenario only *hide* the problem --
obviously they're not protective in any way.

> I even tried a version where I have a snapshot open for the entire
> run, and the subsequents SELECTs clean the bloat up. I'll need to
> read up on the details a bit to understand exactly what changed, but
> it appears that at least this particular pattern has already been
> fixed.

Bottom-up index deletion tends to help even when a snapshot holds back
cleanup. For example:

https://www.postgresql.org/message-id/CAGnEbogATZS1mWMVX8FzZHMXzuDEcb10AnVwwhCtXtiBpg3XLQ@mail.gmail.com

It's hard to explain exactly why this happens. The short version is
that there is a synergy between deduplication and bottom-up index
deletion. As bottom-up index deletion starts to fail (because it
fundamentally isn't possible to delete any more index tuples on the
page due to the basic invariants for cleanup not allowing it),
deduplication "takes over for the page". Deduplication can "absorb"
extra versions from non-hot updates. A deduplication pass could easily
buy us enough time for the old snapshot to naturally go away. Next
time around a bottom-up index deletion pass is attempted for the same
page, we'll probably find something to delete.

Just accepting version-driven page splits was always a permanent
solution to a temporary problem.

> Thank you so much for your work on this!

Thanks Marko!

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andreas Joseph Krogh 2021-09-16 16:53:07 Re: EXPLAIN(VERBOSE) to CTE with SEARCH BREADTH FIRST fails
Previous Message gkokolatos 2021-09-16 15:17:15 Re: Teach pg_receivewal to use lz4 compression