Re: New IndexAM API controlling index vacuum strategies

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New IndexAM API controlling index vacuum strategies
Date: 2021-02-02 04:27:33
Message-ID: CAH2-Wznq5X-HdyTcavFdZLfjbUaP1=CUDUztF5EOd+T5qGrGmg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 1, 2021 at 7:17 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Here is how the triggering criteria could work: maybe skipping
> accessing all indexes during VACUUM happens when less than 1% or
> 10,000 of the items from the table are to be removed by VACUUM --
> whichever is greater. Of course this is just the first thing I thought
> of. It's a starting point for further discussion.

And now here is the second thing I thought of, which is much better:

Sometimes 1% of the dead tuples in a heap relation will be spread
across 90%+ of the pages. With other workloads 1% of dead tuples might
be highly concentrated, and appear in no more than 1% of all heap
pages. Obviously the distinction between these two cases/workloads
matters a lot. And so the triggering criteria must be quantitative
*and* qualitative. It should not be based on counting dead tuples,
since that alone won't differentiate these two extreme cases - both of
which are probably quite common (in the real world extremes are
actually the normal and common case IME).

I like the idea of basing it on counting *heap blocks*, not dead
tuples. We can count heap blocks that have *at least* one dead tuple
(of course it doesn't matter how they're dead, whether it was this
VACUUM operation or some earlier opportunistic pruning). Note in
particular that it should not matter if it's a heap block that has
only one LP_DEAD line pointer or a heap page that is near the
MaxHeapTuplesPerPage limit for the page -- we count either type of
page towards the heap-page based limit used to decide if index
vacuuming goes ahead for all indexes during VACUUM.

This removes almost all of the issues with not setting visibility map
bits reliably (another concern of mine that I forget to mention
earlier), but it is still very likely to do the right thing in the
"~99.9% append-only table" case I mentioned in the last email. We can
be relatively aggressive (say by triggering index skipping when less
than 1% of all heap pages have dead tuples). Plus the new nbtree index
tuple deletion stuff is naturally very good at deleting index tuples
in the event of dead tuples becoming concentrated in relatively few
table blocks -- that helps too. This is true of the enhanced simple
deletion mechanism (which has been taught to be clever about table
block dead tuple concentrations/table layout), as well as bottom-up
index deletion.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2021-02-02 04:52:38 Re: Faulty HEAP_XMAX_LOCK_ONLY & HEAP_KEYS_UPDATED hintbit combination
Previous Message Julien Rouhaud 2021-02-02 04:21:33 Re: Faulty HEAP_XMAX_LOCK_ONLY & HEAP_KEYS_UPDATED hintbit combination