Re: New IndexAM API controlling index vacuum strategies

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: New IndexAM API controlling index vacuum strategies
Date: 2021-03-29 04:16:06
Message-ID: CAH2-WznybU9LC73yWkN_w4xF=yPXdL3O83hC6gW-wE-57Q6DJg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 25, 2021 at 6:58 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Attached is v7, which takes the last two patches from your v6 and
> rebases them on top of my recent work.

And now here's v8, which has the following additional cleanup:

* Added useful log_autovacuum output.

This should provide DBAs with a useful tool for seeing how effective
this optimization is. But I think that they'll also end up using it to
monitor things like how effective HOT is with certain tables over
time. If regular autovacuums indicate that there is no need to do
index vacuuming, then HOT must be working well. Whereas if autovacuums
continually require index vacuuming, it might well be taken as a sign
that heap fill factor should be reduced. There are complicated reasons
why HOT might not work quite as well as expected, and having near real
time insight into it strikes me as valuable.

* Added this assertion to the patch that removes the tupgone special
case, which seems really useful to me:

@@ -2421,6 +2374,12 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
         vmbuffer = InvalidBuffer;
     }

+    /*
+     * We set all LP_DEAD items from the first heap pass to LP_UNUSED during
+     * the second heap pass. No more, no less.
+     */
+    Assert(vacrel->num_index_scans > 1 || tupindex == vacrel->lpdead_items);
+
     ereport(elevel,
             (errmsg("\"%s\": removed %d dead item identifiers in %u pages",
                     vacrel->relname, tupindex, vacuumed_pages),

This assertion verifies that the number of items that we have vacuumed
in a second pass of the heap precisely matches the number of LP_DEAD
items encountered in the first pass of the heap. Of course, these
LP_DEAD items are now exactly the same thing as dead_tuples array TIDs
that we vacuum/remove from indexes, before finally vacuuming/removing
them from the heap.
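
To make the invariant concrete, here is a minimal standalone sketch of the two heap passes. It is not code from the patch: the Demo* types, the demo_* functions, and the fixed-size array are hypothetical stand-ins, and it models only the case where a single round of index vacuuming suffices (the case the assertion checks when num_index_scans is not greater than 1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical line pointer states, loosely modeled on the LP_* flags. */
typedef enum { DEMO_LP_UNUSED, DEMO_LP_NORMAL, DEMO_LP_DEAD } DemoLpState;

#define DEMO_MAX_DEAD 64

typedef struct
{
    int64_t lpdead_items;               /* LP_DEAD items seen in first pass */
    size_t  dead_items[DEMO_MAX_DEAD];  /* stand-in for the dead_tuples TIDs */
} DemoVacState;

/*
 * First heap pass: remember the position of every LP_DEAD item.
 * Assumption: the caller never supplies more than DEMO_MAX_DEAD dead items.
 */
static void
demo_first_pass(DemoVacState *st, const DemoLpState *items, size_t nitems)
{
    st->lpdead_items = 0;
    for (size_t i = 0; i < nitems; i++)
    {
        if (items[i] == DEMO_LP_DEAD)
        {
            assert(st->lpdead_items < DEMO_MAX_DEAD);
            st->dead_items[st->lpdead_items++] = i;
        }
    }
}

/*
 * Second heap pass: set each remembered item to LP_UNUSED and return how
 * many items were processed.
 */
static int64_t
demo_second_pass(const DemoVacState *st, DemoLpState *items)
{
    int64_t tupindex = 0;

    while (tupindex < st->lpdead_items)
    {
        items[st->dead_items[tupindex]] = DEMO_LP_UNUSED;
        tupindex++;
    }

    /* No more, no less: both passes must agree on the count. */
    assert(tupindex == st->lpdead_items);
    return tupindex;
}
```

Running demo_first_pass() and then demo_second_pass() over the same array leaves no DEMO_LP_DEAD entries behind, and the returned count equals lpdead_items.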

* A lot more polishing in the first patch, which refactors the
vacuumlazy.c state quite a bit. I now use int64 instead of double for
some of the counters, which enables various assertions, including the
one I mentioned.
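
As an illustration of why an integer counter makes exact-equality assertions easier to trust, the sketch below compares the two types at the point where a double stops representing every integer (2^53). The helper names are hypothetical, and this is only one plausible reading of the motivation. Real counters are unlikely to get anywhere near that value, so the point is less about overflow and more about equality being obviously well defined for int64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when adding 1 to a double counter changes its value. */
static bool
demo_double_increment_visible(double counter)
{
    /* volatile forces the sum to be rounded to double precision */
    volatile double next = counter + 1.0;

    return next != counter;
}

/* Returns true when adding 1 to an int64 counter changes its value. */
static bool
demo_int64_increment_visible(int64_t counter)
{
    /* Assumption: counter is below INT64_MAX, so the addition cannot overflow. */
    return counter + 1 != counter;
}
```

With a counter of 9007199254740992 (2^53), the double version reports that the increment is invisible, while the int64 version still sees it.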

The instrumentation state in vacuumlazy.c has always been a mess. I
spotted a bug in the process of cleaning it up, at this point:

    /* If no indexes, make log report that lazy_vacuum_heap would've made */
    if (vacuumed_pages)
        ereport(elevel,
                (errmsg("\"%s\": removed %.0f row versions in %u pages",
                        vacrelstats->relname,
                        tups_vacuumed, vacuumed_pages)));

This is wrong because lazy_vacuum_heap() doesn't report tups_vacuumed.
It actually reports what I'm calling lpdead_items, which can have a
very different value to tups_vacuumed/tuples_deleted.
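
A rough standalone sketch of the distinction follows. The DemoCounters struct and demo_format_no_index_report() are hypothetical, and the message wording is borrowed from the v8 hunk quoted earlier; whether the patch fixes the no-indexes report in exactly this way is an assumption. The point is only that the report should be driven by the LP_DEAD item count rather than by the count of deleted tuples.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-VACUUM counters; the two counts are tracked separately. */
typedef struct
{
    const char *relname;
    int64_t     tuples_deleted;   /* tuples removed by pruning */
    int64_t     lpdead_items;     /* LP_DEAD item identifiers found */
    unsigned    vacuumed_pages;
} DemoCounters;

/*
 * Format the report for the no-indexes case using lpdead_items, which is
 * what the second heap pass would have reported. Returns snprintf's result.
 */
static int
demo_format_no_index_report(char *buf, size_t buflen, const DemoCounters *c)
{
    return snprintf(buf, buflen,
                    "\"%s\": removed %" PRId64
                    " dead item identifiers in %u pages",
                    c->relname, c->lpdead_items, c->vacuumed_pages);
}
```

With tuples_deleted = 500 and lpdead_items = 3, the formatted message mentions 3, not 500.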

--
Peter Geoghegan

Attachment                                                     Content-Type              Size
v8-0001-Centralize-state-for-each-VACUUM.patch                 application/octet-stream  115.9 KB
v8-0003-Remove-tupgone-special-case-from-vacuumlazy.c.patch    application/octet-stream  42.1 KB
v8-0004-Skip-index-vacuuming-in-some-cases.patch               application/octet-stream  25.8 KB
v8-0002-Break-lazy_scan_heap-up-into-functions.patch           application/octet-stream  62.1 KB
