Re: should vacuum's first heap pass be read-only?

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: should vacuum's first heap pass be read-only?
Date: 2022-04-01 19:06:52
Message-ID: CAH2-Wz=03-qc0c467KdDikN=Kmrc4G7NoK6uJTBVoU263KkcdQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 1, 2022 at 11:39 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> So I'm completely confused here. If we always start a vacuum with
> lazy_scan_heap(), as you said you wanted, then we will not save any
> heap scanning.

The term "start a VACUUM" becomes ambiguous with the conveyor belt.

What I addressed in a nearby email back in February [1] was the
idea of doing heap vacuuming of the last run (or several runs) of dead
TIDs on top of heap pruning to create the next run/runs of dead TIDs.

> What am I missing?

There is a certain sense in which we are bound to always "start a
vacuum" in lazy_scan_prune(), with any design based on the current
one. How else are we ever going to make a basic initial determination
about which heap LP_DEAD items need their TIDs deleted from indexes,
sooner or later? Obviously that information must always have
originated in lazy_scan_prune() (or in lazy_scan_noprune()).

With the conveyor belt, and a non-HOT-update heavy workload, we'll
eventually need to exhaustively do index vacuuming of all indexes
(even those that don't need it for their own sake) to make it safe to
remove heap line pointer bloat (to set heap LP_DEAD items to
LP_UNUSED). This will happen least often of all, and is the one
dependency that the conveyor belt can't help with.
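
To make that dependency concrete, here is a minimal toy model in Python.
It is only a sketch under stated assumptions: the ConveyorBelt class, its
method names, and the per-index "progress" bookkeeping are all hypothetical
illustrations, not the actual conveyor belt patch or any PostgreSQL API.
The only point it shows is that heap LP_DEAD items can become LP_UNUSED
no further than the slowest index has gotten through the dead-TID log.

```python
class ConveyorBelt:
    """Toy append-only log of dead TIDs (hypothetical, for illustration only)."""

    def __init__(self, index_names):
        self.dead_tids = []  # (block, offset) pairs, in the order pruning found them
        # For each index: how many log entries it has already deleted from itself.
        self.index_progress = {name: 0 for name in index_names}

    def append_run(self, tids):
        # Heap pruning adds a new run of dead TIDs to the end of the log.
        self.dead_tids.extend(tids)

    def vacuum_index(self, name):
        # Assumption: vacuuming an index processes every TID currently in the log.
        self.index_progress[name] = len(self.dead_tids)

    def heap_safe_upto(self):
        # Setting LP_DEAD to LP_UNUSED is only safe for log entries that
        # *every* index has processed, so the slowest index sets the limit.
        return min(self.index_progress.values(), default=0)


belt = ConveyorBelt(["idx_a", "idx_b"])
belt.append_run([(0, 1), (0, 2)])
belt.vacuum_index("idx_a")   # idx_a is vacuumed often, for its own sake
belt.append_run([(1, 5)])
belt.vacuum_index("idx_a")
limit_before = belt.heap_safe_upto()  # idx_b has not been vacuumed yet
belt.vacuum_index("idx_b")   # the exhaustive pass that includes every index
limit_after = belt.heap_safe_upto()
```

In this model, no amount of extra vacuuming of idx_a moves the limit; only
the pass that also covers idx_b does.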

To answer your question: when heap vacuuming does finally happen, we
at least don't need to call lazy_scan_prune() for any pages first
(neither the pages we're vacuuming, nor any other heap pages). Plus
the decision to finally clean up line pointer bloat can be made based
on known facts about line pointer bloat, without tying that to other
processing done by lazy_scan_prune() -- so there's greater separation
of concerns.
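
As a rough illustration of that separation, the following Python sketch
models a heap vacuuming pass that is driven purely by an existing dead-TID
log. Everything here is an assumption made for the example: the function
name vacuum_heap_from_log, the dict-of-dicts heap layout, and the safe_upto
argument are hypothetical and do not correspond to real PostgreSQL code.
Note that it never prunes anything and never visits a page that has no
entry in the log.

```python
LP_NORMAL, LP_DEAD, LP_UNUSED = "normal", "dead", "unused"


def vacuum_heap_from_log(heap, dead_tids, safe_upto):
    """Set LP_DEAD items to LP_UNUSED for the first safe_upto log entries.

    heap maps block number -> {offset: line pointer state}.
    dead_tids is a list of (block, offset) pairs recorded by earlier pruning.
    Returns the set of blocks that were modified.
    """
    touched = set()
    for block, offset in dead_tids[:safe_upto]:
        # No pruning step here: the log already says which items are dead.
        if heap[block][offset] == LP_DEAD:
            heap[block][offset] = LP_UNUSED
            touched.add(block)
    return touched


heap = {
    0: {1: LP_DEAD, 2: LP_NORMAL},
    1: {1: LP_DEAD},
    2: {1: LP_NORMAL},   # never appears in the log, so it is never visited
}
log = [(0, 1), (1, 1)]
touched = vacuum_heap_from_log(heap, log, safe_upto=1)
```

Here only block 0 is modified, because only the first log entry is below
the safe limit; block 1 keeps its LP_DEAD item until the indexes catch up.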

That having been said...maybe it would make sense to also call
lazy_scan_prune() right after these relatively rare calls to
lazy_vacuum_heap_page(), opportunistically (since we already dirtied
the page once). But that would be an additional optimization, at best; it
wouldn't be the main way that we call lazy_scan_prune().

[1] https://www.postgresql.org/message-id/CAH2-WzmG%3D_vYv0p4bhV8L73_u%2BBkd0JMWe2zHH333oEujhig1g%40mail.gmail.com
--
Peter Geoghegan
