Re: should vacuum's first heap pass be read-only?

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: should vacuum's first heap pass be read-only?
Date: 2022-04-07 16:15:44
Message-ID: CAH2-Wzn_6=qCuLF_jmrYrBWat+_6CBduQQA9UMC2zo3p1gYckQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 7, 2022 at 6:45 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Well, OK, here's what I don't understand. Let's say I insert a tuple
> and then I delete a tuple. Then time goes by and other things happen,
> but those things do not include a heap vacuum. However,
> during that time, all transactions that were in progress at the time
> of the insert-then-delete have now completed. At the end of that time,
> the number of things that need to be cleaned out of the heap is
> exactly 1: there is either a dead line pointer, or if the page hasn't
> been pruned yet, there is a dead tuple. The number of things that need
> to be cleaned out of the index is <= 1,

I don't think it's useful to talk about only one or two tuple (or
tuple-like) units of bloat in isolation.

> because the index tuple could
> have gotten nuked by kill_prior_tuple or bottom-up index deletion, or
> it might still be there. It follows that the number of dead line
> pointers (or tuples that can be truncated to dead line pointers) in
> the heap is always greater than or equal to the number in the index.

These techniques are effective because they limit the *concentration*
of garbage index tuples in any part of the index's key space (or the
concentration on individual leaf pages, if you prefer). There is a
huge difference between 100 dead tuples that are obsolete versions of
100 logical rows, and 100 dead tuples that are obsolete versions of
only 2 or 3 logical rows. In general just counting the number of dead
index tuples can be highly misleading.
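To make the concentration point concrete, here's a toy model (not
PostgreSQL code -- the leaf capacity and keys are made-up numbers). It
treats a B-Tree as fixed-size leaf pages packed in key order, and asks
how many dead entries crowd onto the worst single page:

```python
from collections import Counter

LEAF_CAP = 50  # hypothetical number of index entries per leaf page

def worst_leaf(dead_keys):
    """Max dead entries landing on any one leaf page, modeling the
    index as fixed-size pages packed in key order."""
    return max(Counter(k // LEAF_CAP for k in dead_keys).values())

# 100 dead tuples that are versions of 100 different logical rows,
# spread evenly across the key space -- one per leaf page:
spread = list(range(0, 5000, 50))

# 100 dead tuples that are versions of only 2 logical rows -- every
# entry crowds onto the same leaf page:
concentrated = [7] * 50 + [8] * 50

print(worst_leaf(spread))        # 1
print(worst_leaf(concentrated))  # 100
```

Same total count of dead index tuples, two orders of magnitude apart in
per-page concentration -- which is what bottom-up deletion actually has
to care about.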

> All things being equal, that means the heap is always in trouble
> before the index is in trouble. Maybe all things are not equal, but I
> don't know why that should be so. It feels like the index has
> opportunistic cleanup mechanisms that can completely eliminate index
> tuples, while the heap can at best replace dead tuples with dead line
> pointers which still consume some resources.

The problem with line pointer bloat is not really that you're wasting
these 4 byte units of space. The problem comes from second order
effects. Allowing line pointer bloat in the short term tends to let
heap fragmentation build up over the long term. I am
referring to a situation in which heap tuples tend to get located in
random places over time, leaving logically related heap tuples (those
originally inserted around the same time, by the same transaction)
strewn all over the place (not packed together).

Fragmentation will persist after VACUUM runs and makes all LP_DEAD
items LP_UNUSED. It can be thought of as a degenerative process. If it
were enough to just VACUUM and reclaim the LP space periodically,
then there wouldn't be much of a problem to fix here. I don't expect
that you'll find this explanation of things satisfactory, since it is
very complicated, and admits a lot of uncertainty. That's just what
experience has led me to believe.
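The second-order effect can be sketched with a toy simulation (pure
illustration, with made-up page geometry -- none of this is actual heap
code): once freed line pointers are reused by later, unrelated inserts,
a batch of logically related rows lands wherever free slots happen to
be, strewn across many pages instead of packed onto a few:

```python
import random

random.seed(1)
PAGE_SLOTS = 4   # made-up heap page geometry
pages = []       # each page is a list of slots; None = reusable slot

def insert(tid):
    # Prefer reusing a freed slot anywhere in the heap, as inserts
    # into pages with free space would; add a new page only when full.
    for page in pages:
        for i, slot in enumerate(page):
            if slot is None:
                page[i] = tid
                return
    pages.append([tid] + [None] * (PAGE_SLOTS - 1))

def page_span(batch):
    """Number of distinct pages holding any tuple from the batch."""
    return len({i for i, page in enumerate(pages)
                if any(t in batch for t in page)})

# One generation of rows fills the heap, then a random half are
# deleted (their slots eventually becoming reusable):
for tid in range(200):
    insert(tid)
for page in pages:
    for i, t in enumerate(page):
        if t is not None and random.random() < 0.5:
            page[i] = None

# A later batch of logically related rows, inserted together:
batch = set(range(1000, 1100))
for tid in sorted(batch):
    insert(tid)

packed = len(batch) // PAGE_SLOTS  # 25 pages if the batch were contiguous
print(page_span(batch), "pages used vs", packed, "if packed")
```

The batch ends up spanning nearly every page that had a free slot,
rather than the handful it would occupy if appended contiguously -- and
no amount of later LP reclamation un-scatters it.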

The fact that all of this is so complicated makes techniques that
focus on lowering costs and bounding the uncertainty/worst case seem
so compelling to me -- focusing on keeping costs low (without being
overly concerned about the needs of the workload) is underexploited
right now. Even if you could come up with a 100% needs-driven (and 0%
cost-of-cleanup-driven) model that really worked, it would still be
very sensitive to having accurate parameters -- but accurate current
information is hard to come by right now. And so I just don't see it
ever adding much value on its own.

> And if that's the case then doing more index vacuum cycles than we do
> heap vacuum cycles really isn't a sensible thing to do. You seem to
> think it is, though... what am I missing?

Bottom-up index deletion is only effective with logically unchanged
B-Tree indexes and non-HOT updates. The kill_prior_tuple technique is
only effective when index scans happen to read the same part of the
key space for a B-Tree, GiST, or hash index. That leaves a lot of
other cases unaddressed.

I just can't imagine a plausible workload whose real problems manifest
themselves as line pointer bloat first, with any problems in indexes
coming up only much later, if at all (admittedly, I could probably
contrive such a case if I wanted to).
Absence of evidence isn't evidence of absence, though. Just giving you
my opinion.

Again, though, I must ask: why does it matter either way? Even if such
a scenario were reasonably common, it wouldn't necessarily make life
harder for you here.

--
Peter Geoghegan
