Re: Lowering the ever-growing heap->pd_lower

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lowering the ever-growing heap->pd_lower
Date: 2022-04-08 21:43:31
Message-ID: CAH2-Wzns7Rfo_fqfjwcZW-md6weyT+pSx1n0-O+fSZg+ks-hgQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 8, 2022 at 2:06 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> It's not hard to hit scenarios where pages are effectively unusable, because
> they have close to 291 dead items, without autovacuum triggering (or
> autovacuum just taking a while).

I think that this is mostly a problem with HOT updates, and to a lesser
degree with regular updates. Deletes seem less troublesome.
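
(For anybody following along: 291 is just MaxHeapTuplesPerPage with the
default 8kB block size; once that many line pointers exist and none can
be reused, no new tuple version fits on the page. The macro is roughly:

    #define MaxHeapTuplesPerPage \
        ((int) ((BLCKSZ - SizeOfPageHeaderData) / \
                (MAXALIGN(SizeofHeapTupleHeader) + sizeof(ItemIdData))))

which works out to (8192 - 24) / (24 + 4) = 291 with the usual 8 byte
MAXALIGN.)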

I find that it's useful to think in terms of the high watermark number
of versions required for a given logical row over time. It's probably
quite rare for an individual logical row to truly require more than 2
or 3 versions at the same time in order to serve queries, even in
update-heavy tables, and even without doing anything fancy with the
definition of HeapTupleSatisfiesVacuum(). There are important
exceptions, certainly, but overall I think that we're still not doing
well enough with these easier cases.

The high watermark number of versions is probably going to be
significantly greater than the typical number of versions for the same
row. So maybe we give up on keeping a row on its original heap block
today, all because of a one-off (or very rare) event where we needed a
little extra space for only a fraction of a second.

The tell-tale signs of these kinds of problems can sometimes be seen
with synthetic, rate-limited benchmarks. If it takes a very long time
for the problem to grow, but nothing about the workload ever really
changes, that suggests a problem with this quality. The probability of
any given logical row being moved to another heap block is very low,
and yet it is inevitable that many (even all) rows will be moved, given
enough time and enough opportunities to get unlucky.

> This has become a bit more pronounced with vacuum skipping index cleanup when
> there's "just a few" dead items - if all your updates concentrate in a small
> region, 2% of the whole relation size isn't actually that small.

The 2% threshold was chosen based on the observation that it was below
the effective threshold where autovacuum just won't ever launch
anything on a moderately sized table (unless you set
autovacuum_vacuum_scale_factor to something absurdly low). That
effective threshold is the real problem, IMV. That's why I think that
we need to drive vacuuming based primarily on page-level
characteristics, while effectively ignoring pages that are all-visible
when deciding whether enough bloat is present to necessitate vacuuming.
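
To put rough numbers on that, with default settings (a simplified
sketch; the actual bypass check in lazy_vacuum() has additional
conditions):

    /*
     * Autovacuum launch threshold with the defaults:
     *   50 + 0.2 * reltuples dead tuples
     * i.e. roughly 20% of a table's rows must be dead before a
     * moderately sized table gets vacuumed at all.
     *
     * The index-vacuuming bypass, by contrast, is page based:
     */
    #define BYPASS_THRESHOLD_PAGES  0.02    /* i.e. 2% of rel_pages */

    bypass = (vacrel->lpdead_item_pages <
              (double) vacrel->rel_pages * BYPASS_THRESHOLD_PAGES);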

> 1) It's kind of OK for heap-only tuples to get a high OffsetNumber - we can
> reclaim them during pruning once they're dead. They don't leave behind a
> dead item that's unreclaimable until the next vacuum with an index cleanup
> pass.

I like the general direction here, but this particular idea doesn't
seem like a winner.
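
To make the distinction in 1) concrete, pruning a simple HOT chain
whose older versions are all dead looks roughly like this:

    Before pruning                     After pruning
    -------------------------------    ----------------------------
    lp 1: NORMAL   (dead root)         lp 1: REDIRECT -> 3
    lp 2: NORMAL   (dead, heap-only)   lp 2: UNUSED (reclaimed now)
    lp 3: NORMAL   (live, heap-only)   lp 3: NORMAL (live version)

The dead heap-only tuple's line pointer goes straight to LP_UNUSED, but
the root item can only become a redirect (or an LP_DEAD stub once the
whole chain is dead), and an LP_DEAD stub sticks around until a VACUUM
that performs index cleanup.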

> 2) Arguably the OffsetNumber of a redirect target can be changed. It might
> break careless uses of WHERE ctid = ... though (which likely are already
> broken, just harder to hit).

That makes perfect sense to me, though.

> a) heap_page_prune_opt() should take the number of used items into account
> when deciding whether to prune. Right now we trigger HOT pruning based on
> the number of items only if PageGetMaxOffsetNumber(page) >=
> MaxHeapTuplesPerPage. But because it requires a vacuum to reclaim an ItemId
> used for a root tuple, we should trigger HOT pruning when it might lower
> which OffsetNumbers get used.

Unsure about this.
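
For reference, the item count only factors into that decision
indirectly today: PageGetHeapFreeSpace() reports zero free space once
PageGetMaxOffsetNumber(page) >= MaxHeapTuplesPerPage and no unused line
pointer remains, which then trips the "page looks full" test in
heap_page_prune_opt(), roughly:

    minfree = RelationGetTargetPageFreeSpace(relation,
                                             HEAP_DEFAULT_FILLFACTOR);
    minfree = Max(minfree, BLCKSZ / 10);

    if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
        /* attempt pruning (if we can get a cleanup lock cheaply) */

Nothing there fires early just because new OffsetNumbers keep creeping
upward.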

> b) heap_page_prune_opt() should be triggered in more paths. E.g. when
> inserting / updating, we should prune if it allows us to avoid using a high
> OffsetNumber.

Unsure about this too.

I prototyped a design that gives individual backends soft ownership of
heap blocks that were recently allocated, and that later prunes the
heap page when it fills [1]. This is useful for aborted transactions,
where it preserves locality: leaving aborted tuples behind means that
their space ultimately gets reused for unrelated inserts, which is bad,
whereas eager pruning allows the inserter to leave behind more or less
pristine heap pages, which don't need to be pruned later on.

> c) What if we left some percentage of ItemIds unused, when looking for the
> OffsetNumber of a new HOT row version? That'd make it more likely for
> non-HOT updates and inserts to fit onto the page, without permanently
> increasing the size of the line pointer array.

That sounds promising.
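
A strawman of what c) could look like, just to make it concrete (the
constant and variable names here are invented for illustration; nothing
like this exists today):

    /*
     * Hypothetical: don't let a new heap-only tuple version extend the
     * line pointer array into the last few percent of
     * MaxHeapTuplesPerPage, reserving those slots for inserts and
     * non-HOT updates.
     */
    #define HOT_ITEMID_RESERVATION  0.05

    OffsetNumber hot_limit =
        MaxHeapTuplesPerPage -
        (OffsetNumber) (MaxHeapTuplesPerPage * HOT_ITEMID_RESERVATION);

    if (new_version_is_heap_only &&
        PageGetMaxOffsetNumber(page) >= hot_limit &&
        !PageHasFreeLinePointers(page))
        /* treat the page as full and place the new version elsewhere */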

[1] https://postgr.es/m/CAH2-Wzm-VhVeQYTH8hLyYho2wdG8Ecrm0uPQJWjap6BOVfe9Og@mail.gmail.com
--
Peter Geoghegan
