Re: Lowering the ever-growing heap->pd_lower

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lowering the ever-growing heap->pd_lower
Date: 2022-04-08 21:06:01
Message-ID: 20220408210601.ny6maqqnfdipq7tx@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-04-08 09:17:40 -0400, Robert Haas wrote:
> I agree that the value of 291 is pretty much accidental, but it also
> seems fairly generous to me. The bigger you make it, the more space
> you can waste. I must have missed (or failed to understand) previous
> discussions about why raising it would be a good idea.

It's not hard to hit scenarios where pages are effectively unusable, because
they have close to 291 dead items, without autovacuum triggering (or
autovacuum just taking a while). You basically just need updates / deletes to
concentrate in a certain range of the table and have indexing that prevents
HOT updates. Because the overall percentage of dead tuples is low, no
autovacuum is triggered, yet a range of the table contains little but dead
items. At which point you basically waste 7k bytes (1164 bytes for dead items
IIRC) until a vacuum finally kicks in - way more than what what you'd waste if
the number of line items were limited at e.g. 2 x MaxHeapTuplesPerPage

This has become a bit more pronounced with vacuum skipping index cleanup when
there's "just a few" dead items - if all your updates concentrate in a small
region, 2% of the whole relation size isn't actually that small.

I wonder if we could reduce the real-world space wastage of the line pointer
array, if we changed the the logic about which OffsetNumbers to use during
inserts / updates and and made a few tweaks to to pruning.

1) It's kind of OK for heap-only tuples to get a high OffsetNumber - we can
reclaim them during pruning once they're dead. They don't leave behind a
dead item that's unreclaimable until the next vacuum with an index cleanup
pass.

2) Arguably the OffsetNumber of a redirect target can be changed. It might
break careless uses of WHERE ctid = ... though (which likely are already
broken, just harder to hit).

These leads me to a few potential improvements:

a) heap_page_prune_prune() should take the number of used items into account
when deciding whether to prune. Right now we trigger hot pruning based on
the number of items only if PageGetMaxOffsetNumber(page) >=
MaxHeapTuplesPerPage. But because it requires a vacuum to reclaim an ItemId
used for a root tuple, we should trigger HOT pruning when it might lower
which OffsetNumber get used.

b) heap_page_prune_prune() should be triggered in more paths. E.g. when
inserting / updating, we should prune if it allows us to avoid using a high
OffsetNumber.

c) What if we left some percentage of ItemIds unused, when looking for the
OffsetNumber of a new HOT row version? That'd make it more likely for
non-HOT updates and inserts to fit onto the page, without permanently
increasing the size of the line pointer array.

d) If we think 2) is acceptable, we could move the targets of redirects to
make space for new root tuples, without increasing the permanent size of
the line pointer array.

Crazy?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-04-08 21:18:15 Re: Lowering the ever-growing heap->pd_lower
Previous Message Nathan Bossart 2022-04-08 20:30:03 Re: Pre-allocating WAL files