Re: Disabling Heap-Only Tuples

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Thom Brown <thom(at)linux(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Disabling Heap-Only Tuples
Date: 2023-07-05 12:12:15
Message-ID: CAEze2WgWwTWjJMA6CKVQ8VZNYG6o-+cPwn37NtQ0e5a1yaph+A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 5 Jul 2023 at 13:03, Thom Brown <thom(at)linux(dot)com> wrote:
>
> On Wed, 5 Jul 2023 at 11:57, Matthias van de Meent
> <boekewurm+postgres(at)gmail(dot)com> wrote:
> >
> > On Wed, 5 Jul 2023 at 12:45, Thom Brown <thom(at)linux(dot)com> wrote:
> > > Heap-Only Tuple (HOT) updates are a significant performance
> > > enhancement, as they prevent unnecessary page writes. However, HOT
> > > comes with a caveat: it means that if we have lots of available space
> > > earlier on in the relation, it can only be used for new tuples or in
> > > cases where there's insufficient space on a page for an UPDATE to use
> > > HOT.
> > >
> > > This mechanism limits our options for condensing tables, forcing us to
> > > resort to methods like running VACUUM FULL/CLUSTER or using external
> > > tools like pg_repack. These either require exclusive locks (which will
> > > be a deal-breaker on large tables on a production system), or there's
> > > risks involved. Of course we can always flood pages with new versions
> > > of a row until it's forced onto an early page, but that shouldn't be
> > > necessary.
> > >
> > > Considering these trade-offs, I'd like to propose an option to allow
> > > superusers to disable HOT on tables. The intent is to trade some
> > > performance benefits for the ability to reduce the size of a table
> > > without the typical locking associated with it.
> >
> > Interesting use case, but I think that disabling HOT would be missing
> > the forest for the trees. I think that a feature that disables
> > block-local updates for pages > some offset would be a better solution
> > to your issue: Normal updates also prefer the new tuple to be stored
> > in the same pages as the old tuple if at all possible, so disabling
> > HOT wouldn't solve the issue of tuples residing in the tail of your
> > table - at least not while there is still empty space in those pages.
>
> Hmm... I see your point. It's when an UPDATE isn't going to land on
> the same page that it relocates to the earlier available page. So I
> guess I'm after whatever mechanism would allow that to happen reliably
> and predictably.
>
> So $subject should really be "Allow forcing UPDATEs off the same page".

You'd probably want to do that only for a certain range of the table -
for a table with 1GB of data and 3GB of bloat there is no good reason
to force page-crossing updates in the first 1GB of the table - all
tuples of the table will eventually reside there, so why would you
take a performance penalty and move the tuples from inside that range
to inside that same range?

Something else to note: Indexes would suffer some (large?) amount of
bloat in this process, as you would be updating a lot of tuples
without the HOT optimization, thus increasing the work to be done by
VACUUM.
This may result in more bloat in indexes than what you get back from
shrinking the table.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech/)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nitin Jadhav 2023-07-05 12:23:04 Re: Extension Enhancement: Buffer Invalidation in pg_buffercache
Previous Message John Naylor 2023-07-05 11:21:20 Re: [PoC] Improve dead tuple storage for lazy vacuum