Re: Disabling Heap-Only Tuples

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Disabling Heap-Only Tuples
Date: 2023-08-24 15:22:34
Message-ID: CAEze2WggAdDQ4r9LRe6dYzJ5Z95G2mdFYxJTbuGxtACpPy1kGA@mail.gmail.com
Lists: pgsql-hackers

On Fri, 7 Jul 2023 at 12:18, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> On 7/7/23 11:55, Matthias van de Meent wrote:
>> On Fri, 7 Jul 2023 at 06:53, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>>>
>>>
>>> So IIUC, with this parameter we can control that, instead of putting
>>> the new version of the tuple on the same page, it should choose a page
>>> using RelationGetBufferForTuple(), and that can reduce fragmentation
>>> because, if there is free space, most updated tuples will be inserted
>>> into existing pages. But this still cannot truncate pages from the
>>> heap, right? Because we cannot guarantee that the new page selected by
>>> RelationGetBufferForTuple() is not from the end of the heap, and until
>>> we free the pages at the end of the heap, vacuum cannot truncate any
>>> page. Is my understanding correct?
>>
>> Yes. If you don't have pages with (enough) free space for the updated
>> tuples in your table, or if the FSM doesn't accurately reflect the
>> actual state of free space in your table, this won't help (which is
>> also the reason why I run vacuum in the tests). It also won't help if
>> you don't update the tuples physically located at the end of your
>> table, but in the targeted workload this would introduce a bias where
>> new tuple versions are moved to the front of the table.
>>
>> Something to note is that this may result in very bad bloat when this
>> is combined with a low fillfactor: All blocks past max_local_update
>> will be unable to use space reserved by fillfactor because FSM lookups
>> always take fillfactor into account, and all updates (which ignore
>> fillfactor when local) would go through the FSM instead, thus reducing
>> the space available on each block to exactly the fillfactor. So, this
>> might need some extra code to make sure we don't accidentally blow up
>> the table's size with UPDATEs when max_local_update is combined with
>> low fillfactors. I'm not sure where that would fit best.
>>
>
> I know the thread started as "let's disable HOT" and this essentially
> just proposes to do that using a table option. But I wonder if that's
> far too simple to be reliable, because hoping RelationGetBufferForTuple
> happens to do the right thing does not seem great.
>
> I wonder if we should invent some definition of "strategy" that would
> tell RelationGetBufferForTuple what it should aim for ...
>
> I'm imagining either a table option with a couple possible values
> (default, non-hot, first-page, ...) or maybe something even more
> elaborate (perhaps even a callback?).

I mostly agree, but the point is that first we have to get the update
away from the old tuple's page. Once we've done that, we can start
getting smart about placement in RelationGetBufferForTuple, but unless
we decide not to put the new tuple version on the old tuple's page, no
code in RelationGetBufferForTuple is executed at all.
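
To make that flow concrete, here's a toy, standalone sketch of the
decision (none of this is actual PostgreSQL source; UpdateTarget,
choose_target_block and fsm_search_stub are invented names, and
max_local_update is the reloption proposed upthread). It also shows the
fillfactor asymmetry mentioned earlier: the local path only needs room
for the new tuple, while the FSM path also asks for the
fillfactor-reserved space.

#include <stddef.h>
#include <stdio.h>

typedef unsigned int BlockNumber;

/* invented stand-in for the state heap_update has at hand */
typedef struct
{
    BlockNumber old_block;          /* block holding the old tuple version */
    size_t      free_on_old_page;   /* free space left on that page */
    size_t      new_tuple_len;      /* size of the new tuple version */
    BlockNumber max_local_update;   /* proposed reloption; huge value = off */
} UpdateTarget;

/* stand-in for the FSM-driven search in RelationGetBufferForTuple() */
static BlockNumber
fsm_search_stub(size_t needed)
{
    (void) needed;
    return 0;                       /* pretend the FSM points at block 0 */
}

static BlockNumber
choose_target_block(const UpdateTarget *t, size_t fillfactor_reserved)
{
    /*
     * Today: if the new version fits on the old page, heap_update keeps
     * it there (possibly as a HOT update) and RelationGetBufferForTuple
     * never runs.  The proposed max_local_update check is what pushes
     * the update off the old page in the first place.
     *
     * Note the asymmetry discussed upthread: the local path only needs
     * new_tuple_len bytes, while the FSM path asks for new_tuple_len
     * plus the space reserved by fillfactor.
     */
    if (t->new_tuple_len <= t->free_on_old_page &&
        t->old_block < t->max_local_update)
        return t->old_block;

    return fsm_search_stub(t->new_tuple_len + fillfactor_reserved);
}

int
main(void)
{
    /* old tuple lives on block 5000, past max_local_update = 1000 */
    UpdateTarget t = {5000, 200, 100, 1000};

    printf("target block: %u\n", choose_target_block(&t, 2048));
    return 0;
}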

We could change the update code to always go through
RelationGetBufferForTuple to determine the target buffer, and make
that function consider page-local updates (instead of heap_update,
which does that now), but I think that'd need significant extra work
both in the other call sites of RelationGetBufferForTuple and in that
function itself.
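
If we did go that way, I picture the "strategy" idea roughly like the
following. This is purely a hypothetical sketch: none of these names
exist in the tree, and the callback signature is invented.

#include <stddef.h>

typedef unsigned int BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/* hypothetical per-relation option values, mirroring the list above */
typedef enum TargetBlockStrategy
{
    TBS_DEFAULT,        /* current behaviour: prefer the old tuple's page */
    TBS_NON_HOT,        /* always consult the FSM, never update locally */
    TBS_FIRST_PAGE      /* prefer the lowest-numbered page with enough room */
} TargetBlockStrategy;

/*
 * Hypothetical "even more elaborate" variant: a callback that gets to
 * pick the target block for a new tuple version, or return
 * InvalidBlockNumber to fall back to the default FSM search.
 */
typedef BlockNumber (*target_block_hook_type) (void *relation,
                                               size_t new_tuple_len,
                                               BlockNumber old_block);

A plain enum reloption would probably cover the cases discussed in this
thread; the hook only becomes interesting once we want index-driven
routing like the BRIN idea below.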

> Now, it's not my intention to hijack this thread, but this discussion
> reminds me one of the ideas from my "BRIN improvements" talk, about
> maybe using BRIN indexes for routing. UPDATEs may be a major issue for
> BRIN, making them gradually worse over time. If we could "tell"
> RelationGetBufferForTuple() which buffers are more suitable (by looking
> at an index, histogram or some approximate mapping), that might help.

Improved tuple routing sounds like a great idea, and I've thought
about it as well. I'm not sure whether BRIN (as-is) is the best
candidate though, considering its O(N) scan complexity: 100GB-scale
tables can reasonably have BRIN indexes several MB in size, and
running a scan over that for every routed tuple is not likely to
perform well.
If BRIN had hierarchical summaries (e.g. range summaries covering
every nonnegative power-of-16 number of page ranges), we could reduce
that to something more reasonable, but that's not currently
implemented, so I don't think it's quite relevant yet.
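
To put rough numbers on the O(N) concern (assumptions only: 8kB
blocks, the default pages_per_range of 128, and ~24 bytes per minmax
summary; the last line uses the hypothetical fanout-16 hierarchy):

#include <math.h>
#include <stdio.h>

int
main(void)
{
    const double table_bytes = 100.0 * 1024 * 1024 * 1024; /* 100 GB */
    const double block_size = 8192;        /* default BLCKSZ */
    const double pages_per_range = 128;    /* default BRIN reloption */
    const double summary_bytes = 24;       /* rough minmax tuple size */

    double heap_pages = table_bytes / block_size;
    double ranges = heap_pages / pages_per_range;

    printf("heap pages:          %.0f\n", heap_pages);
    printf("range summaries:     %.0f\n", ranges);
    printf("summary data:        %.1f MB\n",
           ranges * summary_bytes / (1024 * 1024));

    /*
     * With power-of-16 hierarchical summaries, a lookup would descend
     * log_16(ranges) levels instead of scanning every summary.
     */
    printf("levels at fanout 16: %.0f\n", ceil(log(ranges) / log(16)));
    return 0;
}

So a routing lookup today would have to consider on the order of 100k
summaries per UPDATE, versus a handful of levels with a hierarchy.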

Kind regards,

Matthias van de Meent
Neon (https://neon.tech/)
