Re: Support for 8-byte TOAST values (aka the TOAST infinite loop problem)

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Burd, Greg" <greg(at)burd(dot)me>, Nikita Malakhov <hukutoc(at)gmail(dot)com>, Hannu Krosing <hannuk(at)google(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Support for 8-byte TOAST values (aka the TOAST infinite loop problem)
Date: 2025-08-09 07:02:24
Message-ID: aJbygEBqJgmLS0wF@paquier.xyz
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Aug 08, 2025 at 05:02:46PM -0400, Andres Freund wrote:
> On 2025-08-08 15:32:16 -0400, Tom Lane wrote:
>> I found some time to look at the v4 patchset, and have a bunch of
>> comments of different sizes.

Thanks for the input. This patch set exists as a result of a
discussion between you and Andres back in 2022.

(Replying here regarding the points that Andres is quoting, I'll add a
second mail for some of the other things later.)

>> I'm good with widening all these values to 64 bits, but I wonder
>> if it's a great idea to use unadorned "uint64" as the data type.
>> That's impossible to grep for, doesn't convey anything much about
>> what the variables are, etc. I'm tempted to propose instead
>> inventing a typedef "BigOid" or some such name (bikeshedding
>> welcome). The elog's could be handled with, say,
>> #define PRIBO PRIu64
>> This suggestion isn't made with the idea that we'd someday switch
>> to an even wider type, but just with the idea of making it clearer
>> what these values are being used for. When you see "Oid" you
>> know it's some sort of object identifier, and I'm sad to give
>> that up here.
>
> I think we should consider introducing Oid64, instead of having a toast
> specific type. I doubt this is the last place that we'll want to use 64 bit
> wide types for cataloged entities.

Works for me for the grepping argument. Using "64" as a number of
bits in the type name sounds a bit strange to me, not in line with
what's done with bigint/int8 or float, where we use bytes. Naming a
dedicated type "bigoid" with a structure named BigOid behind that's
used for TOAST or in the future for other code paths is OK by me.
AFAIK, the itchy point with unsigned 64b would be the casting, but my
take would be to introduce no casts in the first implementation,
particularly for the bigint -> bigoid case.

>> 0004:
>>
>> Shouldn't VARATT_EXTERNAL_GET_POINTER go away entirely?
>> It looks to me like every use of that should be replaced by
>> toast_external_info_get_data().

Indirect toast pointers still wanted it. But perhaps this could just
be renamed VARATT_EXTERNAL_INDIRECT_GET_POINTER() or something like
that if jt's used only for TOAST pointers?

>> I wonder if we shouldn't try to get rid of the phraseology "standard
>> TOAST pointer", and instead write something like "short TOAST pointer"
>> or "small TOAST pointer". These aren't really going to be more
>> "standard" than the wider ones, IMO.
>>
>> I don't like replacing "va_valueid" with just "value". Dropping
>> the "id" is not an improvement, because now a reader might be
>> confused about whether this is somehow the actual value of the
>> toasted datum.

Okay, sure.

>> Functions in toast_external.c are under-documented. Some lack
>> any header comment whatever, and others have one that doesn't
>> actually say what they do.

Okay. Will work on that. I am not sure if it's worth doing yet, it
does not seem like there's a clear agreement about patch 0004 and the
toast_external business I am proposing.

>> I kind of wonder whether the run-time indirection this design causes
>> is going to be a performance problem. Perhaps not, given the expenses
>> involved in accessing a toasted value, but it has a faint feeling
>> of overdesign.

toast_external_info_get_data() is called in 8 places as of the patch
set:
1) Twice for amcheck.
2) Twice for reorderbuffer.
3) Three times in detoast.c
3-1) When fetching a value in full, toast_fetch_datum(), which goes
through the slice API.
3-2) toast_fetch_datum_slice(), to retrieve a slice of the toasted
data.
3-3) detoast_attr_slice(), falling back to the slice fetch.
4) Twice in toast_internals.c:
4-1) Saving a datum.
4-2) Deleting a datum.

And if you see here, upthread, I've defined the worst case as
retrieving a minimal toasted slice stored uncompressed, to measure the
cost of the conversion to this toast_external, without seeing any
effects, the btree conflict lookups doing most of the work:
https://www.postgresql.org/message-id/aGzLiDUB_18-8aVQ%40paquier.xyz

Thoughts and counter-arguments are welcome, of course.

> +1

Well, one reason why this design exists is a point from 2023, made by
Robert H., around here:
https://www.postgresql.org/message-id/CA%2BTgmoaVcjUkmtWdc_9QjBzvSShjDBYk-5XFNaOvYLgGROjJMA%40mail.gmail.com

The argument is that it's hard to miss for the existing code and
extensions how a new vartag should be handled. Inventing a new layer
means that extensions and existing code need to do a switch once, then
they cannot really miss be missed when we add a new vartag because the
from/to indirection between the toast_external layer and the varlenas
is handled in its own place.

A second reason why I've used this design is the problem related to
compression IDs, where the idea would be to add a new vartag for
zstandard for example. This still needs more work due to the existing
limitations that we currently have with the CompressionToastId and its
limitation to four values. The idea is that the deserialization of
this data into this toast_external proposal makes the future additions
easier. It would be of course more brutal and more efficient to just
extend the code paths where the toast_external layer (I mean where
VARATT_IS_EXTERNAL_ONDISK() is used currently) with an extra branch
made of a VARATT_IS_EXTERNAL_ONDISK_BIGOID(), but I've decided to
digest the argument from Robert instead, where I'm aiming at isolating
most of the code related to on-disk external TOAST pointers into a
single file, named toast_external.c here.

I'm OK with the "brutal" method if the patch presented here is
thought as overengineered and overdesigned, just wanted to say that
there are a couple of reasons behind these choices. It's great that
both of you are able to take some time to comment on these choices.
If you disagree with this method and just say that we should go ahead
with a more direct VARATT-like method, that's also fine by me as it
would still solve the value limitation problem. And that's primarily
what I want to solve here as it's a pain for some in the field.

>> 0008:
>>
>> I totally hate the idea of introducing a GUC for this. This should be
>> user-transparent. Or if it isn't user-transparent, a GUC is still
>> the wrong thing; some kind of relation option would be more sensible.
>
> Agreed. I think we need backward compatibility for pg_upgrade purposes, but
> that doesn't require a GUC, it just requires an option when creating tables in
> binary upgrade mode.

So, you basically mean that we should just make the 8-byte case the
default for all the new tables created on a new cluster at upgrade,
and just carry across upgrades the TOAST tables with OID values? If
this stuff uses a reloption or an option at DDL level, that's going to
need some dump/restore parts. Not sure that I'm betting the full
picture of what the default behavior should be. I have read on the
2022 thread where both of you have discussed this issue a point about
a "legacy" mode, to give users a way to get the old behavior if they
wanted. A GUC fit nicely around that idea, because it is them
possible to choose how all tables should behave, or perhaps I just did
not design correctly what you meant through that.

>> 0009:
>>
>> I do not love this either. I think the right thing is just to widen
>> the existing nextOid counter to 64 bits, and increment it in-memory as
>> 64 bits, and then return either all 64 bits when a 64-bit Oid is asked
>> for, or just the low-order 32 bits when a 32-bit OID is asked for
>> (with appropriate hacking for 32-bit wraparound). Having a separate
>> counter will make it too easy for the same value to be produced by
>> nearby calls to the 32-bit and 64-bit Oid generators, which is likely
>> a bad thing, if only because of potential confusion.

I was wondering about doing that as well. FWIW, this approach works
for me, eating only 4 more bytes in the control file.

> I'm not convinced that the global counter, be it a 32 or a 64 bit, approach
> has merit in general, and I'm rather sure it's the wrong thing for toast
> values. There's no straightforward path to move away from the global counter
> for plain oids, but I would suggest simply not using the global oid counter
> for toast IDs.
>
> A large portion of the cases I've seen where toast ID assignments were a
> problem were when the global OID counter wrapped around due to activity on
> *other* tables (and/or temporary table creation). If you instead had a
> per-toast-table sequence for assigning chunk IDs, that problem would largely
> vanish.

I've thought about this piece already and the reason why I have not
taken this approach is mostly simplicity. It's mentioned upthread,
with potentially pieces referring to sequence AMs and "toast"
sequences that could be optimized for the purpose of TOAST relations:
https://www.postgresql.org/message-id/aG2iY26tXj1_MHfH%40paquier.xyz

Yep. All the requirement boxes are filled with a sequence assigned to
the TOAST relation. Using a single 8-byte counter is actually
cheaper, even if the overhead when generating the TOAST tuples are in
the btree lookups and the insertions themselves. It is also possible
to do that as a two-step process:
- First have TOAST relations with 8-byte values.
- Add local sequences later on.
What matters is having the code in place to check for index lookups
upon values conflicting when a new sequence is attached to an existing
TOAST relation. If we assign sequences from the start, that would not
matter, of course. Another reason on top of the simplicity piece
behind the control file is that it's dead cheap to acquire a new
value when saving a chunk externally.
--
Michael

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2025-08-09 07:31:05 Re: Support for 8-byte TOAST values (aka the TOAST infinite loop problem)
Previous Message Richard Guo 2025-08-09 01:32:21 Re: Eager aggregation, take 3