From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "Burd, Greg" <greg(at)burd(dot)me>, Nikita Malakhov <hukutoc(at)gmail(dot)com>, Hannu Krosing <hannuk(at)google(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Support for 8-byte TOAST values (aka the TOAST infinite loop problem) |
Date: | 2025-08-09 07:31:05 |
Message-ID: | aJb5OQsJWCKB5CjU@paquier.xyz |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Aug 08, 2025 at 03:32:16PM -0400, Tom Lane wrote:
> I found some time to look at the v4 patchset, and have a bunch of
> comments of different sizes.
Thanks for the comments. I have replied to some of the items here:
https://www.postgresql.org/message-id/aJbygEBqJgmLS0wF%40paquier.xyz
Will try to answer the rest here.
> 0001:
>
> I'm good with widening all these values to 64 bits, but I wonder
> if it's a great idea to use unadorned "uint64" as the data type.
> That's impossible to grep for, doesn't convey anything much about
> what the variables are, etc. I'm tempted to propose instead
> inventing a typedef "BigOid" or some such name (bikeshedding
> welcome). The elog's could be handled with, say,
> #define PRIBO PRIu64
> This suggestion isn't made with the idea that we'd someday switch
> to an even wider type, but just with the idea of making it clearer
> what these values are being used for. When you see "Oid" you
> know it's some sort of object identifier, and I'm sad to give
> that up here.
Already mentioned on the other message, but I'm OK with a "bigoid"
type and a BigOid type. Honestly, I'm not sure about a replacement
for PRIu64, as we don't really do that for Oids with %u.
> toast_pointer isn't initialized at this point. I see you fixed that
> in 0004, but it doesn't help to split the patch series if the
> intermediate steps are broken.
Oops. FWIW, I've rebased this patch set much more than 4 times. It
looks like I've messed up some of the diffs. Sorry about that.
> 0003:
>
> No objection to the struct renaming, but does this go far
> enough? Aren't we going to need to rename TOAST_POINTER_SIZE
> to TOAST_OID_POINTER_SIZE, etc, so that we can have similar
> symbols for the wider version? I'm suspicious of not renaming
> the functions that work on these, too. (Oh, it looks like you
> did some of that in later parts.)
Yeah. I've stuck that into the later parts where the int8 bits have
been added. Perhaps I've not been ambitious enough.
> BTW, given that varatt.h has "typedef struct varatt_external {...}
> varatt_external", I'm certain that the vast majority of uses of
> "struct varatt_external" could be shortened to "varatt_external".
> And I think we should do that, because using "struct foo" not "foo"
> is not project style. This patch would be a fine time to do that.
Good point.
> 0004:
>
> Shouldn't VARATT_EXTERNAL_GET_POINTER go away entirely?
> It looks to me like every use of that should be replaced by
> toast_external_info_get_data().
Point about indirect pointers mentioned on the other message, where we
could rename VARATT_EXTERNAL_GET_POINTER to
VARATT_EXTERNAL_INDIRECT_GET_POINTER or equivalent to limit the
confusion?
> I wonder if we shouldn't try to get rid of the phraseology "standard
> TOAST pointer", and instead write something like "short TOAST pointer"
> or "small TOAST pointer". These aren't really going to be more
> "standard" than the wider ones, IMO.
I'm seeing one place in arrayfuncs.c and one in detoast.h using this
term on HEAD. I would do simpler: no standard and no short, just with
a removal of the "standard" part if I were to change something.
> I don't like replacing "va_valueid" with just "value". Dropping
> the "id" is not an improvement, because now a reader might be
> confused about whether this is somehow the actual value of the
> toasted datum.
Okay, sure. One reason behind the field renaming was also to track
down all the areas in need to be updated.
> Functions in toast_external.c are under-documented. Some lack
> any header comment whatever, and others have one that doesn't
> actually say what they do.
Okay.
> I kind of wonder whether the run-time indirection this design causes
> is going to be a performance problem. Perhaps not, given the expenses
> involved in accessing a toasted value, but it has a faint feeling
> of overdesign.
Mentioned on the other message, linked to this message:
https://www.postgresql.org/message-id/aGzLiDUB_18-8aVQ%40paquier.xyz
> It looks like a lot of this code would just flat out dump core if
> passed an invalid vartag value. Probably not good enough, given
> that we might look at corrupted data.
Right. That's where we would need an error path in
toast_external_get_info() if nothing is found. That's a cheap
defense, I guess. Good thing is that the lookups are centralized in a
single code path, at least with this design.
> I kind of feel that treating this operation as a toast_external_info
> method is the wrong thing, since where it looks like you are going
> is to fetch an identifier and only then decide which vartag you
> need to use. That ugliness comes out here:
>
> + /*
> + * Retrieve the external TOAST information, with the value still unknown.
> + * We need to do this again once we know the actual value assigned, to
> + * define the correct vartag_external for the new TOAST tuple.
> + */
Yeah, I was feeling a bit depressed when writing the concept of what a
"default" method should be before assigning the value, but that was
still feeling right.
> 0007:
>
> Surely we do not need the cast in this:
>
> + return murmurhash64((int64) DatumGetInt64(datum));
>
> I see you copied that from int4hashfast, but that's not good
> style either.
Okay.
>
> More generally, though, why do we need catcache support for int8?
> There aren't any caches on toast chunks, and I doubt we are going to
> introduce any. Sure there might be a need for this down the road,
> but I don't see that this patch series is the time to add it.
I recall that this part was required for the value conflict lookups
and also the TOAST value retrievals, so we are going to need it.
> 0008:
>
> I totally hate the idea of introducing a GUC for this. This should be
> user-transparent. Or if it isn't user-transparent, a GUC is still
> the wrong thing; some kind of relation option would be more sensible.
Per other email, I'm not sure what you entirely mean here: should 8B
values be the default with existing TOAST OID values kept as they are
across upgrades? Or something else?
> 0009:
>
> I do not love this either. I think the right thing is just to widen
> the existing nextOid counter to 64 bits, and increment it in-memory as
> 64 bits, and then return either all 64 bits when a 64-bit Oid is asked
> for, or just the low-order 32 bits when a 32-bit OID is asked for
> (with appropriate hacking for 32-bit wraparound). Having a separate
> counter will make it too easy for the same value to be produced by
> nearby calls to the 32-bit and 64-bit Oid generators, which is likely
> a bad thing, if only because of potential confusion.
Acknowledged on other message, fine my be.
> 0011:
>
> Not reviewing this yet, because I disagree with the basic design.
> I didn't look at the later patches either.
Without an agreement about the design choices related to the first
patches up to 0006, I doubt there this is any need to review any of
the follow-up patches yet because the choices of the first patches
influence the next patches in the series. Thanks for the feedback!
--
Michael
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2025-08-09 07:39:58 | Re: Support for 8-byte TOAST values (aka the TOAST infinite loop problem) |
Previous Message | Michael Paquier | 2025-08-09 07:02:24 | Re: Support for 8-byte TOAST values (aka the TOAST infinite loop problem) |