From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | "Burd, Greg" <greg(at)burd(dot)me>, Nikita Malakhov <hukutoc(at)gmail(dot)com>, Hannu Krosing <hannuk(at)google(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Support for 8-byte TOAST values (aka the TOAST infinite loop problem) |
Date: | 2025-08-08 19:32:16 |
Message-ID: | 1891064.1754681536@sss.pgh.pa.us |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Michael Paquier <michael(at)paquier(dot)xyz> writes:
> Attached is a v4, due to conflicts mainly caused by the recent changes
> in varatt.h done by e035863c9a04.
I found some time to look at the v4 patchset, and have a bunch of
comments of different sizes.
0001:
I'm good with widening all these values to 64 bits, but I wonder
if it's a great idea to use unadorned "uint64" as the data type.
That's impossible to grep for, doesn't convey anything much about
what the variables are, etc. I'm tempted to propose instead
inventing a typedef "BigOid" or some such name (bikeshedding
welcome). The elog's could be handled with, say,
#define PRIBO PRIu64
This suggestion isn't made with the idea that we'd someday switch
to an even wider type, but just with the idea of making it clearer
what these values are being used for. When you see "Oid" you
know it's some sort of object identifier, and I'm sad to give
that up here.
This hunk is flat out buggy:
@@ -1766,6 +1774,7 @@ check_tuple_attribute(HeapCheckContext *ctx)
return true;
/* It is external, and we're looking at a page on disk */
+ toast_pointer_valueid = toast_pointer.va_valueid;
/*
* Must copy attr into toast_pointer for alignment considerations
toast_pointer isn't initialized at this point. I see you fixed that
in 0004, but it doesn't help to split the patch series if the
intermediate steps are broken.
0002:
OK, although some of the hunks in 0001 seem to belong here.
0003:
No objection to the struct renaming, but does this go far
enough? Aren't we going to need to rename TOAST_POINTER_SIZE
to TOAST_OID_POINTER_SIZE, etc, so that we can have similar
symbols for the wider version? I'm suspicious of not renaming
the functions that work on these, too. (Oh, it looks like you
did some of that in later parts.)
BTW, given that varatt.h has "typedef struct varatt_external {...}
varatt_external", I'm certain that the vast majority of uses of
"struct varatt_external" could be shortened to "varatt_external".
And I think we should do that, because using "struct foo" not "foo"
is not project style. This patch would be a fine time to do that.
0004:
Shouldn't VARATT_EXTERNAL_GET_POINTER go away entirely?
It looks to me like every use of that should be replaced by
toast_external_info_get_data().
I wonder if we shouldn't try to get rid of the phraseology "standard
TOAST pointer", and instead write something like "short TOAST pointer"
or "small TOAST pointer". These aren't really going to be more
"standard" than the wider ones, IMO.
I don't like replacing "va_valueid" with just "value". Dropping
the "id" is not an improvement, because now a reader might be
confused about whether this is somehow the actual value of the
toasted datum.
Functions in toast_external.c are under-documented. Some lack
any header comment whatever, and others have one that doesn't
actually say what they do.
I kind of wonder whether the run-time indirection this design causes
is going to be a performance problem. Perhaps not, given the expenses
involved in accessing a toasted value, but it has a faint feeling
of overdesign.
It looks like a lot of this code would just flat out dump core if
passed an invalid vartag value. Probably not good enough, given
that we might look at corrupted data.
0005:
Nice!
0006:
get_new_value is very confusingly documented. Is the indexid that of
the toastrel's index? What attribute is the attnum for? Why should
the caller need to provide either, rather than get_new_value knowing
that internally? Also, again you are using "value" for something
that is not the value of the to-be-toasted datum. I'd suggest
something more like "get_new_toast_identifier".
I kind of feel that treating this operation as a toast_external_info
method is the wrong thing, since where it looks like you are going
is to fetch an identifier and only then decide which vartag you
need to use. That ugliness comes out here:
+ /*
+ * Retrieve the external TOAST information, with the value still unknown.
+ * We need to do this again once we know the actual value assigned, to
+ * define the correct vartag_external for the new TOAST tuple.
+ */
0007:
Surely we do not need the cast in this:
+ return murmurhash64((int64) DatumGetInt64(datum));
I see you copied that from int4hashfast, but that's not good
style either.
More generally, though, why do we need catcache support for int8?
There aren't any caches on toast chunks, and I doubt we are going to
introduce any. Sure there might be a need for this down the road,
but I don't see that this patch series is the time to add it.
0008:
I totally hate the idea of introducing a GUC for this. This should be
user-transparent. Or if it isn't user-transparent, a GUC is still
the wrong thing; some kind of relation option would be more sensible.
0009:
I do not love this either. I think the right thing is just to widen
the existing nextOid counter to 64 bits, and increment it in-memory as
64 bits, and then return either all 64 bits when a 64-bit Oid is asked
for, or just the low-order 32 bits when a 32-bit OID is asked for
(with appropriate hacking for 32-bit wraparound). Having a separate
counter will make it too easy for the same value to be produced by
nearby calls to the 32-bit and 64-bit Oid generators, which is likely
a bad thing, if only because of potential confusion.
0010:
OK
0011:
Not reviewing this yet, because I disagree with the basic design.
I didn't look at the later patches either.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2025-08-08 20:07:11 | Re: pgaio_io_get_id() type (was Re: Datum as struct) |
Previous Message | Peter Eisentraut | 2025-08-08 18:31:22 | Re: SCRAM pass-through authentication for postgres_fdw |