| From: | Greg Burd <greg(at)burd(dot)me> |
|---|---|
| To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Refactor how we form HeapTuples for CatalogTuple(Insert|Update) |
| Date: | 2025-11-14 15:31:28 |
| Message-ID: | 2C5C8B8D-8B36-4547-88EB-BDCF9A7C8D94@greg.burd.me |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hello,
I've been working on expanding HOT updates [1] and have recently had
some success (will be posted soon to the thread) moving the work done
within HeapDetermineColumnsInfo() into the executor. I found that
heap_update() is used in two paths: heap_tuple_update() and
simple_tuple_update(), both call into heap_update() which then uses
HeapDetermineColumnsInfo() to find the set of indexed attributes that
have been modified during the update. This makes sense on the normal
path from the executor, but the simple_tuple_update() path is only
called from CatalogTupleUpdate() and when I looked at how catalog tuples
were formed/mutated on that path I discovered something that I had to fix.
Catalog tuples have knowledge of what attributes are mutated implicitly,
but they don't preserve that information and so
HeapDetermineColumnsInfo() has to re-discover that later on (while the
page is locked). While on the surface this isn't such a big deal, it's
been that way and worked well for years I have other motivations (see
[1] again) for improving this.
So, I set about to create a way to retain the knowledge of what was
modified while mutating catalog tuples and pass that information on to
CatalogTupleUpdate() as a Bitmapset. That is the underlying goal of
this patch set. To make no behavioral changes, only syntactic ones, and
end up with something better overall.
That turned out to be a bit of a challenge as we have a lot of places
where catalog information is updated. We also have two different
methods for updating said information. And we intermix them. At times
we hack our way to a solution and ignore convention. I went down the
rabbit hole on this one, but I'm back up for a cup of tea because what
I've done seems materially better to me and I've accomplished the goal
(and can eventually make use of the result in [1]). Is this patch
useful even without that other work? I think so, just on the basis of
cleanup. Let me explain.
Historically there have been two methods for modifying heap tuples
destined for the catalog; Form/GETSTRUCT, and values/nulls/replaces.
This new method provides a more intuitive and less error-prone approach
without changing the fundamentals of the process. In addition, it is
now possible to retain knowledge of the set of mutated attributes when
working with catalog tuples. This isn't used in practice (yet [1]), but
could be a way to avoid the overhead of HeapDetermineColumnsInfo() in
heap_update() where (while holding a lock) we re-discover that set by
comparing old/new HeapTuple datums when trying to find what indexed
attributes have new values and prevent HOT updates.
Method 1: Form/GETSTRUCT
The "Form/GETSTRUCT" model allows for direct access to the tuple data
that is then modified, copied, and then updated via CatalogTupleUpdate().
Old:
Form_pg_index relform = (Form_pg_index) GETSTRUCT(tuple);
relform->inisclustered = false;
CatalogTupleUpdate(pg_index, &tuple->t_self, tuple);
New:
Bitmapset *updated = NULL;
Form_pg_index relform = (Form_pg_index) GETSTRUCT(tuple);
HeapTupleUpdateField(pg_index, indisclustered, false, indexForm, updated);
CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple, updated, NULL);
bms_free(updated);
Method 2: values/nulls/replaces
The "values/nulls/replaces" model collects the necessary information to
either update or create a heap tuple using heap_modify_tuple() or
heap_form_tuple() or sometimes heap_modify_tuple_by_cols(). While all
those functions remain unchanged now the prefered function for catalog
tuple modification is heap_update_tuple().
Old:
bool nulls[Natts_pg_type];
bool replaces[Natts_pg_type];
Datum values[Natts_pg_type];
values[Anum_pg_type_typtype - 1] = CharGetDatum(typeType);
nulls[Anum_pg_type_typdefaultbin - 1] = true;
replaces[Anum_pg_type_oid - 1] = false;
tuple = heap_modify_tuple(tuple, desc, values, nulls, replaces);
CatalogTupleUpdate(pg_type_desc, &tuple->t_self, tuple);
New:
Datum values[Natts_pg_type] = {0};
bool nulls[Natts_pg_type] = {false};
Bitmapset *updated = NULL;
HeapTupleUpdateValue(pg_type, typtype, CharGetDatum(typeType), values,
nulls, updated);
HeapTupleUpdateValueNull(pg_type, typdefaultbin, values, nulls, updated);
HeapTupleSetColumnNotUpdated(pg_type, oid, updated);
tuple = heap_update_tuple(tuple, desc, values, nulls, updated);
CatalogTupleUpdate(pg_type_desc, &tuple->t_self, tuple, updated, NULL);
In addition CatalogTupleUpdate and CatalogTupleInsert now have an
optional argument for the CatalogIndexState, when not provided it is
created and then released for the caller. This eliminates the need for
their "WithInfo" companion functions.
heap_update_tuple() is functionally equivalent to heap_modify_tuple(),
but takes a Bitmapset called "updated" rather than an array of bool
(generally) called "replaces" as a method for indicating what was
modified. Additionally, this new function tries to balance the
trade-offs of calling heap_getattr() versus heap_deform_tuple() based on
the ratio of attributes updated and their known runtime complexities.
Both paths are functionally equivalent.
The changes also include initialization of the values/nulls arrays
rather than loops or memset(). Some effort was also given to unify
naming of these variables and their order so as to lower cognitive
overhead.
So, let me dive into the attached patches. On advice from my mentor
Noah Misch I've divided the changes in two. The first patch contains
all the infrastructure and the set of files where applying the changes
got "interesting" and should be highlighted. The second is the
mechanical changes to the remaining files, and it's HUGE, apologies for
the blast radius here but it couldn't be helped.
0001 - Refactor how we form HeapTuples for CatalogTuple(Insert|Update)
htup.h: is where you'll find the new macros. There is a "getter" called
HeapTupleValue() to find the field in the values array and not worry
about "-1". After that, they break down into a two cases, insert and
update, with forms addressing method 1 (Form/GETSTRUCT) and method 2 (values/nulls/updated).
Insert macros take the form "HeapTupleSet...()" as in
"HeapTupleSetField(...)" or "HeapTupleSetValue(...)". There is the
assignment to NULL which is "HeapTupleSetFieldNull(...)" and
"HeapTupleSetValueNull(...)" as well.
Arguments to these macros are:
table_name, field, value, (nulls), (form_ptr)
These macros simply translate back to the forms that we use now and in
some cases combine work or ensure correctness.
Update macros follow a similar pattern "HeapTupleUpdate...()" and then
have "HeapTupleUpdateField..." and "HeapTupleUpdateValue..." cases to
cover the two methods we use.
Arguments are again uniform:
table_name, field, value, values, nulls, updated
There are additional macros for a few update-related special cases as well:
* HeapTupleMarkColumnUpdated() - not something you'll normally use as
the update methods all capture this information already, but there are
occasions where this is useful.
* HeapTupleUpdateSetAllColumnsUpdated() - 99.9% of the time we have
historically started off with a "replaces" array where every value in
that array is "false", but there are a few places where we had occasion
to do the opposite, start with the assumption that all attributes were
modified. This covers that case setting all the bits in the "updated"
Bitmapset correctly.
heaptuple.c: you'll find here my new heap_update_tuple() function. This
one is interesting as it tries to balance out the cost of heap_getattr()
vs heap_deform_tuple(). Why did I bother? Read the comment on
heap_modify_tuple(), and I was there... so why not? You may disagree
with my approach, here's a portion of the comment that explains it:
* Performance strategy:
* - If updating many attributes (> 2*natts/3), use heap_getattr() to extract
* only the few non-updated attributes. This is O(k*n) where k is the number
* of non-updated attributes, which is small when updating many.
* - If updating few attributes (<= 2*natts/3), use heap_deform_tuple() to
* extract all attributes at once (O(n)), then replace the updated ones.
* This avoids the O(n^2) cost of many heap_getattr() calls.
*
* The threshold of 2*natts/3 balances the fixed O(n) cost of heap_deform_tuple
* against the variable O(k*n) cost of heap_getattr, where k = natts - num_updated.
indexing.c: has the change to CatalogTupleUpdate/Insert and removal of
their "WithInfo" companions. If that second part is controversial then
I don't mind reverting it, but IMO this is cleaner and might either
inspire more call sites to create and reuse CatalogIndexState or for
someone (me?) to address the comment that pre-dates my work:
* XXX: At some point we might cache the CatalogIndexState data
somewhere (perhaps
* in the relcache) so that callers needn't trouble over this.
The remaining files in this patch (we're still on 0001) are in the
category of *interesting* or non-obvious changes I ran into that were
"fun" to fix...
pg_aggregate.c: in AggregateCreate() we run into a "compromise" pattern
where the code code either update or insert. In such cases hackers
should use the macros for update, not insert, everywhere. This records
the additional information for the update path and doesn't burden the
insert path with much more than a Bitmapset to free.
This reminds me, there is also the case where a function is looping and
updating a number of catalog tuples. Care must be taken to scope the
Bitmapset and free it every iteration as in:
while( doing all the things )
{
Bitmapset updated = NULL;
... update a tuple
CatalogTupleUpdate();
bms_free(updated);
}
This also comes up when a function has multiple updates not in a loop.
Don't reuse the Bitmapset, just scope it. If you do reuse it then:
bms_free(updated);
updated = NULL;
and later... when you start using it again:
Assert(bms_is_empty(updated));
pg_constraint.c: brings us to namestrcpy(), take a look at
RenameConstraintById(). When you run into namestrcpy() you've likely
just done the mutation to the form directly but you'll want to mark that
field as updated, as in:
namestrcpy(&(con->conname), newname);
HeapTupleMarkColumnUpdated(pg_constraint, conname, updated);
This pattern is also needed after functions like pg_add_s32_overflow(),
just HeapTupleMarkColumnUpdated() and you're fine. I used the regex:
"namestrcpy|pg_.*_.*flow" to find these cases, YMMV.
alter.c: take a look at AlterObjectRename_internal() which uses
get_object_attnum_name() to get the AttrNumber that is to be altered.
This is fine, but it's not something we can just plug into a macro and
get the proper name expansion. So, we do it the manual way and add a comment:
/*
* NOTE: We can't use the HeapTupleMarkColumnUpdated() macro here because
* 'Anum_name' isn't a table/column name, it's a index for the relation
* passed into the function as an argument.
*/
namestrcpy(&nameattrdata, new_name);
values[Anum_name - 1] = NameGetDatum(&nameattrdata);
updated = bms_add_member(updated, Anum_name - FirstLowInvalidHeapAttributeNumber);
analyze.c: uses the HeapTupleSetAllColumnsUpdated() at the start of the
update_attstats() function to match the replaced code. Later there's
another interesting deviation from the normal pattern in that function
where we want to update the nth stakind/opt/coll this was frustrating at
first until I found that this worked:
HeapTupleSetValue(pg_statistic, stakind1 + k,
Int16GetDatum(stats->stakind[k]), values);
I'll be "cpp", here's what that turns into:
(values)[Anum_pg_statistic_stakind1 + k - 1] = (Int16GetDatum(stats->stakind[k]));
So, we're taking the attribute number adding "k" then subtracting 1, and
voila "Bob's your uncle." While a bit unconventional, I think the
resulting form is understandable, much more than before which was:
i = Anum_pg_statistic_stakind1 - 1;
for (k = 0; k < STATISTIC_NUM_SLOTS; k++)
values[i++] = Int16GetDatum(stats->stakind[k]); /* stakindN */
attribute_stats.c: take a look at set_stats_slot() where I have a
comment explaining this "field + n" method adding a "Don't Panic" might
help too. Here's that comment:
/*
* NOTE: This might seem odd, to be adding 'i' to the name of the field on
* these macros, but that's what we need to do here to find the proper
* slot offset and record the proper value into the updated bitmap.
*
* Example: HeapTupleValue(pg_statistic, stakind1 + i, values);
*
* Becomes: values[Anum_pg_statistic_stakind1 + i - 1];
*
* The result is you're indexing the i'th value, exactly what we needed.
*/
The really amazing and valuable piece here is how that plays out in the following:
HeapTupleUpdateValue(pg_statistic, stakind1 + i, Int16GetDatum(stakind),
values, nulls, updated);
Playing the role of "cpp" again we get:
do { \
(values)[Anum_statistic_stakind1 + i - 1] = (Datum)
(Int16GetDatum(stakind)); \
(nulls)[Anum_statistic_stakind1 + i - 1] = false; \
Anum_statistic_stakind1 - (-7))
} while(0);
The result is that we've set the value/isnull at the correct position in
both the values and nulls arrays and then added the correct attribute
number into the Bitmapset (drops the mic).
Okay, and that about does it for "0001" with the foundation and then the
odd cases.
0002 - Update the remainder of catalog updates using the new APIs
This is a lot of applying the pattern over and over until it all works.
That's it. Nothing much more interesting here.
This is a lot of change, I get that, I think there's value in change
when the result is cleaner and more maintainable. The one place I'll
agree up front where this might case problems is back-patching fixes.
That'll have to have the new approach in some branches and the old
approach in others. It's a 5-year commitment, I don't take that lightly.
If you made it this far, I owe you a beer/coffee/token-of-appreciation.
Without this patch my [1] will have an ugly wart
(HeapDetermineColumnsInfo()) for the CatalogTupleUpdate path only. I
can live with that, I'd rather not.
best,
-greg
[1] https://www.postgresql.org/message-id/flat/78574B24-BE0A-42C5-8075-3FA9FA63B8FC%40amazon.com
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-Refactor-how-we-form-HeapTuples-for-CatalogTuple-.patch | application/octet-stream | 64.6 KB |
| v1-0002-Update-the-remainder-of-catalog-updates-using-the.patch | application/octet-stream | 425.8 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Maxim Orlov | 2025-11-14 15:40:50 | Re: POC: make mxidoff 64 bits |
| Previous Message | Álvaro Herrera | 2025-11-14 15:30:33 | Re: Uncommented GUC in postgresql.conf.sample |