Re: Refactor how we form HeapTuples for CatalogTuple(Insert|Update)

From: Greg Burd <greg(at)burd(dot)me>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Refactor how we form HeapTuples for CatalogTuple(Insert|Update)
Date: 2025-12-03 14:11:03
Message-ID: 68E38CCB-3163-4571-AF5F-B30E18D04380@greg.burd.me
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On Dec 2 2025, at 11:16 pm, Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> On Tue, Dec 02, 2025 at 10:21:36AM -0500, Greg Burd wrote:
>> On Dec 2 2025, at 2:09 am, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>> Thanks for taking the time to review this chunky patch. I've been
>> quietly reworking it a bit, I'll go over that in a bit because I'd
>> appreciate opinions before I'm too far into the changes.

Michael,

Thanks for continuing to help me along with this patch, I appreciate
your attention and time.

> Noted.
>
>> I started off just trying to make
>> HeapDetermineColumnsInfo() go away, these seem like good goals as well
>> and touch the same area of code.
>
> This one is an interesting piece of argument.

I've looked in the email archives a bit and didn't come up with much
related to catalog upgrades. I don't know if there is much on the
record information on this idea, but from what I've been told the idea
of decoupling heap from catalogs with the goal of getting closer to
making online upgrades possible has been referenced a few times in
hallway tracks at various conferences.

>>>> indexing.c: has the change to CatalogTupleUpdate/Insert and removal of
>>>> their "WithInfo" companions. If that second part is controversial then
>>>> I don't mind reverting it, but IMO this is cleaner and might either
>>>> inspire more call sites to create and reuse CatalogIndexState or for
>>>> someone (me?) to address the comment that pre-dates my work:
>>>
>>> I'm OK with the simplification on this one, but let's make that a
>>> separate patch extracted from 0001. The changes with
>>> CatalogTupleUpdate() and CatalogTupleInsert() don't have to be
>>> included with the changes that introduce a bitmap integration to track
>>> the attributes that are updated. This is pointing to the fact that
>>> there are not that many callers of the WithInfo() flavors in the tree,
>>> compared to the non-info ones (24 AFAIK, vs 160-ish for the Insert
>>> path it seems, for example).
>>
>> That makes sense, but another approach is to do the work mentioned in
>> the comment and somehow cache the CatalogIndexState (which is a somewhat
>> redacted ResultRelInfo by another name) somewhere.
>
> Yes, perhaps we should just do that. That may simplify some of the
> existing catalog paths, where we decide to open directly the indexes,
> as you have already noticed while working on 0002.

I'll see what I can do to extract this piece into a separate patch.

>> New:
>> CatalogUpdateValuesContext(pg_type, ctx);
>>
>> CatalogTupleUpdateValue(ctx, pg_type, typtype, CharGetDatum(typeType));
>> ModifyCatalogTupleValues(relation, tuple, ctx);
>>
>> I'm about a third of the way done converting to this new pattern and
>> it's better. As I said earlier I'm concerned that after a lot of effort
>> the pattern would need to change again, so I'd rather shake out those
>> concerns/ideas now early on. You can see the changes in [1], take a
>> look at htup.h, aclchk.c, pg_aggregate.c, and a few other places.
>
> One thing not mentioned here is CatalogUpdateValuesDecl(), macro that
> hides the bitmap with the updated values, so you'd need one to ensure
> that the rest works. Perhaps all that should not be in htup.h. We
> may want to header reordering to show that there is a split with heap
> tuples, also one of your goals.

Yes, my current thinking for htup.h is here [1] L90-L277 in a GitHub
branch on my fork of the Postgres repo.

Places I've converted so as to see the new pattern:
* aclchk.c [2] shows a insert or modify case
* heap.c [3] modify a catalog tuple
* attribute_stats.c [4] a bit more tricky use case

There are more.

>> Now there are macros for: 1) declaration, 2) setting/mutating, 3)
>> modifying/inserting. I guess I was starting to feel like I was digging
>> a hole no one would appreciate or agree was necessary so I'm asking for
>> early feedback because rule #1 when you find yourself digging a hole is
>> to stop digging.
>
> I'm not completely sure about this one. Some of the directions and
> decisions that need to be taken here also depend on your other work
> posted at [1]. Looking at v23 from this other thread, the code paths
> touched there are different from what's touched here (heaptuple.c vs
> heapam.c). You have mentioned that both share some ground. Is there
> a specific part in the patch of the other thread I should look at to
> get an idea of the bigger picture you have in mind?

Yes, the links referenced above on that branch. I'll post an
updated/rebased patch set here again for reference, but understand it's
only partially done, as a way to capture the idea (and not depend on
links into GitHub).

> [1]: https://www.postgresql.org/message-id/78574B24-BE0A-42C5-8075-3FA9FA63B8FC@amazon.com

Right, within that thread there is this message [5] that notes:

> Ideally, my patch to restructure how catalog tuples are updated [<this
> thread>] is committed and we can fully remove
> HeapDetermineColumnsInfo() and likely speed up all catalog updates in
> the process. That's what motivated [<this thread>], please take a
> look, it required a huge number of changes so I thought it deserved a
> life/thread of its own.

There are two major paths into heap_update(), one from
heapam_tuple_update() and one from simple_heap_update(). The first is
from the executor side via the table AM API and my other thread [6] that
you referenced is trying to address that. This thread is focused on the
catalog updates that flow into simple_heap_update(). Really that
function should be renamed to catalog_heap_update() as that's the only
purpose AFAICT.

Does that connect the dots a bit more clearly?

> --
> Michael

Included in the attached patch set is that first patch from the other
thread as the 0003 patch here and shows the ultimate goal, but it is a work-in-progress.

This patch set removes the need to call HeapDetermineColumnsInfo() on
the simple_heap_update() path because the Bitmapset of modified columns
is passed in and intersected against the indexed attributes
accomplishing the same outcome.

I haven't yet split out the "WithInfo" functions or worked out how to
cache the CatalogIndexState. I thought I'd finalize the approach, then
work that out as it might be impacted by refinements to proposed approach.

* 0001 Reorganize heap update logic and update simple_heap_update()

Focus on the macros in htup.h and the changes to the files that use
these macros in this commit, my ask of reviewers is that we agree on the
pattern established in this commit before I fully implement it across
all files in the 0002 patch.

* 0002 Update the remainder of catalog updates using the new APIs

Most files in this patch are still using the first version of the
macros. Only aclchk.c, heap.c, and index.c have been converted to show
how this approach plays out.

* 0003 Refactor how we form HeapTuples for CatalogTuple(Insert|Update)

This commit is borrowed from the other thread to demonstrate how these
changes impact heap updates. Thanks for your consideration, I
appreciate your time.

best.

-greg

[1] https://github.com/gburd/postgres/blob/cf-6221/src/include/access/htup.h#L90-L277
[2] https://github.com/gburd/postgres/blob/cf-6221/src/backend/catalog/aclchk.c#L1319-L1341
[3] https://github.com/gburd/postgres/blob/cf-6221/src/backend/catalog/heap.c#L1686C2-L1746
[4] https://github.com/gburd/postgres/blob/cf-6221/src/backend/statistics/attribute_stats.c#L750-L806
[5] https://www.postgresql.org/message-id/BCF1BF66-7A1B-4E01-87DC-0BE45EDF2F98%40greg.burd.me
[6] https://www.postgresql.org/message-id/78574B24-BE0A-42C5-8075-3FA9FA63B8FC@amazon.com

Attachment Content-Type Size
v2-0003-Reorganize-heap-update-logic-and-update-simple_he.patch application/octet-stream 48.9 KB
v2-0001-Refactor-how-we-form-HeapTuples-for-CatalogTuple-.patch application/octet-stream 71.6 KB
v2-0002-Update-the-remainder-of-catalog-updates-using-the.patch application/octet-stream 431.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2025-12-03 14:15:00 Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Previous Message Robert Haas 2025-12-03 14:08:42 Re: Resetting recovery target parameters in pg_createsubscriber