Re: REPACK CONCURRENTLY fails on tables with generated columns

From: Antonin Houska <ah(at)cybertec(dot)at>
To: Ewan Young <kdbase(dot)hack(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, mihailnikalayeu(at)gmail(dot)com, alvherre(at)kurilemu(dot)de
Subject: Re: REPACK CONCURRENTLY fails on tables with generated columns
Date: 2026-06-22 11:12:11
Message-ID: 18222.1782126731@localhost
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Ewan Young <kdbase(dot)hack(at)gmail(dot)com> wrote:

> The transient heap built by make_new_heap() is intentionally created
> without the old table's defaults and constraints, so it has no generation
> expressions for its generated columns, even though the tuple descriptor
> still has attgenerated set.
>
> When apply_concurrent_update() replays a non-HOT update, it calls
> ExecInsertIndexTuples() with EIIT_IS_UPDATE. To decide whether to pass
> the "indexUnchanged" hint, that calls index_unchanged_by_update() ->
> ExecGetExtraUpdatedCols() -> ExecInitGenerated(), which looks up the
> generation expression of each generated column via build_column_default()
> and errors out when it finds none on the transient heap.
>
> The apply path does not need to recompute generated columns at all: the
> decoded tuple already carries the correct value, and it is only inserted.
> Note also that ExecGetUpdatedCols() already returns an empty set for this
> ResultRelInfo, because it is not part of any range table -- so the
> indexUnchanged determination here is already approximate.

I'm sorry for the confusion, but the fact that ExecGetUpdatedCols() returns an
empty set is an omission rather than deliberate choice. Assuming we fix that,
the result of ExecGetExtraUpdatedCols() does matter. Thus we should copy the
related pg_attrdef entries, as I suggest in this patch.

Another question is how serious problem it is that ExecGetUpdatedCols()
returns empty set. AFAICS, "indexUnchanged" does not affect correctness - it's
is only a hint that helps the btree AM decide whent the bottom-up deletion and
de-duplication techniques should (not) be used. I'm not sure it's easy to
update the set for individual UPDATEs: the UPDATE commands REPACK replays
originate from different SQL queries and the logical decoding does not
transfer this information.

Even then, I think it'd be "less bad" to have ExecGetUpdatedCols() return a
set containing all the attributes rather than empty set. That is, avoid using
the btree optimizations altogether rather than allow them them when not
appropriate. However, per index_unchanged_by_update(), if ExecGetUpdatedCols()
tells that all columns are updated, the result of ExecGetExtraUpdatedCols()
does not matter. Nevertheless, I'd still slightly prefer copying the
pg_attrdef entries to hacking the executor.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachment Content-Type Size
0001-Copy-the-relevant-pg_attrdef-catalog-entries-for-the.patch text/x-diff 6.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nishant Sharma 2026-06-22 11:34:01 Re: [PATCH] Add support for SAOP in the optimizer for partial index paths
Previous Message Jeyaprakash Rajamani 2026-06-22 11:09:09 Re: Performance Degradation (Table becomes bloat) During Repeated Bulk UPDATE Operations