Re: REPACK CONCURRENTLY fails on tables with generated columns

From: Ewan Young <kdbase(dot)hack(at)gmail(dot)com>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, mihailnikalayeu(at)gmail(dot)com, alvherre(at)kurilemu(dot)de
Subject: Re: REPACK CONCURRENTLY fails on tables with generated columns
Date: 2026-07-02 01:39:42
Message-ID: CAON2xHNSdP2AD4OyzxDB9HtVO3YN7YF94+gbaLxW5KWsTr5yTQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 1, 2026 at 9:57 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:
>
> Ewan Young <kdbase(dot)hack(at)gmail(dot)com> wrote:
>
> > On Mon, Jun 22, 2026 at 7:12 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:
> > >
> > > Ewan Young <kdbase(dot)hack(at)gmail(dot)com> wrote:
> > >
> > > > The transient heap built by make_new_heap() is intentionally created
> > > > without the old table's defaults and constraints, so it has no generation
> > > > expressions for its generated columns, even though the tuple descriptor
> > > > still has attgenerated set.
> > > >
> > > > When apply_concurrent_update() replays a non-HOT update, it calls
> > > > ExecInsertIndexTuples() with EIIT_IS_UPDATE. To decide whether to pass
> > > > the "indexUnchanged" hint, that calls index_unchanged_by_update() ->
> > > > ExecGetExtraUpdatedCols() -> ExecInitGenerated(), which looks up the
> > > > generation expression of each generated column via build_column_default()
> > > > and errors out when it finds none on the transient heap.
> > > >
> > > > The apply path does not need to recompute generated columns at all: the
> > > > decoded tuple already carries the correct value, and it is only inserted.
> > > > Note also that ExecGetUpdatedCols() already returns an empty set for this
> > > > ResultRelInfo, because it is not part of any range table -- so the
> > > > indexUnchanged determination here is already approximate.
> > >
> > > I'm sorry for the confusion, but the fact that ExecGetUpdatedCols() returns an
> > > empty set is an omission rather than deliberate choice. Assuming we fix that,
> > > the result of ExecGetExtraUpdatedCols() does matter. Thus we should copy the
> > > related pg_attrdef entries, as I suggest in this patch.
> > >
> > > Another question is how serious problem it is that ExecGetUpdatedCols()
> > > returns empty set. AFAICS, "indexUnchanged" does not affect correctness - it's
> > > is only a hint that helps the btree AM decide whent the bottom-up deletion and
> > > de-duplication techniques should (not) be used. I'm not sure it's easy to
> > > update the set for individual UPDATEs: the UPDATE commands REPACK replays
> > > originate from different SQL queries and the logical decoding does not
> > > transfer this information.
> > >
> > > Even then, I think it'd be "less bad" to have ExecGetUpdatedCols() return a
> > > set containing all the attributes rather than empty set. That is, avoid using
> > > the btree optimizations altogether rather than allow them them when not
> > > appropriate. However, per index_unchanged_by_update(), if ExecGetUpdatedCols()
> > > tells that all columns are updated, the result of ExecGetExtraUpdatedCols()
> > > does not matter. Nevertheless, I'd still slightly prefer copying the
> > > pg_attrdef entries to hacking the executor.
> >
> > Agreed, thanks for the correction. Relying on the empty ExecGetUpdatedCols()
> > set was the weak point of my version -- it's an omission, not something to
> > build a second approximation on. Copying the pg_attrdef entries fixes the
> > inconsistency at the root, so let's go with your approach.
> >
> > I applied the patch and ran it through an injection-point reproducer
> > (cassert). Without the fix the bug reproduces (ERROR: no generation
> > expression found for column number 3 ...); with it, REPACK CONCURRENTLY
> > succeeds under a concurrent non-HOT UPDATE for a STORED generated column, an
> > index directly on the generated column, and a VIRTUAL column, with correct
> > values afterwards. Your repack.spec change passes.
> >
> > The approach is right and I've confirmed it fixes the bug, so +1 from me in
> > this direction.
>
> Thanks for checking. Here, 0002 tries to fix the problem of empty updatedCols,
> as I proposed above.
>
> 0001 fixes two minor coding issues that I found when writing 0002.

Both look good to me. 0001 is a clear improvement — makeNode() is the right
idiom, and NULL rather than 0 for the pointer argument.

For 0002, I agree with the approach: since indexUnchanged only feeds the btree
bottom-up-deletion / dedup heuristics, treating all columns as updated (so the
hint is always false) is the safe choice — better to forgo those optimizations
than to misuse them. I checked the mechanism against the current code and it
holds: with ri_RangeTableIndex = 1, ExecGetUpdatedCols() returns the full set,
so index_unchanged_by_update() always sees a changed key column.

+1 on both.

>
> --
> Antonin Houska
> Web: https://www.cybertec-postgresql.com
>

--
Regards,
Ewan Young

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ewan Young 2026-07-02 02:08:33 Re: satisfies_hash_partition crash
Previous Message Tender Wang 2026-07-02 01:22:30 Re: satisfies_hash_partition crash