Re: Adding REPACK [concurrently]

From: Antonin Houska <ah(at)cybertec(dot)at>
To: Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Treat <rob(at)xzilla(dot)net>
Subject: Re: Adding REPACK [concurrently]
Date: 2026-03-18 20:07:09
Message-ID: 98417.1773864429@localhost
Lists: pgsql-hackers

Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com> wrote:

> TRAP: failed Assert("RelationGetRelid(relation) == ((RepackDecodingState *) ctx->output_writer_private)->relid"), File: "pgoutput_repack.c", Line: 97, PID: 397007

> This crash happens if we run REPACK (concurrently) on a table while a heavy
> pgbench workload is concurrently executing multi-table (setup.sql)
> transactions (dual_chaos.sql). It triggers after a few back-to-back
> REPACK (concurrently) runs.
>
> I think I found the cause of this crash. Some changes slipped past the
> change_useless_for_repack filter, so changes unrelated to the relation
> currently being processed by REPACK (concurrently) were decoded and added to
> the reorderbuffer queue. The reason is that repacked_rel_locator.relNumber is
> InvalidOid by default. It is set to the target relation in
> setup_logical_decoding, but only after DecodingContextFindStartpoint has run.
> During DecodingContextFindStartpoint, changes are therefore not filtered even
> if they do not belong to the target relation: along the path
> rm_decode -> change_useless_for_repack -> am_decoding_for_repack,
> repacked_rel_locator.relNumber is still InvalidOid, so the filtering is
> skipped. Those changes are added to the reorder buffer queue, and when the
> queue is processed, plugin_change is called and the assertion fails. I have
> attached a diff patch to fix this.

Thanks a lot! Yes, your explanation makes sense. I'll include the fix in the
next version. I think it might also explain the other crash [1] you reported
earlier. I'll try to reproduce that.

[1] https://www.postgresql.org/message-id/CAFC%2Bb6o2yzA80YmfEhmMO9puN8qvGRvr-15BBLn3UmJxPfpr2w%40mail.gmail.com

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
