Re: Adding REPACK [concurrently]

From: Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Antonin Houska <ah(at)cybertec(dot)at>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Treat <rob(at)xzilla(dot)net>
Subject: Re: Adding REPACK [concurrently]
Date: 2026-04-01 14:55:54
Message-ID: CAFC+b6qJ3vE4zAHn7Q6MA2dixwYcY+K1iEMYhQYH01-Pr9XAjw@mail.gmail.com
Lists: pgsql-hackers

Hello,

I was fuzz testing v48 and found a crash that occurs when REPACK (CONCURRENTLY)
itself errors out.
1) While running:

create table test(id int);
REPACK (concurrently) test;

TBH I didn't know about this restriction; sometimes it's better not to know the "rules" ;)
[NOTE: maybe repack.sgml should mention that REPACK (CONCURRENTLY) cannot be
run on a table without an identity index or primary key.]

ERROR: cannot process relation "test"
2026-04-01 19:06:31.211 IST [2495575] HINT: Relation "test" has no identity index.
2026-04-01 19:06:31.211 IST [2495575] STATEMENT: repack (concurrently) test;
TRAP: failed Assert("proc->statusFlags == ProcGlobal->statusFlags[proc->pgxactoff]"), File: "procarray.c", Line: 719, PID: 2495575
postgres: srinath postgres [local] REPACK(ExceptionalCondition+0x98)[0xaaaaad938d84]
postgres: srinath postgres [local] REPACK(ProcArrayEndTransaction+0x1f0)[0xaaaaad6c15fc]
postgres: srinath postgres [local] REPACK(+0x210cf0)[0xaaaaad190cf0]
postgres: srinath postgres [local] REPACK(+0x2117e4)[0xaaaaad1917e4]
postgres: srinath postgres [local] REPACK(AbortCurrentTransaction+0x10)[0xaaaaad191740]
postgres: srinath postgres [local] REPACK(PostgresMain+0x568)[0xaaaaad7116e4]
postgres: srinath postgres [local] REPACK(+0x786ae0)[0xaaaaad706ae0]
postgres: srinath postgres [local] REPACK(postmaster_child_launch+0x1f0)[0xaaaaad5d719c]
postgres: srinath postgres [local] REPACK(+0x65ea98)[0xaaaaad5dea98]
postgres: srinath postgres [local] REPACK(+0x65b650)[0xaaaaad5db650]
postgres: srinath postgres [local] REPACK(PostmasterMain+0x1564)[0xaaaaad5dae1c]
postgres: srinath postgres [local] REPACK(main+0x3dc)[0xaaaaad466348]
/lib/aarch64-linux-gnu/libc.so.6(+0x284c4)[0xffffb40d84c4]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x98)[0xffffb40d8598]
postgres: srinath postgres [local] REPACK(_start+0x30)[0xaaaaad06ddf0]
2026-04-01 19:06:31.800 IST [2494560] LOG: client backend (PID 2495575) was terminated by signal 6: Aborted

2) While running REPACK (CONCURRENTLY) on a table on which another REPACK was
already running. I only wanted to verify that the deadlock is detected and
reported as "could not wait for concurrent REPACK", but I got the same crash
instead.

ERROR: could not wait for concurrent REPACK
2026-04-01 12:55:39.481 IST [2397660] DETAIL: Process 2397660 waits for REPACK running on process 2397307
2026-04-01 12:55:39.481 IST [2397660] CONTEXT: waiting for ShareUpdateExclusiveLock on relation 16385 of database 5
2026-04-01 12:55:39.481 IST [2397660] STATEMENT: repack (concurrently) stress_victim ;
2026-04-01 12:55:39.497 IST [2397151] LOG: checkpoint complete: time: wrote 2056 buffers (12.5%), wrote 0 SLRU buffers; 0 WAL file(s) added, 0 removed, 0 recycled; write=206.804 s, sync=0.003 s, total=861.616 s; sync files=17, longest=0.002 s, average=0.001 s; distance=318978 kB, estimate=515341 kB; lsn=2/02810A60, redo lsn=2/02810910
TRAP: failed Assert("proc->statusFlags == ProcGlobal->statusFlags[proc->pgxactoff]"), File: "procarray.c", Line: 719, PID: 2397660
postgres: srinath postgres [local] REPACK(ExceptionalCondition+0x98)[0xaaaae7d58d84]
postgres: srinath postgres [local] REPACK(ProcArrayEndTransaction+0x1f0)[0xaaaae7ae15fc]
postgres: srinath postgres [local] REPACK(+0x210cf0)[0xaaaae75b0cf0]
postgres: srinath postgres [local] REPACK(+0x2117e4)[0xaaaae75b17e4]
postgres: srinath postgres [local] REPACK(AbortCurrentTransaction+0x10)[0xaaaae75b1740]
postgres: srinath postgres [local] REPACK(PostgresMain+0x568)[0xaaaae7b316e4]
postgres: srinath postgres [local] REPACK(+0x786ae0)[0xaaaae7b26ae0]
postgres: srinath postgres [local] REPACK(postmaster_child_launch+0x1f0)[0xaaaae79f719c]
postgres: srinath postgres [local] REPACK(+0x65ea98)[0xaaaae79fea98]
postgres: srinath postgres [local] REPACK(+0x65b650)[0xaaaae79fb650]
postgres: srinath postgres [local] REPACK(PostmasterMain+0x1564)[0xaaaae79fae1c]
postgres: srinath postgres [local] REPACK(main+0x3dc)[0xaaaae7886348]
/lib/aarch64-linux-gnu/libc.so.6(+0x284c4)[0xffff9ec984c4]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x98)[0xffff9ec98598]
postgres: srinath postgres [local] REPACK(_start+0x30)[0xaaaae748ddf0]
2026-04-01 12:58:18.198 IST [2397147] LOG: client backend (PID 2397660) was terminated by signal 6: Aborted

The reason for the crash is that ExecRepack sets PROC_IN_CONCURRENT_REPACK in
MyProc->statusFlags without updating ProcGlobal->statusFlags[MyProc->pgxactoff];
the two were only synchronized later, in rebuild_relation. If an error aborts
the transaction before rebuild_relation gets that far, the two copies disagree
and the assertion in ProcArrayEndTransaction fails.
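
To illustrate the invariant, here is a minimal standalone sketch. It is a toy
model, not PostgreSQL code: DemoProc, demo_global_flags, the helper functions,
and the flag value 0x40 are all made up for illustration, and there is no real
locking (in the backend both copies are updated while holding ProcArrayLock
exclusively).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_PROCS 8
/* Hypothetical value; the real PROC_IN_CONCURRENT_REPACK is defined by the patch. */
#define DEMO_IN_CONCURRENT_REPACK 0x40

/* Toy stand-in for PGPROC: a private flag word plus a slot in the shared array. */
typedef struct DemoProc
{
    uint8_t statusFlags;
    int     pgxactoff;
} DemoProc;

/* Toy stand-in for ProcGlobal->statusFlags, the mirrored copy. */
static uint8_t demo_global_flags[DEMO_MAX_PROCS];

/* Models the pre-patch ExecRepack: only the private copy is changed. */
static void
set_flag_private_only(DemoProc *proc)
{
    proc->statusFlags |= DEMO_IN_CONCURRENT_REPACK;
}

/* Models the patched ExecRepack: both copies are changed together. */
static void
set_flag_synced(DemoProc *proc)
{
    proc->statusFlags |= DEMO_IN_CONCURRENT_REPACK;
    demo_global_flags[proc->pgxactoff] = proc->statusFlags;
}

/* The condition that the failing Assert checks. */
static bool
flags_consistent(const DemoProc *proc)
{
    return proc->statusFlags == demo_global_flags[proc->pgxactoff];
}

/* Models the check at transaction end; aborts if the copies disagree. */
static void
demo_end_transaction(const DemoProc *proc)
{
    assert(flags_consistent(proc));
}
```

In this model, calling set_flag_private_only() and then demo_end_transaction()
aborts the program, which corresponds to erroring out before rebuild_relation
has run. With set_flag_synced() the check passes no matter how early the
transaction ends.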

Here's the diff to solve this crash.

diff --git a/src/backend/commands/repack.c b/src/backend/commands/repack.c
index 29ba49744eb..d44092a407a 100644
--- a/src/backend/commands/repack.c
+++ b/src/backend/commands/repack.c
@@ -284,7 +284,23 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
* that others can conflict with.
*/
if ((params.options & CLUOPT_CONCURRENT) != 0)
+ {
+ /*
+ * Do not let other backends wait for our completion during their
+ * setup of logical replication. Unlike logical replication publisher,
+ * we will have XID assigned, so the other backends - whether
+ * walsenders involved in logical replication or regular backends
+ * executing also REPACK (CONCURRENTLY) - would have to wait for our
+ * completion before they can build their initial snapshot. It is o.k.
+ * for any decoding backend to ignore us because we do not change
+ * tuple descriptor of any table, and the data changes we write should
+ * not be decoded by other backends.
+ */
+ LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
MyProc->statusFlags |= PROC_IN_CONCURRENT_REPACK;
+ ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
+ LWLockRelease(ProcArrayLock);
+ }

/*
* If a single relation is specified, process it and we're done ... unless
@@ -988,22 +1004,6 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose,

if (concurrent)
{
- /*
- * Do not let other backends wait for our completion during their
- * setup of logical replication. Unlike logical replication publisher,
- * we will have XID assigned, so the other backends - whether
- * walsenders involved in logical replication or regular backends
- * executing also REPACK (CONCURRENTLY) - would have to wait for our
- * completion before they can build their initial snapshot. It is o.k.
- * for any decoding backend to ignore us because we do not change
- * tuple descriptor of any table, and the data changes we write should
- * not be decoded by other backends.
- */
- LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
- MyProc->statusFlags |= PROC_IN_CONCURRENT_REPACK;
- ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
- LWLockRelease(ProcArrayLock);
-
/*
* The worker needs to be member of the locking group we're the leader
* of. We ought to become the leader before the worker starts. The

Thoughts?

--
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/
