Re: Adding REPACK [concurrently]

From: Antonin Houska <ah(at)cybertec(dot)at>
To: Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>
Cc: alvherre(at)alvh(dot)no-ip(dot)org, Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Treat <rob(at)xzilla(dot)net>
Subject: Re: Adding REPACK [concurrently]
Date: 2026-02-25 19:41:15
Message-ID: 100248.1772048475@localhost
Lists: pgsql-hackers

Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com> wrote:

> I did stress testing on the v35 patches: a concurrency test using pgbench
> with 50 concurrent clients and 4 threads, running the pgbench script below
> (dual_chaos.sql) against the table setup in setup.sql.
> I ran pgbench with 5M rows for 10 minutes and with 50M rows for ~45 minutes,
> multiple times. REPACK (concurrently) ran successfully except "once" (see below).
> I created a shadow/clone table to use for checking correctness after the
> concurrency test. I used 4 checks to verify that the data is intact and that
> REPACK (concurrently) ran successfully.
>
> 1) Table file OID (relfilenode) swapped?
> 2) Bloat gone? The victim relation size should be less than the
> shadow relation size.
> 3) FULL JOIN logic (borrowed from repack.spec, with a small change)
> against the shadow table, which undergoes the same concurrent ops
> as the victim table, i.e. dual writes (see dual_chaos.sql), to
> verify table data integrity.
> 4) Physical index integrity via amcheck (borrowed from Mihail's tests).
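Check 3 could be sketched roughly as follows. This is a hypothetical reconstruction, not the actual repack.spec query; the table and column names (victim, shadow, id) are placeholders for whatever setup.sql defines:

```sql
-- Sketch of the FULL JOIN integrity check: count rows that exist in only
-- one of the two tables, or that share a key but differ in payload.
SELECT count(*) AS mismatches
FROM victim v
FULL JOIN shadow s USING (id)
WHERE v.id IS NULL              -- row present only in shadow
   OR s.id IS NULL              -- row present only in victim
   OR v IS DISTINCT FROM s;     -- same key, different contents

-- A result of 0 means the victim and shadow tables are identical.
```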

Thanks!

> The concurrency test failed once. I tried to reproduce the scenario below,
> but had no luck. I think the assert failure happened because, after a
> speculative insert, there might have been no spec CONFIRM or ABORT record.
> Thoughts?

Perhaps, I'll try. I'm not sure the REPACK decoding worker does anything
special regarding decoding. If you happen to see the problem again, please try
to preserve the related WAL segments - if this is a bug in the PG executor,
pg_waldump might reveal it.
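For instance, something along these lines (a sketch only; the data directory and segment name are placeholders) would show whether each speculative INSERT in the preserved segments is followed by a matching CONFIRM or an aborting DELETE:

```shell
# Sketch: dump heap records from a preserved WAL segment and look for
# speculative-insert traffic. Paths and the segment name are examples.
pg_waldump --rmgr=Heap $PGDATA/pg_wal/000000010000000000000042 | \
    grep -Ei 'insert|confirm|delete'
```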

> Crash Test:
> I did a crash test with a debugger, using a breakpoint inside
> apply_concurrent_changes to simulate a crash while concurrent changes are
> being applied. After a few concurrent changes were done, I crashed the
> server using "pg_ctl -m immediate stop" and then restarted it. I observed
> that REPACK (concurrently) didn't complete (expected), the files were not
> swapped, and the data in the victim table was intact (checked using a FULL
> JOIN with the shadow table). However, there are some leftovers of the
> transient table used by REPACK (concurrently):
> 1) The transient table's relation files - these consume extra space. I think
> this was the case with VACUUM FULL previously, so they have to be removed
> manually, but I think this time we have a "leverage" which we can use to
> reclaim the extra space.
> 2) The transient table's WAL - this is generated by applying the logically
> decoded concurrent changes to the new transient table. I think it isn't a
> problem as long as the segments just get recycled, but if they get archived,
> they are of no use and only consume extra space and time during the archival
> process.

VACUUM FULL / CLUSTER also produces (a lot of) WAL, so IMO there's nothing
specific about REPACK.

Regarding the transient table, I have a draft patch (for future versions) that
creates the transient table in a separate transaction and commits it. (This is
part of the effort to not block the progress of the VACUUM xmin horizon. The
point is that most of the time REPACK should not have an XID assigned.) With
this design, each time REPACK starts, it checks (in the pg_depend catalog)
whether a transient table exists for the particular table, and if it does, it
drops it.
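The existence check could look roughly like this. A sketch only: the dependency class and deptype the patch actually records are assumptions, and 'victim' stands in for the table being repacked:

```sql
-- Hypothetical: find a leftover transient table registered in pg_depend
-- as internally dependent on the table being repacked.
SELECT d.objid::regclass AS transient_table
FROM pg_depend d
WHERE d.refobjid = 'victim'::regclass      -- table being repacked (placeholder)
  AND d.classid = 'pg_class'::regclass
  AND d.refclassid = 'pg_class'::regclass
  AND d.deptype = 'i';                     -- internal dependency (assumption)
```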

> "Leverage" Idea:
> I think we can reuse the transient table's relation files and WAL during
> crash recovery, so that the user doesn't have to re-run REPACK
> (concurrently) after the server has recovered. For this we might need to
> write a WAL record for REPACK (concurrently) to let the startup process know
> that REPACK (concurrently) was in progress, which sets a flag. By the end of
> the startup process, all the WAL for the transient table has already been
> applied, so the transient table is complete; we can then do the swap
> (finish_heap_swap) after checking the flag. These are just my initial
> thoughts on this idea of reusing the "residue" files of the transient table.
> I could be totally wrong :) Please correct me if I am.

I think it'd be quite difficult to restart REPACK exactly at the point where it
crashed, especially if the tables are unlocked between the server restart and
the restart of REPACK.

> I think we need to update this statement in repack.sgml regarding wal_level:
> <listitem>
> <para>
> The <link linkend="guc-wal-level"><varname>wal_level</varname></link>
> configuration parameter is less than <literal>logical</literal>.
> </para>
> </listitem>
> because of commit 67c2097 (POC: enable logical decoding when
> wal_level = 'replica' without a server restart).

I'm aware of this commit and have already updated the regression tests, but I
forgot to update the user documentation. Thanks for the reminder.
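For the record, a possible rewording of that repack.sgml item, given that the
commit makes logical decoding available at wal_level = 'replica'. A sketch
only; the final wording belongs to the patch:

```sgml
<listitem>
 <para>
  Logical decoding is not available, i.e. the effective WAL level (see
  <link linkend="guc-wal-level"><varname>wal_level</varname></link>) is
  less than <literal>logical</literal>.
 </para>
</listitem>
```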

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
