Re: Adding REPACK [concurrently]

From: Antonin Houska <ah(at)cybertec(dot)at>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Treat <rob(at)xzilla(dot)net>
Subject: Re: Adding REPACK [concurrently]
Date: 2026-04-10 13:21:45
Message-ID: 125085.1775827305@localhost
Lists: pgsql-hackers

Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com> wrote:

> When testing REPACK concurrently, I noticed that all WALs are retained from
> the moment REPACK begins copying data to the new table until the command
> finishes replaying concurrent changes on the new table and stops the repack
> decoding worker.
>
> I understand the reason: the REPACK command itself starts a long-running
> transaction, and logical decoding does not advance restart_lsn beyond the
> oldest running transaction's start position. As a result, slot.restart_lsn
> remains unchanged, preventing the checkpointer from recycling WALs.

I think you're right, sorry for the omission.
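
To illustrate the retention described in the quoted text, here is a toy model
in Python. It is not PostgreSQL code: the function names and LSN values are
made up, and the real restart_lsn computation is more involved. It only shows
why one long-running transaction pins every WAL segment written after it
started.

```python
# Toy model (assumption: LSNs are plain integers, a segment is identified
# by its end LSN). Not the actual logical decoding logic.

def restart_lsn(decoded_upto, running_xact_start_lsns):
    """Oldest LSN the slot must still be able to read from."""
    return min([decoded_upto, *running_xact_start_lsns])

def recyclable_segments(segment_end_lsns, slot_restart_lsn):
    """Segments ending at or before restart_lsn are no longer needed."""
    return [end for end in segment_end_lsns if end <= slot_restart_lsn]

segments = [100, 200, 300, 400]   # end LSN of each WAL segment
repack_start = 50                 # hypothetical start LSN of the REPACK transaction

# While the REPACK transaction is running, nothing can be recycled.
during = recyclable_segments(segments, restart_lsn(400, [repack_start]))

# Once it has finished, everything already decoded can go.
after = recyclable_segments(segments, restart_lsn(400, []))
```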

> However, since REPACK can run for a long time (hours or even days), I'd like
> to confirm whether this is expected behavior or if we plan to improve it
> in the future ? And additionally,

Yes, it will be improved. I have a draft patch for it, will rebase and post it
soon. The plan is to:

1) preserve the original xmin/xmax of the tuples when we insert them into the
new heap. Thus, besides achieving MVCC safety, we won't need an XID assigned
most of the time.

2) do catalog changes in separate transactions - an XID is needed here, but
these transactions take a very short time.

3) use a single snapshot only for a limited number of tuples/pages. When more
data needs to be copied, a new snapshot is built, presumably with a higher
->xmin than the previous one.
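
Point 3 can be sketched roughly as follows. This is a toy model in Python,
not the draft patch: take_snapshot(), copy_in_batches() and the xmin values
are hypothetical, and the counter merely stands in for the server handing out
snapshots whose xmin moves forward over time.

```python
import itertools

# Hypothetical stand-in for the server: each call returns the xmin a fresh
# snapshot would get. Here it simply grows by 10 per snapshot.
_next_xmin = itertools.count(start=1000, step=10)

def take_snapshot():
    return next(_next_xmin)

def copy_in_batches(pages, batch_size):
    """Copy pages batch by batch, using a new snapshot for each batch.

    Returns the xmin that was in effect while each batch was copied.
    """
    xmins = []
    for start in range(0, len(pages), batch_size):
        snapshot_xmin = take_snapshot()   # replaces the previous snapshot
        batch = pages[start:start + batch_size]
        # ... the tuples of `batch` would be copied here ...
        xmins.append(snapshot_xmin)
    return xmins

xmins = copy_in_batches(pages=list(range(10)), batch_size=4)
```

The point of the sketch is that no single snapshot lives for the whole copy,
so the oldest xmin held by the command keeps moving forward instead of staying
at the value taken at the start.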

> IIUC, REPACK without using concurrent option does not have this issue.

It does not have the WAL recycling issue because it does not need to read
WAL. However, it also runs in a long transaction. Even though it does not need
an XID for the actual heap rewriting, it gets one at the moment it locks the
table using AccessExclusiveLock (which is at the very beginning).

> Given that we do not restart a REPACK, I think the repack decoding worker
> should be able to advance restart_lsn each time after writing changes
> (similar to how a physical slot behaves). To illustrate this, I've written
> a patch (attached) that implements this approach, and it works fine for me.

LGTM, thanks!
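
For readers without the attachment, the idea in the quoted paragraph can be
modelled like this. It is a toy sketch in Python and not the attached patch:
ToySlot and decode_and_store() are invented names, and LSNs are plain
integers. The assumption it relies on is the one stated above, namely that a
REPACK is never restarted, so WAL that has already been decoded and written
out is never read again.

```python
class ToySlot:
    """Minimal stand-in for a replication slot (hypothetical)."""
    def __init__(self, restart_lsn):
        self.restart_lsn = restart_lsn

def decode_and_store(slot, changes, stored):
    """Write out each decoded change, then advance restart_lsn past it.

    changes: list of (end_lsn, payload) tuples in LSN order.
    """
    for end_lsn, payload in changes:
        stored.append(payload)                       # change is written out
        slot.restart_lsn = max(slot.restart_lsn, end_lsn)

slot = ToySlot(restart_lsn=50)
stored = []
decode_and_store(slot, [(120, "insert"), (180, "update"), (260, "delete")], stored)
```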

> BTW, catalog_xmin also won't advance, but that seems not a big issue as
> the REPACK transaction itself also holds a snapshot that retains catalog tuples,
> so advancing catalog_xmin wouldn't change the situation anyway.

The snapshot "resetting" (mentioned above) should fix this problem too.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
