Re: Adding REPACK [concurrently]

From: Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Treat <rob(at)xzilla(dot)net>
Subject: Re: Adding REPACK [concurrently]
Date: 2025-12-11 20:38:00
Message-ID: CADzfLwXp4c-MJx7yVDxAGNNxPbX4o9dqyivxavtHvmUsdXYqBQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, Antonin!

On Tue, Dec 9, 2025 at 7:52 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:
> Worker makes more sense to me - the initial implementation is in 0005.

Comments for 0005, so far:

---
> export_initial_snapshot

Hm, should we use ExportSnapshot instead? And ImportSnapshort to import it.

---
> get_initial_snapshot

Should we check if a worker is still alive while waiting? Also is
"process_concurrent_changes".

And AFAIU RegisterDynamicBackgroundWorker does not guarantee new
workers to be started (in case of some fork-related issues).

---
> Assert(res = SHM_MQ_DETACHED);

==

---
> /* Wait a bit before we retry reading WAL. */
> (void) WaitLatch(MyLatch,
> WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
> 1000L,
> WAIT_EVENT_REPACK_WORKER_MAIN);

Looks like we need ResetLatch(MyLatch); here.

---
> * - decoding_ctx - logical decoding context, to capture concurrent data

Need to be removed together with parameters.

---
> hpm_context = AllocSetContextCreate(TopMemoryContext,
> "ProcessParallelMessages",
> ALLOCSET_DEFAULT_SIZES);

"ProcessRepacklMessages"

---
> if (XLogRecPtrIsInvalid(lsn_upto))
> {
> SpinLockAcquire(&shared->mutex);
> lsn_upto = shared->lsn_upto;
> /* 'done' should be set at the same time as 'lsn_upto' */
> done = shared->done;
> SpinLockRelease(&shared->mutex);
>
> /* Check if the work happens to be complete. */
> continue;
> }

May be moved to the start of the loop to avoid duplication.

---
> SpinLockAcquire(&shared->mutex);
> valid = shared->sfs_valid;
> SpinLockRelease(&shared->mutex);

Better to remember last_exported here to avoid any races/misses.

---
> shared->lsn_upto = InvalidXLogRecPtr;

I think it is better to clear it once it is read (after removing duplication).

---
> bool done;

bool exit_after_lsn_upto?

---
> bool sfs_valid;

Do we really need it? I think it is better to leave only last_exported
and in process_concurrent_changes wait add argument
(last_processed_file) and wait for last_exported to become higher.

---
What if we reverse roles of leader-worker?

Leader gets a snapshot, transfers it to workers (multiple probably for
parallel scan) using already ready mechanics - workers are processing
the scan of the table in parallel. Leader decodes the WAL.

Also, workers may be assigned with a list of indexes they need to build.

Feels like it reuses more from current infrastructure and also needs
less different synchronization logic. But I'm not sure about the
indexes phase - maybe it is not so easy to do.

---
Also, should we add some kind of back pressure between building
indexes/new heap and num of WAL we have?
But probably it is out of scope of the patch.

---
To build N indexes we need to scan table N times. What is about
building multiple indexes during a single heap scan?

--
Just a gentle reminder about the XMIN_COMMITTED flag and WAL storm
after the switch.

Best regards,
Mikhail.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message surya poondla 2025-12-11 21:30:01 Re: [PATCH] psql: tab completion for ALTER ROLE ... IN DATABASE ...
Previous Message Nathan Bossart 2025-12-11 20:28:30 Re: enhance wraparound warnings