| From: | Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com> |
|---|---|
| To: | Antonin Houska <ah(at)cybertec(dot)at> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Treat <rob(at)xzilla(dot)net> |
| Subject: | Re: Adding REPACK [concurrently] |
| Date: | 2025-12-11 20:38:00 |
| Message-ID: | CADzfLwXp4c-MJx7yVDxAGNNxPbX4o9dqyivxavtHvmUsdXYqBQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hello, Antonin!
On Tue, Dec 9, 2025 at 7:52 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:
> Worker makes more sense to me - the initial implementation is in 0005.
Comments for 0005, so far:
---
> export_initial_snapshot
Hm, should we use ExportSnapshot instead? And ImportSnapshort to import it.
---
> get_initial_snapshot
Should we check if a worker is still alive while waiting? Also is
"process_concurrent_changes".
And AFAIU RegisterDynamicBackgroundWorker does not guarantee new
workers to be started (in case of some fork-related issues).
---
> Assert(res = SHM_MQ_DETACHED);
==
---
> /* Wait a bit before we retry reading WAL. */
> (void) WaitLatch(MyLatch,
> WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
> 1000L,
> WAIT_EVENT_REPACK_WORKER_MAIN);
Looks like we need ResetLatch(MyLatch); here.
---
> * - decoding_ctx - logical decoding context, to capture concurrent data
Need to be removed together with parameters.
---
> hpm_context = AllocSetContextCreate(TopMemoryContext,
> "ProcessParallelMessages",
> ALLOCSET_DEFAULT_SIZES);
"ProcessRepacklMessages"
---
> if (XLogRecPtrIsInvalid(lsn_upto))
> {
> SpinLockAcquire(&shared->mutex);
> lsn_upto = shared->lsn_upto;
> /* 'done' should be set at the same time as 'lsn_upto' */
> done = shared->done;
> SpinLockRelease(&shared->mutex);
>
> /* Check if the work happens to be complete. */
> continue;
> }
May be moved to the start of the loop to avoid duplication.
---
> SpinLockAcquire(&shared->mutex);
> valid = shared->sfs_valid;
> SpinLockRelease(&shared->mutex);
Better to remember last_exported here to avoid any races/misses.
---
> shared->lsn_upto = InvalidXLogRecPtr;
I think it is better to clear it once it is read (after removing duplication).
---
> bool done;
bool exit_after_lsn_upto?
---
> bool sfs_valid;
Do we really need it? I think it is better to leave only last_exported
and in process_concurrent_changes wait add argument
(last_processed_file) and wait for last_exported to become higher.
---
What if we reverse roles of leader-worker?
Leader gets a snapshot, transfers it to workers (multiple probably for
parallel scan) using already ready mechanics - workers are processing
the scan of the table in parallel. Leader decodes the WAL.
Also, workers may be assigned with a list of indexes they need to build.
Feels like it reuses more from current infrastructure and also needs
less different synchronization logic. But I'm not sure about the
indexes phase - maybe it is not so easy to do.
---
Also, should we add some kind of back pressure between building
indexes/new heap and num of WAL we have?
But probably it is out of scope of the patch.
---
To build N indexes we need to scan table N times. What is about
building multiple indexes during a single heap scan?
--
Just a gentle reminder about the XMIN_COMMITTED flag and WAL storm
after the switch.
Best regards,
Mikhail.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | surya poondla | 2025-12-11 21:30:01 | Re: [PATCH] psql: tab completion for ALTER ROLE ... IN DATABASE ... |
| Previous Message | Nathan Bossart | 2025-12-11 20:28:30 | Re: enhance wraparound warnings |