| From: | Sami Imseih <samimseih(at)gmail(dot)com> |
|---|---|
| To: | Lukas Fittl <lukas(at)fittl(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Unify parallel worker handling for index builds and instrumentation |
| Date: | 2026-06-01 18:31:00 |
| Message-ID: | CAA5RZ0t15-3LAoBdwY2VrUNVa3Sxf0PcPSiTQoak1v1wHd6Qag@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
> Whilst working on the stack-based instrumentation patch, I noticed
> that there is a lot of duplication of parallel worker code, and in
> particular parallel index build code, when it comes to how it sets up
> shared memory, and how metadata information is fed back to/from the
> leader.
+1, Thanks for the patches!
> My initial focus was around instrumentation, but I think there is a
> bigger duplication issue that needs a refactoring in regards to
> parallel index builds in particular.
Yes, looks like it.
I reviewed you proposed patch set and here are my findings.
> 0001: Use the new Instrumentation struct to handle WAL and Buffer usage together
This one is a straightforward refactor. Only comment is to remove the bare block
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
-
&wal_usage[ParallelWorkerNumber]);
+ {
+ Instrumentation *worker_instr;
+
+ worker_instr = shm_toc_lookup(toc,
PARALLEL_KEY_INSTRUMENTATION, false);
+ InstrEndParallelQuery(&worker_instr[ParallelWorkerNumber]);
+ }
> 0002: New helpers to estimate/store Instrumentation struct in shared memory
Another straightforward refactor.
This comment
- * If there are no extensions loaded that care, we could skip this. We
- * have no way of knowing whether anyone's looking at pgWalUsage or
- * pgBufferUsage, so do it unconditionally.
- */
was removed from the several duplicate code paths that estimated space
for parallel instrumentation, but I don't think it added value anyhow,
so that is fine.
> 0003: New helpers to handle query text for parallel workers
+/*
+ * Restore the query text in a worker: set debug_query_string and report it as
+ * the current activity. The key is looked up with missing_ok, so this is a
+ * no-op (leaving debug_query_string NULL) when the leader stored no text.
+ */
+void
+RestoreParallelQueryText(shm_toc *toc, uint64 key)
+{
+ debug_query_string = shm_toc_lookup(toc, key, true);
+ pgstat_report_activity(STATE_RUNNING, debug_query_string);
+}
I went back-forth on this since it combines 2 different ideas, setting
a debug_query_string and reporting activity. I think this is OK after
thinking about it a bit, and the whole point is to deduplicate code.
"no-op (leaving debug_query_string NULL) " The comment here is
not accurate. I would drop the "no-op" part since it is setting a NULL
query string. right?
> 0004: Introduces shared parallel index build helper functions
>
> Of these, 0004 is where I have the most uncertainty personally (what's
> a good abstraction here?), but wanted to share this to spark a
> conversation on how we could improve this.
I think this is a nice deduplication as well. So I am also +1 to this.
A few things:
1/
+ * Mutable state that is maintained for exact index AMs (e.g.
nbtree), and
+ * unused for lossy AMs.
Not sure if the "loss AMs" comment is accurate. From what I can tell, GIN
does not use the havedead and brokenhotchain fields and it's not lossy.
right?
2/
+ * workers are launched; the mutable fields are maintained by the workers
nit: extra space before "launched"
3/
Per the commit message:
This also fixes an oversight in parallel GIN index builds that forgot
to pass the queryid to the parallel workers, like other AMs do.
I think this should be an independent patch, and first in the series.
> I could imagine an alternate design where we refactor more generally
> how parallel workers get handled for any maintenance commands (i.e.
> also including parallel vacuum), but that would reduce how much code
> duplication we can avoid for index builds.
Yeah, I am not sure we need to go that far in the deduplication. Parallel
Vacuum and Parallel index builds are way different in what they do.
--
Sami Imseih
Amazon Web Services (AWS)
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Álvaro Herrera | 2026-06-01 18:32:02 | Re: PostgreSQL 19 Beta 1 release announcement draft |
| Previous Message | Masahiko Sawada | 2026-06-01 16:44:28 | Re: Fix race during concurrent logical decoding activation |