Re: Unify parallel worker handling for index builds and instrumentation

From: Sami Imseih <samimseih(at)gmail(dot)com>
To: Lukas Fittl <lukas(at)fittl(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Unify parallel worker handling for index builds and instrumentation
Date: 2026-06-01 18:31:00
Message-ID: CAA5RZ0t15-3LAoBdwY2VrUNVa3Sxf0PcPSiTQoak1v1wHd6Qag@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

> Whilst working on the stack-based instrumentation patch, I noticed
> that there is a lot of duplication of parallel worker code, and in
> particular parallel index build code, when it comes to how it sets up
> shared memory, and how metadata information is fed back to/from the
> leader.

+1, Thanks for the patches!

> My initial focus was around instrumentation, but I think there is a
> bigger duplication issue that needs a refactoring in regards to
> parallel index builds in particular.

Yes, looks like it.

I reviewed you proposed patch set and here are my findings.

> 0001: Use the new Instrumentation struct to handle WAL and Buffer usage together

This one is a straightforward refactor. Only comment is to remove the bare block

- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
-
&wal_usage[ParallelWorkerNumber]);
+ {
+ Instrumentation *worker_instr;
+
+ worker_instr = shm_toc_lookup(toc,
PARALLEL_KEY_INSTRUMENTATION, false);
+ InstrEndParallelQuery(&worker_instr[ParallelWorkerNumber]);
+ }

> 0002: New helpers to estimate/store Instrumentation struct in shared memory

Another straightforward refactor.

This comment

- * If there are no extensions loaded that care, we could skip this. We
- * have no way of knowing whether anyone's looking at pgWalUsage or
- * pgBufferUsage, so do it unconditionally.
- */

was removed from the several duplicate code paths that estimated space
for parallel instrumentation, but I don't think it added value anyhow,
so that is fine.

> 0003: New helpers to handle query text for parallel workers

+/*
+ * Restore the query text in a worker: set debug_query_string and report it as
+ * the current activity. The key is looked up with missing_ok, so this is a
+ * no-op (leaving debug_query_string NULL) when the leader stored no text.
+ */
+void
+RestoreParallelQueryText(shm_toc *toc, uint64 key)
+{
+ debug_query_string = shm_toc_lookup(toc, key, true);
+ pgstat_report_activity(STATE_RUNNING, debug_query_string);
+}

I went back-forth on this since it combines 2 different ideas, setting
a debug_query_string and reporting activity. I think this is OK after
thinking about it a bit, and the whole point is to deduplicate code.

"no-op (leaving debug_query_string NULL) " The comment here is
not accurate. I would drop the "no-op" part since it is setting a NULL
query string. right?

> 0004: Introduces shared parallel index build helper functions
>
> Of these, 0004 is where I have the most uncertainty personally (what's
> a good abstraction here?), but wanted to share this to spark a
> conversation on how we could improve this.

I think this is a nice deduplication as well. So I am also +1 to this.

A few things:

1/
+ * Mutable state that is maintained for exact index AMs (e.g.
nbtree), and
+ * unused for lossy AMs.

Not sure if the "loss AMs" comment is accurate. From what I can tell, GIN
does not use the havedead and brokenhotchain fields and it's not lossy.
right?

2/
+ * workers are launched; the mutable fields are maintained by the workers

nit: extra space before "launched"

3/
Per the commit message:

This also fixes an oversight in parallel GIN index builds that forgot
to pass the queryid to the parallel workers, like other AMs do.

I think this should be an independent patch, and first in the series.

> I could imagine an alternate design where we refactor more generally
> how parallel workers get handled for any maintenance commands (i.e.
> also including parallel vacuum), but that would reduce how much code
> duplication we can avoid for index builds.

Yeah, I am not sure we need to go that far in the deduplication. Parallel
Vacuum and Parallel index builds are way different in what they do.

--
Sami Imseih
Amazon Web Services (AWS)

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Álvaro Herrera 2026-06-01 18:32:02 Re: PostgreSQL 19 Beta 1 release announcement draft
Previous Message Masahiko Sawada 2026-06-01 16:44:28 Re: Fix race during concurrent logical decoding activation