Re: Support tid range scan in parallel?

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Cary Huang <cary(dot)huang(at)highgo(dot)ca>
Cc: Junwang Zhao <zhjwpku(at)gmail(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support tid range scan in parallel?
Date: 2025-11-07 05:31:20
Message-ID: CAApHDvoWXpDJU6R3A-Xv2MaEm4y233H_m0od6iqX+XGx3En2Vw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 30 Aug 2025 at 05:32, Cary Huang <cary(dot)huang(at)highgo(dot)ca> wrote:
>
> > The workers are ending their scan early because
> > heap_getnextslot_tidrange() returns false on the first call from the
> > parallel worker.

> Yes, the previous v9 patch missed setting node->trss_mintid and
> node->trss_maxtid, causing the parallel workers to exit early due to
> heap_getnextslot_tidrange() returning false.
>
> With the attached v10 patch, the parallel leader and workers now
> have to evaluate (TidRangeEval()) the given TID ranges and set them
> via ExecTidRangeScanInitializeDSM(),
> ExecTidRangeScanReInitializeDSM() and
> ExecTidRangeScanInitializeWorker().
>
> To prevent the leader and the workers from calling heap_setscanlimits()
> and trying to set phs_startblock and phs_numblock concurrently in
> shared memory, I added a condition to only allow parallel
> leader to set them. Since node->trss_mintid and node->trss_maxtid
> reside in local memory, the workers still have to call heap_set_tidrange
> () to have them set to return correct scan results:

I spent quite a bit of time looking at this. I didn't like the way
heap_setscanlimits() did:

+ /* set the limits in the ParallelBlockTableScanDesc, when present as leader */
+ if (scan->rs_base.rs_parallel != NULL && !IsParallelWorker())

as it wasn't clear to me that this didn't break completely when the
leader didn't make it there first.

I've made quite a few revisions to the v10 patch, which I've attached
as 11-0002. v11-0001 your v10 rebased atop of master.

Here's a summary of the changes:

1. Moved block limiting logic for parallel scans into
table_block_parallelscan_startblock_init(). There's currently a lock
here to ensure only 1 worker can set the shared memory fields at a
time. I've hooked into the same lock to set the startblock and
numblocks.
2. Fixed chunk size ramp-down code which is meant to divvy up the scan
into smaller and smaller chunks as it nears completion so that one
worker doesn't get left with too much work and leave the others with
nothing. That code still thought that it was scanning every block in
the table.
3. Changed things around so that the min/max TID for the parallel scan
is specified via table_rescan_tidrange(). This means zero changes to
TidRangeNext, and the only additions to nodeTidrangescan.c are for the
shared memory handling.
4. The rest of the changes are mostly cosmetic.

With this version table_block_parallelscan_startblock_init() has grown
2 extra fields. I considered we should instead rename this function
append "_with_limit" to its name then add another function with the
original name that calls the renamed function passing in
InvalidBlockNumber for both new fields. I didn't do that as we only
have a single call to the existing function, so doing that would only
be for the benefit of extensions that happen to use that function. It
doesn't seem overly difficult for them to adjust their code. I didn't
find any code using that function in codesearch.debian.net.

I still need to do a bit more testing on this, but in the meantime
thought I'd share what I've done with it so that other people can look
in parallel.

David

Attachment Content-Type Size
v11-0001-v10-parallel-tid-range-scan.patch application/octet-stream 25.9 KB
v11-0002-fixup-v10-parallel-tid-range-scan.patch application/octet-stream 23.8 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2025-11-07 05:34:51 Re: Assertion failure in SnapBuildInitialSnapshot()
Previous Message vignesh C 2025-11-07 05:28:19 Re: Logical Replication of sequences