| From: | David Geier <geidav(dot)pg(at)gmail(dot)com> |
|---|---|
| To: | Tomas Vondra <tomas(at)vondra(dot)me>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, dilipbalaut(at)gmail(dot)com |
| Subject: | Re: Performance issues with parallelism and LIMIT |
| Date: | 2026-01-14 16:02:35 |
| Message-ID: | 62443352-aacf-4c85-82f0-8d979cc190bb@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi Tomas!
On 26.11.2025 18:30, Tomas Vondra wrote:
> The question is if this are the only two such issues possible, and I'm
> afraid the answer is "no" :-(
> The question is if "exiting" from any place calling CFI leaves the
> execution state in a valid state. Valid enough so that we can call
> ExecEndNode() on all the nodes, including the one from which we exited.
> But I don't think we can rely on that. The node can do multiple steps,
> interleaved with CFI, not expecting that only one of them happens. I
> assume this would cause a lot of issues ...
Yes, seems like this is the wrong approach. Even though it has the
appeal of being able to rely on all the existing calls to CFI.
Not using CFI means we need to add checks to all places we care about. I
would start with all base relation scans and sort and take it from there.
>> The new patch accounts for that by introducing a new TableScanDescData
>> flag SO_OBEY_PARALLEL_WORKER_STOP, which indicates if the scan should
>> adhere to the parallel worker stop or not.
>>
>> Stopping is broadcasted to all parallel workers via SendProcSignal().
>> The stop variable is checked with the new
>> CHECK_FOR_PARALLEL_WORKER_STOP() macro.
Turned out that this way background workers can miss signals, if they
start up after the leader has already found enough rows and hence
broadcasted the stop signal.
I've fixed that by instead storing a stop flag in shared memory which
the workers can check to see if they're supposed to stop. The flag must
be propagated into the various places that need to check it, e.g. into
the HEAP or index scan code.
>> In the PoC implementation I've for now only changed nodeSeqScan.c. If
>> there's agreement to pursue this approach, I'll change the other places
>> as well. Naming can also likely be still improved.
>>
>
> This assumes we need to "exit" only from a heapam scan. That's true for
> the example, but is that enough in general? What if the worker already
> finished it's plan, and is now busy doing something else expensive?
> Could be a big sort, aggregation, ... Can we do something about these
> cases too?
This is only to illustrate the overall approach. Other places can easily
be made work in the spirit of nodeSeqscan.c and heapam.c, by passing the
stop flag into the looping code and bailing if it turns true.
But I would first like to agree on the overall approach before I spend
more time changing the other places.
--
David Geier
| Attachment | Content-Type | Size |
|---|---|---|
| v3-0001-Parallel-workers-stop-earlier-in-presence-of-LIMI.patch | text/x-patch | 9.4 KB |
| From | Date | Subject | |
|---|---|---|---|
| Previous Message | Peter Eisentraut | 2026-01-14 15:59:55 | Re: Make copyObject work in C++ |