Re: Performance issues with parallelism and LIMIT

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: David Geier <geidav(dot)pg(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, dilipbalaut(at)gmail(dot)com
Subject: Re: Performance issues with parallelism and LIMIT
Date: 2025-11-18 19:37:40
Message-ID: 09ca028e-2b50-42af-baed-6d582252f359@vondra.me
Lists: pgsql-hackers

On 11/18/25 19:35, David Geier wrote:
>
> On 18.11.2025 18:31, Tomas Vondra wrote:
>> On 11/18/25 17:51, Tom Lane wrote:
>>> David Geier <geidav(dot)pg(at)gmail(dot)com> writes:
>>>> On 18.11.2025 16:40, Tomas Vondra wrote:
>>>>> It'd need code in the parallel-aware scans, i.e. seqscan, bitmap, index.
>>>>> I don't think you'd need code in other plans, because all parallel plans
>>>>> have one "driving" table.
>>>
>>> You're assuming that the planner will insert Gather nodes at arbitrary
>>> places in the plan, which isn't true. If it does generate plans that
>>> are problematic from this standpoint, maybe the answer is "don't
>>> parallelize in exactly that way".
>>>
>>
>> I think David has a point that nodes that "buffer" tuples (like Sort or
>> HashAgg) would break the approach of making this the responsibility of
>> the parallel-aware scan. I don't see anything particularly wrong with
>> such plans - plans with partial aggregation often look like that.
>>
>> Maybe this should be the responsibility of execProcnode.c, not the
>> various nodes?
>>
>
> I like that idea, even though it would still not work while a node is
> doing the crunching, that is, after it has pulled all rows and before it
> can return the first row. During this time the node won't call
> ExecProcNode().
>

True. Perhaps we could provide a function nodes could call in suitable
places to check whether to end?
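
To make the idea concrete, here is a minimal standalone sketch (not
PostgreSQL code). It assumes a hypothetical per-worker flag that the leader
sets once it has enough rows. Every name in it (WorkerState,
worker_should_stop, toy_scan_next, toy_sort) is made up for illustration.
The scan checks the flag on each fetch, roughly what a check at the
ExecProcNode() level would give us, while the "buffering" node polls it
explicitly inside its crunching loop, in the same spirit as sprinkling
CHECK_FOR_INTERRUPTS() into long-running loops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-worker state; a real patch would keep this in shared memory. */
typedef struct WorkerState
{
    volatile bool stop_requested;   /* set by the leader once it has enough rows */
    long          rows_scanned;     /* rows this worker actually produced */
} WorkerState;

/* Hypothetical helper a node could call "in suitable places". */
static inline bool
worker_should_stop(const WorkerState *ws)
{
    assert(ws != NULL);
    return ws->stop_requested;
}

/* Toy scan over an in-memory array, standing in for a parallel-aware scan. */
typedef struct ToyScan
{
    const int *rows;
    size_t     nrows;
    size_t     pos;
} ToyScan;

/* Per-row fetch: the stop check here mimics a check done on every node call. */
static bool
toy_scan_next(ToyScan *scan, WorkerState *ws, int *out)
{
    if (worker_should_stop(ws) || scan->pos >= scan->nrows)
        return false;
    *out = scan->rows[scan->pos++];
    ws->rows_scanned++;
    return true;
}

typedef enum { NODE_DONE, NODE_STOPPED_EARLY } NodeResult;

/*
 * Toy "buffering" node: an insertion sort over already-fetched rows.
 * No fetch happens in this loop, so the per-row check above never fires;
 * the node has to poll the flag itself.
 */
static NodeResult
toy_sort(int *buf, size_t n, WorkerState *ws)
{
    for (size_t i = 1; i < n; i++)
    {
        if (worker_should_stop(ws))
            return NODE_STOPPED_EARLY;  /* give up, the result is not needed */

        int    key = buf[i];
        size_t j = i;

        while (j > 0 && buf[j - 1] > key)
        {
            buf[j] = buf[j - 1];
            j--;
        }
        buf[j] = key;
    }
    return NODE_DONE;
}
```

The caller would have to treat NODE_STOPPED_EARLY as "discard the partial
result", and how often to poll is a trade-off between overhead and how
quickly the worker reacts.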

Actually, how does canceling queries with parallel workers work? Is that
done similarly to what your patch did?

> But that seems like an acceptable limitation. At least it keeps working
> above "buffer" nodes.
>
> I'll give this idea a try. Then we can contrast this approach with the
> approach in my initial patch.
>
>> It'd be nice to show this in EXPLAIN (that some of the workers were
>> terminated early, before processing all the data).
>
> Inspectability on that end seems useful. Maybe only with VERBOSE,
> similarly to the extended per-worker information.
>

Maybe, no opinion. But it probably needs to apply to all nodes in the
parallel worker, right? Or maybe it's even a per-worker detail.

regards

--
Tomas Vondra
