Re: Performance issues with parallelism and LIMIT

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: David Geier <geidav(dot)pg(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, dilipbalaut(at)gmail(dot)com
Subject: Re: Performance issues with parallelism and LIMIT
Date: 2023-11-03 20:48:25
Message-ID: 0f16bbea-1c1e-bd14-65f5-cac847c2b3b0@enterprisedb.com
Lists: pgsql-hackers

On 2/22/23 13:22, Tomas Vondra wrote:
> ...
>
>>> No opinion on these options, but worth a try. Alternatively, we could
>>> try the usual doubling approach - start with a low threshold (and set
>>> the latch frequently), and then gradually increase it up to the current
>>> 1/4-of-ring-size limit.
>>>
>>> That should work both for queries expecting only a few rows and those
>>> producing a lot of data.
>>
>> I was thinking about this variant as well. One more alternative would be
>> latching the leader once a worker has produced 1/Nth of the LIMIT where
>> N is the number of workers. Both variants have the disadvantage that
>> there are still corner cases where the latch is set too late; but it
>> would for sure be much better than what we have today.
>>
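To make the doubling idea concrete, here is a minimal standalone sketch. All names and the 64-byte starting batch are made up for illustration; this is not the actual shm_mq code, only the shape of the heuristic: flush small batches at first so early rows reach the leader quickly, then double the threshold after each flush, capping at the current 1/4-of-ring-size limit.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch of the doubling flush heuristic.  Starts with a
 * small batch so the first tuples reach the leader quickly (good for
 * LIMIT queries), then grows toward the quarter-ring threshold that
 * works well for large result sets.
 */
typedef struct FlushState
{
    size_t      ring_size;      /* total size of the shared queue ring */
    size_t      threshold;      /* flush once this many bytes are pending */
} FlushState;

static void
flush_state_init(FlushState *fs, size_t ring_size)
{
    fs->ring_size = ring_size;
    fs->threshold = 64;         /* small initial batch (assumed value) */
}

/* Returns 1 if the pending bytes should be flushed now, doubling the
 * threshold (up to ring_size / 4) each time a flush is triggered. */
static int
should_flush(FlushState *fs, size_t pending)
{
    if (pending <= fs->threshold)
        return 0;

    fs->threshold *= 2;
    if (fs->threshold > fs->ring_size / 4)
        fs->threshold = fs->ring_size / 4;
    return 1;
}
```

With a 64 kB ring this flushes after ~64 bytes, then ~128, and so on, converging on the existing 16 kB (ring/4) behavior for long-running scans.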
>> I also did some profiling and - at least on my development laptop with 8
>> physical cores - the original example motivating the batching change is
>> slower than with batching disabled by commenting out:
>>
>>     if (force_flush || mqh->mqh_send_pending > (mq->mq_ring_size >> 2))
>>
>> SET parallel_tuple_cost TO 0;
>> CREATE TABLE b (a int);
>> INSERT INTO b SELECT generate_series(1, 200000000);
>> ANALYZE b;
>> EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM b;
>>
>>  Gather  (cost=1000.00..1200284.61 rows=200375424 width=4) (actual
>> rows=200000000 loops=1)
>>    Workers Planned: 7
>>    Workers Launched: 7
>>    ->  Parallel Seq Scan on b  (cost=0.00..1199284.61 rows=28625061
>> width=4) (actual rows=25000000 loops=8)
>>
>> Always latch: 19055 ms
>> Batching:     19575 ms
>>
>> If I find some time, I'll play around a bit more and maybe propose a patch.
>>
>
> OK. Once you have a WIP patch maybe share it and I'll try to do some
> profiling too.
>
>>>> ...
>>>>
>>>> We would need something similar to CHECK_FOR_INTERRUPTS() which returns
>>>> a NULL slot if a parallel worker is supposed to stop execution (we could
>>>> e.g. check if the queue got detached). Or could we amend
>>>> CHECK_FOR_INTERRUPTS() to just stop the worker gracefully if the queue
>>>> got detached?
>>>>
>>> That sounds reasonable, but I'm not very familiar with the leader-worker
>>> communication, so no opinion on how it should be done.
>>
>> I think an extra macro that needs to be called from dozens of places to
>> check if parallel execution is supposed to end is the least preferred
>> approach. I'll read up more on how CHECK_FOR_INTERRUPTS() works and if
>> we cannot actively signal the workers that they should stop.
>>
>
> IMHO requiring another macro in a bunch of ad hoc places would be
> rather inconvenient. It'd be much better to fix this in a localized
> manner (especially as it seems related to a fairly specific place).
>
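To illustrate the localized fix being discussed, here is a tiny self-contained mock. Every type and function name below is invented for illustration; PostgreSQL's real shm_mq API differs (though shm_mq_send() does report a gone-away receiver as SHM_MQ_DETACHED). The point is that the per-tuple send path already observes the detach, so the worker can treat it there as a normal end of execution instead of adding a new CHECK_FOR_INTERRUPTS()-style macro to dozens of call sites.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical send result, modeled loosely on shm_mq_result. */
typedef enum SendResult
{
    SEND_OK,
    SEND_DETACHED               /* receiver detached, e.g. LIMIT satisfied */
} SendResult;

typedef struct MockQueue
{
    bool        detached;       /* has the leader detached the queue? */
} MockQueue;

/* Stand-in for pushing one tuple into the shared queue. */
static SendResult
queue_send_tuple(MockQueue *q, int tuple)
{
    (void) tuple;
    return q->detached ? SEND_DETACHED : SEND_OK;
}

/*
 * Returns true while the worker should keep producing tuples; false once
 * the leader has detached, at which point execution can end quietly
 * rather than raising an error.
 */
static bool
worker_emit_tuple(MockQueue *q, int tuple)
{
    return queue_send_tuple(q, tuple) != SEND_DETACHED;
}
```

The attraction of this shape is exactly the locality argued for above: only the single choke point every tuple already passes through needs to know about detach.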

David, do you still plan to try fixing these issues? I have a feeling
those issues may be fairly common but often undetected, or just brushed
off as "slow query" (AFAICS it was only caught thanks to comparing
timings before/after upgrade). Would be great to improve this.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
