From: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
---|---|
To: | David Geier <geidav(dot)pg(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, dilipbalaut(at)gmail(dot)com |
Subject: | Re: Performance issues with parallelism and LIMIT |
Date: | 2023-02-22 12:22:27 |
Message-ID: | c3ab3c44-5fb3-4f9c-e5ab-d9af536927a7@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2/20/23 19:18, David Geier wrote:
> Hi,
>
> On 2/8/23 11:42, Tomas Vondra wrote:
>> On 2/1/23 14:41, David Geier wrote:
>>
>> Yeah, this is a pretty annoying regression. We already can hit poor
>> behavior when matching rows are not distributed uniformly in the tables
>> (which is what LIMIT costing assumes), and this makes it more likely to
>> hit similar issues. A bit like when doing many HTTP requests makes it
>> more likely to hit at least one 99% outlier.
> Are you talking about the use of ordering vs filtering indexes in
> queries where there's both an ORDER BY and a filter present (e.g. using
> an ordering index but then all rows passing the filter are at the end of
> the table)? If not, can you elaborate a bit more on that and maybe give
> an example.
Yeah, roughly. I don't think the explicit ORDER BY is a requirement for
this to happen - it's enough when the part of the plan below LIMIT
produces many rows, but the matching rows are at the end.
>> No opinion on these options, but worth a try. Alternatively, we could
>> try the usual doubling approach - start with a low threshold (and set
>> the latch frequently), and then gradually increase it up to the 1/4.
>>
>> That should work both for queries expecting only few rows and those
>> producing a lot of data.
>
> I was thinking about this variant as well. One more alternative would be
> latching the leader once a worker has produced 1/Nth of the LIMIT where
> N is the number of workers. Both variants have the disadvantage that
> there are still corner cases where the latch is set too late; but it
> would for sure be much better than what we have today.
>
> I also did some profiling and - at least on my development laptop with 8
> physical cores - the original example, motivating the batching change is
> slower than when it's disabled by commenting out:
>
> if (force_flush || mqh->mqh_send_pending > (mq->mq_ring_size >> 2))
>
> SET parallel_tuple_cost TO 0;
> CREATE TABLE b (a int);
> INSERT INTO b SELECT generate_series(1, 200000000);
> ANALYZE b;
> EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM b;
>
> Gather (cost=1000.00..1200284.61 rows=200375424 width=4) (actual
> rows=200000000 loops=1)
> Workers Planned: 7
> Workers Launched: 7
> -> Parallel Seq Scan on b (cost=0.00..1199284.61 rows=28625061
> width=4) (actual rows=25000000 loops=8)
>
> Always latch: 19055 ms
> Batching: 19575 ms
>
> If I find some time, I'll play around a bit more and maybe propose a patch.
>
OK. Once you have a WIP patch maybe share it and I'll try to do some
profiling too.
>>> ...
>>>
>>> We would need something similar to CHECK_FOR_INTERRUPTS() which returns
>>> a NULL slot if a parallel worker is supposed to stop execution (we could
>>> e.g. check if the queue got detached). Or could we amend
>>> CHECK_FOR_INTERRUPTS() to just stop the worker gracefully if the queue
>>> got detached?
>>>
>> That sounds reasonable, but I'm not very familiar the leader-worker
>> communication, so no opinion on how it should be done.
>
> I think an extra macro that needs to be called from dozens of places to
> check if parallel execution is supposed to end is the least preferred
> approach. I'll read up more on how CHECK_FOR_INTERRUPTS() works and if
> we cannot actively signal the workers that they should stop.
>
IMHO if this requires adding another macro to a bunch of ad hoc places
is rather inconvenient. It'd be much better to fix this in a localized
manner (especially as it seems related to a fairly specific place).
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Heikki Linnakangas | 2023-02-22 12:26:40 | Re: Experiments with Postgres and SSL |
Previous Message | Tomas Vondra | 2023-02-22 12:15:07 | Re: Ignoring BRIN for HOT updates (was: -udpates seems broken) |