From: | "Scott Mead" <scott(at)meads(dot)us> |
---|---|
To: | "Laurenz Albe" <laurenz(dot)albe(at)cybertec(dot)at>, "Greg Sabino Mullane" <htamfids(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Disable parallel query by default |
Date: | 2025-05-20 20:58:28 |
Message-ID: | 947e64fd-3e1b-40f2-acf3-2b77a358512c@app.fastmail.com |
Lists: | pgsql-hackers |
On Wed, May 14, 2025, at 4:06 AM, Laurenz Albe wrote:
> On Tue, 2025-05-13 at 17:53 -0400, Scott Mead wrote:
> > On Tue, May 13, 2025, at 5:07 PM, Greg Sabino Mullane wrote:
> > > On Tue, May 13, 2025 at 4:37 PM Scott Mead <scott(at)meads(dot)us> wrote:
> > > > I'll open by proposing that we prevent the planner from automatically
> > > > selecting parallel plans by default
> > >
> > > That seems a pretty heavy hammer, when we have things like
> > > parallel_setup_cost that should be tweaked first.
> >
> > I agree it's a big hammer and I thought through parallel_setup_cost
> > quite a bit myself. The problem with parallel_setup_cost is that it
> > doesn't actually represent the overhead of setting up a parallel
> > query for a busy system. It does define the cost of setup for a
> > *single* parallel session, but it cannot accurately express the
> > cost of CPU and other overhead associated with the second, third,
> > fourth, etc... query that is executed as parallel. The expense to
> > the operating system is a function of the _rate_ of parallel query
> > executions being issued. Without new infrastructure, there's no way
> > to define something that will give me a true representation of the
> > cost of issuing a query with parallelism.
>
> There is no way for the optimizer to represent that your system is
> under CPU overload currently. But I agree with Greg that
> parallel_setup_cost is the setting that should be adjusted.
> If PostgreSQL is more reluctant to even start considering a parallel plan,
> that would be a move in the right direction in a case like this:
>
> > > > What is the fallout? When a high-volume, low-latency query flips to
> > > > parallel execution on a busy system, we end up in a situation where
> > > > the database is effectively DDOSing itself with a very high rate of
> > > > connection establish and tear-down requests. Even if the query ends
> > > > up being faster (it generally does not), the CPU requirements for the
> > > > same workload rapidly double or worse, with most of it being spent
> > > > in the OS (context switch, fork(), destroy()). When looking at the
> > > > database, you'll see a high load average, and high wait for CPU with
> > > > very little actual work being done within the database.
>
> You are painting a bleak picture indeed. I get to see PostgreSQL databases
> in trouble regularly, but I have not seen anything like what you describe.
> If a rather cheap, very frequent query is suddenly estimated to be
> expensive enough to warrant a parallel plan, I'd suspect that the estimates
> must be seriously off.
>
> With an argument like that, you may as well disable nested loop joins.
> I have seen enough cases where disabling nested loop joins, without any
> deeper analysis, made very slow queries reasonably fast.
My argument is that parallel query should not be invoked without user intervention. Yes, nested loop joins can have a similar impact, but let's look at how parallel query (PQ) breaks down at scale:
1. pgbench -i -s 100
2. Make a query that will execute in parallel
   SELECT aid, a.bid, bbalance
   FROM pgbench_accounts a, pgbench_branches b
   WHERE a.bid = b.bid
   ORDER BY bbalance desc;
   Non-parallel query = 4506.559 ms
   Parallel query = 2849.073 ms
   Arguably, much better.
3. Use pgbench to execute these with a concurrency of 10 and a rate limit of 1 tps
   pgbench -R 1 -r -T 120 -P 5 --no-vacuum -f pselect.sql -c 10
4. The parallel query ran in ~2.8 seconds in isolation, but with 10 concurrent sessions it degrades to 5.8 seconds, while the non-parallel version averages 5.5 seconds. You've completely erased the gains, and you only have a concurrency of 5 (that's with max_parallel_workers = 8). If you increase max_parallel_workers, this quickly becomes worse.
Even though parallel query is faster in isolation, even a small amount of concurrency has a compounding effect that degrades performance very quickly (again, default settings on a 16-processor machine).
Concurrency | Non-parallel runtime (ms) | Parallel runtime (ms)
---|---|---
1 | 5003.951 | 3681.452
5 | 4936.565 | 4565.171
10 | 5573.239 | 5894.397
15 | 6224.292 | 8470.982
20 | 5632.948 | 13277.857
Even with max_parallel_workers capping us at 8 (the default), most of our advantage is gone by a concurrency of 5 clients, and by 10 clients the parallel plan is slower.
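As a rough check of the table above, the ratio of parallel to non-parallel runtime at each concurrency level can be computed directly. This is only a sketch: the numbers are copied from the table (assumed to be milliseconds), and the names runtimes_ms, parallel_ratio and crossover are illustrative, not part of any benchmark tooling.

```python
# Runtimes copied from the table above, assumed to be in milliseconds:
# concurrency -> (non-parallel runtime, parallel runtime)
runtimes_ms = {
    1: (5003.951, 3681.452),
    5: (4936.565, 4565.171),
    10: (5573.239, 5894.397),
    15: (6224.292, 8470.982),
    20: (5632.948, 13277.857),
}


def parallel_ratio(non_parallel_ms, parallel_ms):
    """Parallel runtime divided by non-parallel runtime (> 1 means parallel is slower)."""
    return parallel_ms / non_parallel_ms


ratios = {c: parallel_ratio(*pair) for c, pair in runtimes_ms.items()}

# Lowest measured concurrency at which the parallel plan is slower.
crossover = min(c for c, r in ratios.items() if r > 1.0)

for c, r in ratios.items():
    print(f"concurrency {c:>2}: parallel/non-parallel = {r:.2f}")
print(f"parallel becomes slower at concurrency {crossover}")
```

A ratio below 1 means the parallel plan still wins; the ratio climbing past 1 marks the point where the gains are gone.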
Going back to the original commit that enabled PQ by default [1], it was done so that the feature would be tested during beta. I think it's time that we limit the accidental impact this can have on users by disabling the feature by default.
[1] https://github.com/postgres/postgres/commit/77cd477c4ba885cfa1ba67beaa82e06f2e182b85
"
Enable parallel query by default.
Change max_parallel_degree default from 0 to 2. It is possible that
this is not a good idea, or that we should go with 1 worker rather
than 2, but we won't find out without trying it. Along the way,
reword the documentation for max_parallel_degree a little bit to
hopefully make it more clear.
Discussion: 20160420174631(dot)3qjjhpwsvvx5bau5(at)alap3(dot)anarazel(dot)de
"
>
> Sure enough, I often see systems where I recommend disabling parallel
> query - in fact, whenever throughput is more important than response time.
> But I also see many cases where parallel query works just like it should
> and leads to a better user experience.
>
> I have come to disable JIT by default, but not parallel query.
>
> The primary problem that I encounter with parallel query is that dynamic
> shared memory segments grow to a size where they cause OOM errors.
> That's the most frequent reason for me to recommend disabling parallel query.
>
> Yours,
> Laurenz Albe
>
--
Scott Mead
Amazon Web Services
scott(at)meads(dot)us