Re: Disable parallel query by default

From: "Scott Mead" <scott(at)meads(dot)us>
To: "Laurenz Albe" <laurenz(dot)albe(at)cybertec(dot)at>, "Greg Sabino Mullane" <htamfids(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Disable parallel query by default
Date: 2025-05-20 20:58:28
Message-ID: 947e64fd-3e1b-40f2-acf3-2b77a358512c@app.fastmail.com
Lists: pgsql-hackers


On Wed, May 14, 2025, at 4:06 AM, Laurenz Albe wrote:
> On Tue, 2025-05-13 at 17:53 -0400, Scott Mead wrote:
> > On Tue, May 13, 2025, at 5:07 PM, Greg Sabino Mullane wrote:
> > > On Tue, May 13, 2025 at 4:37 PM Scott Mead <scott(at)meads(dot)us> wrote:
> > > > I'll open by proposing that we prevent the planner from automatically
> > > > selecting parallel plans by default
> > >
> > > That seems a pretty heavy hammer, when we have things like
> > > parallel_setup_cost that should be tweaked first.
> >
> > I agree it's a big hammer and I thought through parallel_setup_cost
> > quite a bit myself. The problem with parallel_setup_cost is that it
> > doesn't actually represent the overhead of a setting up parallel
> > query for a busy system. It does define the cost of setup for a
> > *single* parallel session, but it cannot accurately express the
> > cost of CPU and other overhead associated with the second, third,
> > fourth, etc... query that is executed as parallel. The expense to
> > the operating system is a function of the _rate_ of parallel query
> > executions being issued. Without new infrastructure, there's no way
> > to define something that will give me a true representation of the
> > cost of issuing a query with parallelism.
>
> There is no way for the optimizer to represent that your system is
> under CPU overload currently. But I agree with Greg that
> parallel_setup_cost is the setting that should be adjusted.
> If PostgreSQL is more reluctant to even start considering a parallel plan,
> that would be a move in the right direction in a case like this:
>
> > > > What is the fallout? When a high-volume, low-latency query flips to
> > > > parallel execution on a busy system, we end up in a situation where
> > > > the database is effectively DDOSing itself with a very high rate of
> > > > connection establish and tear-down requests. Even if the query ends
> > > > up being faster (it generally does not), the CPU requirements for the
> > > > same workload rapidly double or worse, with most of it being spent
> > > > in the OS (context switch, fork(), destroy()). When looking at the
> > > > database, you'll see a high load average, and high wait for CPU with
> > > > very little actual work being done within the database.
>
> You are painting a bleak picture indeed. I get to see PostgreSQL databases
> in trouble regularly, but I have not seen anything like what you describe.
> If a rather cheap, very frequent query is suddenly estimated to be
> expensive enough to warrant a parallel plan, I'd suspect that the estimates
> must be seriously off.
>
> With an argument like that, you may as well disable nested loop joins.
> I have seen enough cases where disabling nested loop joins, without any
> deeper analysis, made very slow queries reasonably fast.

My argument is that parallel query (PQ) should not be invoked without user intervention. Yes, nested loop joins can have a similar impact, but let's take a look at how PQ breaks down at scale:

1. pgbench -i -s 100

2. Make a query that will execute in parallel

SELECT aid, a.bid, bbalance
FROM pgbench_accounts a, pgbench_branches b
WHERE a.bid = b.bid
ORDER BY bbalance desc;

Non-parallel query = 4506.559 ms
Parallel query = 2849.073 ms

Arguably, much better.

3. Use pgbench to execute these with a concurrency of 10, rate limit of 1 tps

pgbench -R 1 -r -T 120 -P 5 --no-vacuum -f pselect.sql -c 10

4. The parallel query took ~2.8 seconds in isolation, but with 10 concurrent sessions it degrades to 5.8 seconds, while the non-parallel version averages 5.5 seconds. You've completely erased the gains, and you only have a concurrency of 5 (that's with max_parallel_workers = 8). If you increase max_parallel_workers, this quickly becomes worse.
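
One way to see why the cap comes into play so early is to count worker requests. The following is only a counting sketch under stated assumptions: it uses the stock defaults (max_parallel_workers = 8, max_parallel_workers_per_gather = 2), and it assumes each query has a single Gather node planned with the full per-gather worker count, which was not verified for the query above. The function names are hypothetical.

```python
# Stock PostgreSQL defaults; the test in this post was run with default settings.
MAX_PARALLEL_WORKERS = 8             # cluster-wide cap on parallel workers
MAX_PARALLEL_WORKERS_PER_GATHER = 2  # cap on workers for one Gather node


def worker_demand(concurrent_queries,
                  workers_per_query=MAX_PARALLEL_WORKERS_PER_GATHER):
    """Workers requested if every running query asks for its full complement.

    Assumption: one Gather node per query, planned with the per-gather maximum.
    """
    return concurrent_queries * workers_per_query


def oversubscription(concurrent_queries, pool=MAX_PARALLEL_WORKERS):
    """Requested workers divided by the pool size (> 1 means unfilled requests)."""
    return worker_demand(concurrent_queries) / pool


if __name__ == "__main__":
    for clients in (1, 4, 5, 10, 20):
        print(f"{clients:>2} concurrent queries: "
              f"{worker_demand(clients):>2} workers requested, "
              f"{oversubscription(clients):.2f}x the pool")
```

Under these assumptions the pool is exactly full at 4 concurrent queries and oversubscribed at 5. The sketch only counts requests; it says nothing about the CPU cost of starting and stopping the workers.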

Even though parallel query is faster in isolation, even a small amount of concurrency has a compounding effect that degrades performance very quickly (again, defaults on a 16-processor machine).

Concurrency   Non-parallel runtime (ms)   Parallel runtime (ms)
          1                    5003.951                3681.452
          5                    4936.565                4565.171
         10                    5573.239                5894.397
         15                    6224.292                8470.982
         20                    5632.948               13277.857
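
To make the crossover explicit, here is a small sketch that computes the parallel-to-non-parallel runtime ratio for each row of the table above. The numbers are copied from the table and treated as milliseconds; the function names are hypothetical.

```python
# Runtimes in milliseconds, copied from the table above:
# (concurrency, non-parallel runtime, parallel runtime)
RESULTS = [
    (1, 5003.951, 3681.452),
    (5, 4936.565, 4565.171),
    (10, 5573.239, 5894.397),
    (15, 6224.292, 8470.982),
    (20, 5632.948, 13277.857),
]


def parallel_ratio(non_parallel_ms, parallel_ms):
    """Parallel runtime divided by non-parallel runtime (< 1 means parallel wins)."""
    return parallel_ms / non_parallel_ms


def first_losing_concurrency(results):
    """Lowest measured concurrency at which the parallel plan is slower, or None."""
    for clients, non_parallel_ms, parallel_ms in results:
        if parallel_ms > non_parallel_ms:
            return clients
    return None


if __name__ == "__main__":
    for clients, np_ms, p_ms in RESULTS:
        print(f"{clients:>2} clients: parallel/non-parallel = "
              f"{parallel_ratio(np_ms, p_ms):.2f}")
    print("parallel first loses at", first_losing_concurrency(RESULTS), "clients")
```

The ratio goes from about 0.74 at 1 client to about 0.92 at 5 clients, crosses 1.0 at 10 clients, and reaches about 2.36 at 20 clients.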

Even with max_parallel_workers protecting us at its default of 8, we erase our advantage by the time we reach a concurrency of 5 clients.

Going back to the original commit that enabled PQ by default[1], it was done so that the feature would be tested during beta. I think it's time that we limit the accidental impact this can have on users by disabling the feature by default.

[1] https://github.com/postgres/postgres/commit/77cd477c4ba885cfa1ba67beaa82e06f2e182b85

"
Enable parallel query by default.
Change max_parallel_degree default from 0 to 2. It is possible that
this is not a good idea, or that we should go with 1 worker rather
than 2, but we won't find out without trying it. Along the way,
reword the documentation for max_parallel_degree a little bit to
hopefully make it more clear.

Discussion: 20160420174631(dot)3qjjhpwsvvx5bau5(at)alap3(dot)anarazel(dot)de
"

>
> Sure enough, I often see systems where I recommend disabling parallel
> query - in fact, whenever throughput is more important than response time.
> But I also see many cases where parallel query works just like it should
> and leads to a better user experience.
>
> I have come to disable JIT by default, but not parallel query.
>
> The primary problem that I encounter with parallel query is that dynamic
> shared memory segments grow to a size where they cause OOM errors.
> That's the most frequent reason for me to recommend disabling parallel query.
>
> Yours,
> Laurenz Albe
>

--
Scott Mead
Amazon Web Services
scott(at)meads(dot)us
