Re: Parallel Seq Scan

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>, Marko Tiikkaja <marko(at)joh(dot)to>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2014-12-19 14:54:38
Message-ID: 54943C2E.6010401@vmware.com
Lists: pgsql-hackers

On 12/19/2014 04:39 PM, Stephen Frost wrote:
> * Marko Tiikkaja (marko(at)joh(dot)to) wrote:
>> On 12/19/14 3:27 PM, Stephen Frost wrote:
>>> We'd have to coach our users to
>>> constantly be tweaking the enable_parallel_query (or whatever) option
>>> for the queries where it helps and turning it off for others. I'm not
>>> so excited about that.
>>
>> I'd be perfectly (that means 100%) happy if it just defaulted to
>> off, but I could turn it up to 11 whenever I needed it. I don't
>> believe to be the only one with this opinion, either.
>
> Perhaps we should reconsider our general position on hints then and
> add them so users can define the plan to be used.. For my part, I don't
> see this as all that much different.
>
> Consider if we were just adding HashJoin support today as an example.
> Would we be happy if we had to default to enable_hashjoin = off? Or if
> users had to do that regularly because our costing was horrid? It's bad
> enough that we have to resort to those tweaks today in rare cases.

This is somewhat different. Imagine that we achieve perfect
parallelization, so that when you set enable_parallel_query=8, every
query runs exactly 8x faster on an 8-core system, by using all eight cores.

Now, you might still want to turn parallelization off, or at least set
it to a lower setting, on an OLTP system. You might not want a single
query to hog all the CPUs just to finish faster; you'd want to leave some
for other queries. In particular, if you run a mix of short
transactions, and some background-like tasks that run for minutes or
hours, you do not want to starve the short transactions by giving all
eight CPUs to the background task.

Admittedly, this is a rather crude knob to tune for such things,
but it's quite intuitive to a DBA: how many CPU cores is one query
allowed to utilize? And we don't really have anything better.

In real life, there's always some overhead to parallelization, so that
even if you can make one query run faster by doing it, you might hurt
overall throughput. To some extent, it's a latency vs. throughput
tradeoff, and it's quite reasonable to have a GUC for that because
people have different priorities.
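
To put rough numbers on that tradeoff, here is a toy sketch. It is not
based on any measurement or on how PostgreSQL actually behaves. It
assumes the work divides evenly across workers, that each extra worker
adds a fixed coordination cost, and that every worker stays busy for the
whole query. The function names and the 0.1 s overhead are made up for
illustration.

```python
def query_latency(serial_seconds, workers, overhead_per_worker):
    """Wall-clock time of one query split across `workers` cores (toy model)."""
    # Assumption: perfect division of work plus a fixed cost per extra worker.
    return serial_seconds / workers + overhead_per_worker * (workers - 1)


def cpu_seconds(serial_seconds, workers, overhead_per_worker):
    """Total CPU time one query consumes, assuming all workers stay busy."""
    return workers * query_latency(serial_seconds, workers, overhead_per_worker)


def max_throughput(cores, serial_seconds, workers, overhead_per_worker):
    """Queries per second a fully loaded machine with `cores` cores can sustain."""
    return cores / cpu_seconds(serial_seconds, workers, overhead_per_worker)


if __name__ == "__main__":
    # Hypothetical workload: an 8-second query on an 8-core machine.
    for workers in (1, 2, 4, 8):
        latency = query_latency(8.0, workers, 0.1)
        throughput = max_throughput(8, 8.0, workers, 0.1)
        print(f"workers={workers} latency={latency:.2f}s "
              f"throughput={throughput:.3f} queries/s")
```

Under these assumptions each additional worker shortens the individual
query but lowers the number of queries per second the machine can
sustain when it is fully loaded, which is the reason different sites
would want different settings.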

- Heikki
