Re: Parallel Seq Scan

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-04-04 09:19:24
Message-ID: CAApHDvp2STf0=pQfpq+e7WA4QdYmpFM5qu_YtUpE7R0jLnH82Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

So I've just finished reading the impressive 244 emails (so far) about
Parallel Seq scan, and I've had a quick skim over the latest patch.

Its quite exciting to think that one day we'll have parallel query in
PostgreSQL, but I have to say, that I think that there's a major point
about the proposed implementation that seems to have gotten forgotten
about, which I can't help but think won't get that far off the ground
unless more thought goes into it.

On 11 February 2015 at 09:56, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:

> I think we're getting to the point where having a unique mapping from
> the plan to the execution tree is proving to be rather limiting
> anyway. Check for example discussion about join removal. But even for
> current code, showing only the custom plans for the first five EXPLAIN
> EXECUTEs is pretty nasty (Try explain that to somebody that doesn't know
> pg internals. Their looks are worth gold and can kill you at the same
> time) and should be done differently.
>
>
Going over the previous emails in this thread I see that it has been a long
time since anyone discussed anything around how we might decide at planning
time how many workers should be used for the query, and from the emails I
don't recall anyone proposing a good idea about how this might be done, and
I for one can't see how this is at all possible to do at planning time.

I think that the planner should know nothing of parallel query at all, and
the planner quite possibly should go completely unmodified for this patch.
One major problem I can see is that, given a query such as:

SELECT * FROM million_row_product_table WHERE category = 'ELECTRONICS';

Where we have a non-unique index on category, some plans which may be
considered might be:

1. Index scan on the category index to get all rows matching 'ELECTRONICS'
2. Sequence scan on the table, filter matching rows.
3. Parallel plan which performs a series of partial sequence scans pulling
out all matching rows.

I really think that if we end choosing things like plan 3, when plan 2 was
thrown out because of its cost, then we'll end up consuming more CPU and
I/O than we can possibly justify using. The environmentalist in me screams
that this is wrong. What if we kicked off 128 worker process on some
high-end hardware to do this? I certainly wouldn't want to pay the power
bill. I understand there's costing built in to perhaps stop this, but I
still think it's wrong headed, and we need to still choose the fastest
non-parallel plan and only consider parallelising that later.

Instead what I think should happen is:

The following link has been seen before on this thread, but I'll post it
again:
http://docs.oracle.com/cd/A57673_01/DOC/server/doc/A48506/pqoconce.htm

There's one key sentence in there that should not be ignored:

"It is important to note that the query is parallelized dynamically at
execution time."

"dynamically at execution time"... I think this also needs to happen in
PostgreSQL. If we attempt to do this parallel stuff at plan time, and we
happen to plan at some quiet period, or perhaps worse, some application's
start-up process happens to PREPARE a load of queries when the database is
nice and quite, then quite possibly we'll end up with some highly parallel
queries. Then perhaps come the time these queries are actually executed the
server is very busy... Things will fall apart quite quickly due to the
masses of IPC and context switches that would be going on.

I completely understand that this parallel query stuff is all quite new to
us all and we're likely still trying to nail down the correct
infrastructure for it to work well, so this is why I'm proposing that the
planner should know nothing of parallel query, instead I think it should
work more along the lines of:

* Planner should be completely oblivious to what parallel query is.
* Before executor startup the plan is passed to a function which decides if
we should parallelise it, and does so if the plan meets the correct
requirements. This should likely have a very fast exit path such as:

if root node's cost < parallel_query_cost_threshold
return; /* the query is not expensive enough to attempt to make parallel
*/

The above check will allow us to have an almost zero overhead for small low
cost queries.

This function would likely also have some sort of logic in order to
determine if the server has enough spare resource at the current point in
time to allow queries to be parallelised (Likely this is not too important
to nail this down for a first implementation).

* The plan should then be completely traversed node by node to determine
which nodes can be made parallel. This would likely require an interface
function to each node which returns true or false, depending on if it's
safe to parallelise. For seq scan this could be a simple test to see if
we're scanning a temp table.

* Before any changes are made to the plan, a complete copy of it should be
made.

* Funnel nodes could then be injected below the last node in each branch
which supports parallelism. If more than one branch exists with parallel
enabled nodes, then it should be up to this function to determine, based on
cost, which nodes will benefit the most from the additional workers.
Certain other node types would need something else below the Funnel node,
e.g Partial aggregation would need a new node below the Funnel to complete
the aggregation.

* The first parallel enabled nodes should be passed off to the worker
processes for execution.

So I quite strongly agree with Andres' comment above that we really need to
move away from this 1:1 assumption about the relationship between plan
nodes and executor nodes. Tom did mention some possible reasons here in his
response to my INNER JOIN removals patch ->
http://www.postgresql.org/message-id/32139.1427667410@sss.pgh.pa.us

Tom wrote:
"What you're doing here violates the rule that planstate trees have a
one-to-one relationship to plan trees. EXPLAIN used to iterate over those
trees in lockstep, and there probably still is code that does similar
things (in third-party modules if not core), so I don't think we should
abandon that principle."

So perhaps this needs analysis. If it's not possible, then perhaps the
parallel nodes could be inserted at the end of planning, providing the
executor could be coded in such a way that the parallel plan can still work
with 0 worker processes. Unfortunately it seems that transitions through
nodes that don't do anything is not free, so with this method there would
be a slowdown of parallel enabled plans when they're executed without any
worker processes.

Also here ->
https://technet.microsoft.com/en-us/library/ms178065%28v=sql.105%29.aspx
There's some text that says:

"The SQL Server query optimizer does not use a parallel execution plan for
a query if any one of the following conditions is true:"
"* A serial execution plan is considered faster than any possible parallel
execution plan for the particular query."

I'm finding it a bit hard to get a true meaning from that, but if I'm not
mistaken it means that the serial plan will be preferred over a parallel
plan, as if the parallel plan does not get allocated any workers at
execution time, then we don't want to be left with a slow plan...

Apologies if any of this has been discussed any already designed around, I
just didn't see anything in the emails to indicate that it has.

Regards

David Rowley

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2015-04-04 12:57:12 Re: TABLESAMPLE patch
Previous Message Andrew Gierth 2015-04-04 07:15:57 Re: Abbreviated keys for Numeric