Re: WIP: Upper planner pathification

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Upper planner pathification
Date: 2016-03-07 17:29:30
Message-ID: CA+TgmoaphpZUX5=UoeV=09_mR6i2RgvC=Bn2z=VxpWVy+o_Jeg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 7, 2016 at 11:09 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> The currently-committed code generates paths where nested loops and
>> hash joins get pushed beneath the Gather node, but does not generate
>> paths where merge joins have been pushed beneath the Gather node. And
>> the reason I didn't try to generate those paths is because I believe
>> they will almost always suck.
>
> That's a perfectly reasonable engineering judgment (and especially so
> for a first release). What I'd really like to see documented is how
> that conclusion is related, or not, to the rules about how path nodes
> should be decorated with parallel_safe, parallel_degree, etc annotations.
> The existing documentation is barely adequate to explain what those fields
> mean for primitive scan nodes; it's impossible for anyone but you to
> know what they are supposed to mean for joins and higher-level nodes.

It is unrelated, I think.

If a path is parallel_safe, that is supposed to mean that, in theory,
the plan generated from that path could be executed within a worker
without crashing the server, giving wrong answers, or otherwise
destroying the world. However, as an optimization, if we've already
decided that the query can't ever be parallelized at all, for example
because it contains write operations, we don't bother trying to set
the parallel_safe flags correctly; they're just all false. As a rule,
a path is definitely not parallel_safe if any path under it is not
parallel_safe; if all of the paths under it are parallel_safe, then it
is parallel_safe too, unless some unsafe computation is added at the
new level -- like an unsafe join qual between two otherwise-safe
relations.
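
To make that propagation rule concrete, here's a rough sketch in the
style of the pathnode.c path builders.  The helper and its
quals_are_parallel_safe argument are invented for illustration;
parallel_safe and consider_parallel are the real Path/RelOptInfo
fields:

#include "postgres.h"
#include "nodes/relation.h"

/*
 * Sketch only: a join path is parallel-safe iff the joinrel is a
 * candidate for parallelism, both input paths are parallel-safe, and
 * nothing unsafe (e.g. an unsafe join qual) is added at this level.
 */
static bool
join_path_parallel_safe(RelOptInfo *joinrel,
                        Path *outer_path, Path *inner_path,
                        bool quals_are_parallel_safe)
{
    return joinrel->consider_parallel &&
        outer_path->parallel_safe &&
        inner_path->parallel_safe &&
        quals_are_parallel_safe;
}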

If a path is parallel_aware, that means that the plan generated by
that path wants to do something different when run in parallel mode.
Presumably, the difference will be that the plan will establish some
shared state in the dynamic shared memory segment created to service
that parallel query. For example, a sequential scan can be
parallel_aware, which allows it to be executed simultaneously in
multiple processes, with each process returning only a subset of the
rows. A non-parallel_aware sequential scan can still
be used in parallel mode; for example, consider this:

Gather
  ->  Hash Join
        ->  Parallel Seq Scan
        ->  Hash
              ->  Seq Scan

The outer seq scan needs to return each row only once across all
workers, but the inner seq scan needs to return every row in every
worker. Therefore, the outer seq scan is flagged parallel_aware and
displays in the EXPLAIN output as "Parallel Seq Scan", while the inner
one is not and does not.
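
If it helps to see the mechanism, here's a toy, self-contained sketch
(not the actual heapam.c code) of the kind of shared state a
parallel-aware scan keeps in the DSM segment: every cooperating
process claims block numbers from one shared counter, so each block is
read exactly once across the whole set of workers:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Lives in the DSM segment set up for the parallel query. */
typedef struct SharedScanState
{
    atomic_uint_fast32_t next_block;    /* next block nobody has claimed */
    uint32_t    nblocks;                /* total blocks in the relation */
} SharedScanState;

/* Each participant calls this repeatedly until it returns false. */
static bool
claim_next_block(SharedScanState *shared, uint32_t *block)
{
    uint32_t    b = atomic_fetch_add(&shared->next_block, 1);

    if (b >= shared->nblocks)
        return false;           /* relation exhausted for everyone */
    *block = b;
    return true;
}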

parallel_degree is a horrible kludge whose function is to communicate
to the Gather node the number of workers for which it should budget.
Currently, every parallel plan's leftmost descendant will be a
Parallel Seq Scan, and that Parallel Seq Scan will estimate the degree
of parallelism that makes sense using a simplistic, bone-headed
algorithm based on the size of the table. That then bubbles up the
plan tree to the Gather node, which adopts the Parallel Seq Scan's
suggestion. I really hope this is going to go away eventually and be
replaced by something better. Really, I think we should try to figure
out the amount of parallelizable work (CPU, and effective I/O
parallelism) that is going to be required per leftmost tuple and
compare that to the amount of non-parallelizable work (presumably, the
rest of the I/O cost) and use that to judge the optimal parallel
degree. But I think that's going to take a lot of work to get right,
and it ties into some other issues, like the fact that we estimate a
scan of a 1MB table to have the same cost per page as a scan of a 10TB
table even though the former should probably be assumed to be fully
cached and the latter should probably be assumed not to be cached at
all. I think a lot more thought is needed here than I've given it
thus far, and one of the things that I'm hoping is that people will
test parallel query and actually report the results so that we can
accumulate some data on which problems are most important to go fix
and, also, what the shape of those fixes might look like.
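
For anyone curious what I mean by "bone-headed", the heuristic is
roughly of this shape -- the constants here are invented for
illustration, not the committed values -- the suggested worker count
just grows slowly with the number of heap pages:

#include <stdint.h>

/*
 * Illustrative only: suggest a parallel degree from the table size.
 * The threshold and growth factor are made up; the point is that the
 * suggestion grows logarithmically as the relation gets bigger.
 */
static int
suggest_parallel_degree(uint64_t heap_pages, int max_workers)
{
    uint64_t    threshold = 1000;   /* assumed minimum size to bother */
    int         degree = 0;

    while (heap_pages >= threshold && degree < max_workers)
    {
        degree++;
        threshold *= 3;             /* assumed growth factor */
    }

    return degree;                  /* 0 means "don't parallelize" */
}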

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
