Re: upper planner path-ification

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: upper planner path-ification
Date: 2015-05-14 02:27:46
Message-ID: 4187.1431570466@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I've been pulling over Tom's occasional remarks about redoing
> grouping_planner - and maybe further layers of the planner - to work
> with Paths instead of Plans. ...
> I think there are two separate problems here. First, there's the
> problem that grouping_planner() is complicated.
> Second, there's the problem that we might like to order aggregates
> with respect to joins.

Both of those are problems all right, but there is more context here.

* As some of the messages you cited mention, we would like to have Path
representations for things like aggregation, because that's the only
way we'll get to a sane API that lets FDWs propose remote aggregation.

* We have also had requests for the planner to be smarter about
UNION/INTERSECT/EXCEPT queries. Again, that requires cost comparisons,
which would be better done if we had Path representations for the various
ways we'd want to consider. Also, a big part of the issue there is
wanting to be able to consider sorted versus unsorted plans for the leaf
queries of the set-op (IOW, optionally pushing the sort requirements of
the set-op down into the leaves). Right now, such comparisons are
impossible because prepunion.c uses subquery_planner to handle the leaf
queries, and what it gets back from that is one finished plan, not
alternative Paths.

* Likewise, subqueries-in-FROM are handled by recursing to
subquery_planner, which gives us back just one frozen Plan for the
subquery. Among other things this seems to make it too expensive to
consider generating parameterized paths for the subquery. I'd like
to keep subquery plans in Path form until much later as well.

So these considerations motivate wishing that the result of
subquery_planner could be a list of alternative Paths rather than a Plan,
which means that every operation it knows how to tack onto the scan/join
plan has to be representable by a Path of some sort.
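
A minimal sketch of why a list of alternatives helps, assuming a toy model that is not PostgreSQL's actual Path struct: every name below (SketchPath, SketchChoice, choose_for_sorted_parent) is hypothetical, and "sorted" stands in for "satisfies the pathkeys the parent wants". Given several candidate paths for a subquery or set-op leaf, a parent that needs sorted input can compare "already sorted" against "cheaper but needs an explicit sort", which a single finished Plan does not allow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a planner Path (not PostgreSQL's). */
typedef struct SketchPath
{
    const char *label;       /* human-readable name, for illustration only */
    double      total_cost;  /* estimated cost of producing all rows */
    bool        sorted;      /* output already has the ordering the parent wants */
} SketchPath;

/* Result of the comparison: which path, and whether a Sort goes on top. */
typedef struct SketchChoice
{
    const SketchPath *path;
    bool        add_sort;
    double      total_cost;
} SketchChoice;

/*
 * Pick the cheapest way to get sorted output from a set of alternatives.
 * Unsorted candidates are charged an assumed flat sort_cost on top.
 * Returns a choice with path == NULL if npaths is zero.
 */
static SketchChoice
choose_for_sorted_parent(const SketchPath *paths, size_t npaths,
                         double sort_cost)
{
    SketchChoice best = {NULL, false, 0.0};

    assert(paths != NULL || npaths == 0);

    for (size_t i = 0; i < npaths; i++)
    {
        bool    add_sort = !paths[i].sorted;
        double  cost = paths[i].total_cost + (add_sort ? sort_cost : 0.0);

        if (best.path == NULL || cost < best.total_cost)
        {
            best.path = &paths[i];
            best.add_sort = add_sort;
            best.total_cost = cost;
        }
    }
    return best;
}
```

With two candidates costing 100 (unsorted) and 130 (sorted), an assumed sort cost of 50 makes the sorted path win, while a sort cost of 20 makes "unsorted plus explicit sort" win. The flat sort cost is a simplification; a real cost model would depend on row counts and widths.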

I don't know how granular that needs to be, though. For instance, one
could certainly imagine that it might be sufficient initially to have a
single "WindowPath" that represents "do all the window functions", and
then at create_plan time we'd generate multiple WindowAgg plan nodes in
the same ad-hoc way as now. Breaking that down in the Path representation
would only become interesting if it would affect higher-level decisions,
and I'm not immediately seeing how it might do that.
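
The coarse-grained idea can be sketched as follows, under loud assumptions: SketchWindowPath, SketchPlanNode, and expand_window_path are hypothetical names, costs are simple sums, and nothing here mirrors real planner structs. One path node stands for "do all the window functions"; only when a plan is built does it expand into a stack of per-window plan nodes.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_WINDOWS 8

/* Hypothetical coarse path: one node covering every window definition. */
typedef struct SketchWindowPath
{
    double  input_cost;                           /* cost of the input path */
    int     nwindows;                             /* distinct window definitions */
    double  per_window_cost[SKETCH_MAX_WINDOWS];  /* assumed cost of each step */
} SketchWindowPath;

/* Hypothetical plan node: the input, or one window-aggregation step. */
typedef struct SketchPlanNode
{
    int     window_index;  /* -1 for the input node */
    int     child;         /* index of the child in the output array, -1 if none */
    double  total_cost;    /* cumulative cost up to and including this node */
} SketchPlanNode;

/* Cost the planner would compare; no per-window structure is exposed. */
static double
window_path_total_cost(const SketchWindowPath *wp)
{
    double  cost = wp->input_cost;

    for (int i = 0; i < wp->nwindows; i++)
        cost += wp->per_window_cost[i];
    return cost;
}

/*
 * Expand the single path into nwindows + 1 stacked plan nodes, bottom first.
 * Returns the number of nodes written, or 0 if out_cap is too small.
 */
static size_t
expand_window_path(const SketchWindowPath *wp, SketchPlanNode *out,
                   size_t out_cap)
{
    size_t  needed;

    assert(wp->nwindows >= 0 && wp->nwindows <= SKETCH_MAX_WINDOWS);
    needed = (size_t) wp->nwindows + 1;
    if (out_cap < needed)
        return 0;

    out[0].window_index = -1;
    out[0].child = -1;
    out[0].total_cost = wp->input_cost;

    for (int i = 0; i < wp->nwindows; i++)
    {
        out[i + 1].window_index = i;
        out[i + 1].child = i;
        out[i + 1].total_cost = out[i].total_cost + wp->per_window_cost[i];
    }
    return needed;
}
```

Higher-level decisions see only window_path_total_cost(); the per-window breakdown stays private to the expansion step, so it would only need to move into the Path layer if something above had to choose between different breakdowns.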

> I'm inclined to think that it would be useful to solve the first
> problem even if we didn't solve the second one right away (but that
> might be wrong). As a preparatory step, I'm thinking it would be
> sensible to split grouping_planner() into an outer function that would
> handle the addition of Limit and LockRows nodes and maybe planning of
> set operations, and an inner function that would handle GROUP BY,
> DISTINCT, and possibly window function planning.

For the reasons I mentioned, I'd like to get to a point where
subquery_planner's output is Paths not Plans as soon as possible. But the
idea of coarse representation of steps that we aren't trying to be smart
about might be useful to save some labor in the short run.

The zero-order version of that might be a single Path node type that
represents "do whatever grouping_planner would do", which we'd start to
break down into multiple node types once we had the other APIs fixed.

regards, tom lane
