upper planner path-ification

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: upper planner path-ification
Date: 2015-05-14 01:39:03
Message-ID: CA+TgmoaqgzhUX7frzddpGhkqT3rcNx-f0dF+ZqzsEfW-ARXAww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I've been pulling over Tom's occasional remarks about redoing
grouping_planner - and maybe further layers of the planner - to work
with Paths instead of Plans. I've had difficulty locating all of the
relevant threads. Here's everything I've found so far, which I'm
pretty sure is not everything:

http://www.postgresql.org/message-id/17400.1311716272@sss.pgh.pa.us
http://www.postgresql.org/message-id/2614.1375730493@sss.pgh.pa.us
http://www.postgresql.org/message-id/22721.1385048662@sss.pgh.pa.us
http://www.postgresql.org/message-id/BANLkTinDjjFHNOzESG2J2U4GOkqLu69Zqg@mail.gmail.com
http://www.postgresql.org/message-id/8479.1418420018@sss.pgh.pa.us

I think there are two separate problems here. First, there's the
problem that grouping_planner() is complicated. It's doing cost
comparisons, but all in ad-hoc fashion rather than using a consistent
mechanic the way add_path() does. Generally, we can plan an aggregate
using either (1) a hashed aggregate, (2) a sorted aggregate, or (3)
for min or max, an index scan that just grabs the highest or lowest
value in lieu of a full table scan. Instead of generating a plan for
each of these possibilities, we'd like to generate paths for each one,
and then pick one to turn into a plan. AIUI, the hope is that this
would simplify the cost calculations, and also make it easier to
inject other paths, such as a path where an FDW performs the
aggregation step on the remote server.

Second, there's the problem that we might like to order aggregates
with respect to joins. If we have something like SELECT DISTINCT ON
(foo.x) foo.x, foo.y, bar.y FROM foo, bar WHERE foo.x = bar.x, then
(a) if foo.x has many duplicates, it will be better to DISTINCT-ify
foo and then join to bar afterwards but (b) if foo.x = bar.x is highly
selective, it will be better to join to bar first and then
DISTINCT-ify the result. Currently, aggregation is always last;
that's not great. Hitoshi Harada's proposed strategy of essentially
figuring out where the aggregation steps can go and then re-planning
for each one is also not great, because each join problem will be a
subtree of the one we use for the aggregate-last strategy, and thus
we're wasting effort by planning the same subtrees multiple times.
Instead, we might imagine letting grouping planner fish out the best
paths for the joinrels that represent possible aggregation points,
generate aggregation paths for each of those, and then work out what
additional rels need to be joined afterwards. That sounds hard, but
not as hard as doing something sensible with what we have today.

I'm inclined to think that it would be useful to solve the first
problem even if we didn't solve the second one right away (but that
might be wrong). As a preparatory step, I'm thinking it would be
sensible to split grouping_planner() into an outer function that would
handle the addition of Limit and LockRows nodes and maybe planning of
set operations, and an inner function that would handle GROUP BY,
DISTINCT, and possibly window function planning.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2015-05-14 02:27:46 Re: upper planner path-ification
Previous Message Tom Lane 2015-05-14 01:19:54 Re: Manipulating complex types as non-contiguous structures in-memory