Re: Removing INNER JOINs

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)2ndquadrant(dot)com>, Mart Kelder <mart(at)kelder31(dot)nl>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Removing INNER JOINs
Date: 2015-03-29 22:16:50
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

David Rowley <dgrowleyml(at)gmail(dot)com> writes:
> I'm not worried about the cost of the decision at plan init time. I was
> just confused about what Tom was recommending. I couldn't quite decide from
> his email if he meant to keep the alternative plan node around so that the
> executor must transition through it for each row, or to just choose the
> proper plan at executor init and return the actual root of the selected
> plan instead of returning the initialised AlternativePlan node (see
> nodeAlternativePlan.c)

TBH, I don't like anything about this patch and would prefer to reject it
altogether. I think it's a disaster from a system structural standpoint,
will probably induce nasty bugs, and simply doesn't apply to enough
real-world queries to be worth either the pain of making it work or the
planning overhead it will add.

The structural damage could be reduced if you got rid of the far-too-cute
idea that you could cut the AlternativePlan node out of the plan tree at
executor startup time. Just have it remember which decision it made and
pass down all the calls to the appropriate child node. What you're doing
here violates the rule that planstate trees have a one-to-one relationship
to plan trees. EXPLAIN used to iterate over those trees in lockstep, and
there probably still is code that does similar things (in third-party
modules if not core), so I don't think we should abandon that principle.

Also, the patch seems rather schizophrenic about whether it's relying
on run-time decisions or not; in particular I see no use of the
PlannedStmt.suitableFor field, and no reason to have one if there's
to be an AlternativePlan node making the decision.

I'm just about certain that you can't generate the two alternative plans
simply by calling make_one_rel() twice. The planner has far too much
tendency to scribble on its input data structures for that to work
reliably. It's possible you could dodge bugs of that sort, and save
some cycles, if you did base relation planning only once and created the
alternatives only while working at the join-relation level. Even then
I think you'd need to do pushups like what GEQO does in order to repeat
join-level planning. (While I'm looking at that: what used to be a
reasonably clear API between query_planner and grouping_planner now seems
like a complete mess, and I'm quite certain you've created multiple bugs
in grouping_planner, because it would need to track more than one e.g.
current_pathkeys value for the subplans created for the different
final_rels. You can't just arbitrarily do some of those steps in one loop
and the others in a totally different loop.)

As far as the fundamental-bugs issue goes, I have no faith that "there are
no pending AFTER trigger events" is a sufficient condition for "all known
foreign-key constraints hold against whatever-random-snapshot-I'm-using";
and it's certainly not a *necessary* condition. (BTW the patch should be
doing its best to localize knowledge about that, rather than spreading the
assumption far and wide through comments and choices of flag/variable
names.) I think the best you can hope to say in that line is that if
there were no pending events at the time the snapshot was taken, it might
be all right. Maybe. But it's not hard to imagine the trigger queue
getting emptied while snapshots still exist that can see inconsistent
states that prevailed before the triggers fired. (Remember that a trigger
might restore consistency by applying additional changes, not just by
throwing an error.) STABLE functions are the pain point here since they
could execute queries with snapshots from quite some time back.

As for the cost issue, that's an awful lot of code you added to
remove_useless_joins(). I'm concerned about how much time it's going to
spend while failing to prove anything for most queries. For starters,
it looks to be O(N^2) in the number of relations, which the existing
join_is_removable code is not; and in general it looks like it will do
work in many more common cases than the existing code has to. I'm also
pretty unhappy about having get_relation_info() physically visiting the
pg_constraint catalog for every relation that's ever scanned by any query.
(You could probably teach the relcache to remember data about foreign-key
constraints, instead of doing it this way.)

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2015-03-29 23:54:16 Re: getting rid of "thread fork emulation" in pgbench?
Previous Message Andres Freund 2015-03-29 20:13:08 Re: Relation extension scalability