Re: join removal

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: join removal
Date: 2009-07-20 03:58:34
Message-ID: 603c8f070907192058l4ed8a216x97b115f73c59ddd7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jul 19, 2009 at 10:56 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, Jul 16, 2009 at 9:02 PM, Greg Stark<stark(at)mit(dot)edu> wrote:
>>> I have one big worry though. Currently you're detecting the unique
>>> property using the planner's path mechanism. I suppose that works, but
>>> it's only an accident of the planner design that the path for the
>>> unique index will always be there if it's the join condition. My
>>> instinct is that this code should be going back to the raw index info
>>> to prove this property.
>
>> I had trouble figuring out where to hook in the logic.  In an ideal
>> world, it would be nice to detect that the join is removable earlier,
>> but it's hard to do that, because it's not until we know the join
>> order that we can test whether any attributes from the inner rel are
>> used above the level of the join.
>
> Yeah.  Ideally this sort of thing would happen in prepjointree.c, but
> we don't have nearly enough information at that stage.  I think the
> approach of treating join removal as a kind of join implementation is
> not unreasonable.  I think though that we might have to add an actual
> "dummy join" path type.  The crocks you put into add_path are, well,
> crocks.

Well, they're pretty simple crocks, but creating a dummy join type
might not be too bad. I'm thinking that it might make sense to have
the dummy join type exist only in the Path world, and to have the
create_plan() machinery strip it out when the actual Plan is built?

>> But as it is the fact that the join
>> can be removed will have to be rediscovered over and over again as
>> planning progresses.
>
> Not really.  We only consider a given join once.

Well, sorta. A lot of the queries where this will apply are probably
of the form:

A LEFT JOIN B LEFT JOIN C LEFT JOIN D LEFT JOIN E LEFT JOIN F LEFT JOIN G

...where many or all of the left joins are commutable. If the join to
B is removable, then you'll discover that it's removable when you try
to join {A} and {B}, but also when you try to join {A C} to {B}, when
you try to join {A D} to {B}, when you try to join {A C D} to {B},
etc.

In fact, I think that once you've found even one path where a join is
removable, you know it's removable no matter what, so you'd ideally
like to stop caring about the best path for {A B C D E F G} and just
look for the best path for {A C D E F G}. I'm not sure how to make
that work, though.

>> As for going back to "the raw index info", that was kind of my
>> instinct as well but I couldn't make it work.
>
> You need to work harder --- the way it's being done here is way too
> simplistic.  It's failing to take any account of whether the index's
> opclass has anything to do with the semantics of the join operators.
> Even aside from that, I agree with Greg that depending on
> the IndexPath to be there is a fatal mistake.  Do we want
> enable_indexscan = off to disable join removal?  Even if we thought
> that was okay, it seems entirely likely that the IndexPath could be
> discarded on cost grounds before we get to the stage of considering
> joins.

Good point.

> And it certainly won't scale up to considering removal of
> joins above the base level.
>
> I think we want something along the lines of relation_is_distinct_for
> with a list of columns and a list of comparison operators, where the
> first-cut implementation will be to look for matching indexes.
> This will be different from query_is_distinct_for, but it's dealing
> with the same sorts of considerations about whether the operator
> semantics are the right things.

That seems reasonable; my problem is (and I'm sorry if I'm being dense
here) where am I going to get the list of columns and the list of
comparison operators? add_paths_to_joinrel() just gets a list of
RestrictInfos for the join clauses, and I don't know what to do with
that.

Thanks,

...Robert

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2009-07-20 04:22:43 Re: join removal
Previous Message Tom Lane 2009-07-20 03:52:53 Re: make check failure for 8.4.0