Re: join removal

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: join removal
Date: 2009-08-16 21:31:23
Message-ID: 603c8f070908161431u7b07bf30g2c0ddd3eef220ebd@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Aug 9, 2009 at 12:19 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> distinct_col_search() is going to return the relevant equality
>> operator from the argument list, which is ultimately going to come
>> from the RestrictInfo for the join clause.  So I need to see whether
>> that's compatible with the index, but equality_ops_are_compatible()
>> wants two equality operators, and what I have is one equality operator
>> and one operator class.
>
> For that you just check if the operator is a member of the class.
> (You might need to verify that it's an equality operator in the class
> too; not clear if the context is enough to be sure that it's not '<'
> for example.)

It seems that the needed checks are very similar to the ones that we
already implement when setting restrictinfo->mergeopfamilies. That is
filled in by get_mergejoin_opfamilies(), which checks for btree
opfamilies where the strategy number is BTEqualStrategyNumber. This
might cease to be the correct check in the (not-too-distant?) future
if we end up implementing other kinds of unique indices, but right now
btrees are all there is.

One possibility would be to have relation_is_distinct_for() call
get_mergejoin_opfamilies() for each operator; then for each index we
can check whether the opfamily of the relevant index column is in the
returned list. This seems a bit wasteful, though, since I believe
that relation_is_distinct_for() would be called from joinpath.c, which
has access to restrictinfo->mergeopfamilies already.

I'm wondering whether it would make more sense to modify the proposed
API for relation_is_distinct_for() in some way so that we don't lose
this information. It seems to me that the overall process here is
something like this (recalling that I'm focusing only on removing LEFT
joins at this point):

1. Given a joinrel, innerrel, and outerrel, find the list of
RestrictInfos for which (a) restrictinfo->mergeopfamilies != NIL, (b)
restrictinfo->outer_is_left is well-defined (as per logic in
select_mergejoin_clauses), and (c) the outer side is a Var. If this
list is NIL, then give up; join removal is not possible.

2. Check whether any attributes from the outer side are used above the
join; if so, then give up; join removal is not possible.

3. Extract the column numbers from the Vars found in step 1(C) and the
mergeopfamilies found in step 1(A).

4. Look a unique, non-expression index (which must also have
index->indpred == NIL or index->predOK) for which every column number
appears in the list of column numbers computed in step 3, with one of
the corresponding opfamilies also found in step (2). If one is found,
then the join is removable.

Thoughts?

...Robert

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2009-08-17 00:55:43 Re: Croatian translation
Previous Message James Pye 2009-08-16 20:34:26 Re: [PATCH] plpythonu datatype conversion improvements