Re: Performance improvement for joins where outer side is unique

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance improvement for joins where outer side is unique
Date: 2015-03-14 10:05:24
Message-ID: CAApHDvoh-EKF51QQyNoJUe0eHYMZw6OzJjjgYP63Cmw7QfebjA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On 14 March 2015 at 14:51, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> On 13 March 2015 at 20:34, Kyotaro HORIGUCHI <
> horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
>> Unfortunately I can't decide this to be 'ready for commiter' for
>>
> now. I think we should have this on smaller footprint, in a
>> method without separate exhauxtive searching. I also had very
>> similar problem in the past but I haven't find such a way for my
>> problem..
>>
>>
> I don't think it's ready yet either. I've just been going over a few
> things and looking at Tom's recent commit b557226 in pathnode.c I've got a
> feeling that this patch would need to re-factor some code that's been
> modified around the usage of relation_has_unique_index_for() as when this
> code is called, the semi joins have already been analysed to see if they're
> unique, so it could be just a case of ripping all of that out
> of create_unique_path() and just putting a check to say rel->is_unique_join
> in there. But if I do that then I'm not quite sure if
> SpecialJoinInfo->semi_rhs_exprs and SpecialJoinInfo->semi_operators would
> still be needed at all. These were only added in b557226. Changing this
> would help reduce the extra planning time when the query contains
> semi-joins. To be quite honest, this type of analysis belongs in
> analyzejoin.c anyway. I'm tempted to hack at this part some more, but I'd
> rather Tom had a quick glance at what I'm trying to do here first.
>
>

I decided to hack away any change the code Tom added in b557226. I've
changed it so that create_unique_path() now simply just uses if
(rel->is_unique_join), instead off all the calls to
relation_has_unique_index_for() and query_is_distinct_for(). This vastly
simplifies that code. One small change is that Tom's checks for uniqueness
on semi joins included checks for volatile functions, this check didn't
exist in the original join removal code, so I've left it out. We'll never
match a expression with a volatile function to a unique index as indexes
don't allow volatile function expressions anyway. So as I understand it
this only serves as a fast path out if the join condition has a volatile
function... But I'd assume that check is not all that cheap.

I ended up making query_supports_distinctness() and query_is_distinct_for()
static in analyzejoins.c as they're not used in any other files.
relation_has_unique_index_for() is also now only used in analyzejoins.c,
but I've not moved it into that file yet as I don't want to bloat the
patch. I just added a comment to say it needs moved.

I've also added a small amount of code to query_is_distinct_for() which
allows subqueries such as (select 1 a offset 0) to be marked as unique. I
thought it was a little silly that these were not being detected as unique,
so I fixed it. This has the side effect of allowing left join removals for
queries such as: select t1.* from t1 left join (select 1 a offset 0) a on
t1.id=a.a;

Updated patch attached.

Regards

David Rowley

Attachment Content-Type Size
unijoin_2015-03-14_81bd96a.patch application/octet-stream 67.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2015-03-14 12:31:51 Re: pg_rewind in contrib
Previous Message David Rowley 2015-03-14 07:55:44 Re: Performance improvement for joins where outer side is unique