Re: Performance improvement for joins where outer side is unique

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: dgrowleyml(at)gmail(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Performance improvement for joins where outer side is unique
Date: 2015-03-16 08:28:55
Message-ID: 20150316.172855.213501931.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, I don't have enough time for now but made some
considerations on this.

It might be a way that marking unique join peer at bottom and
propagate up, not searching from top of join list.
Around create_join_clause might be a candidate for it.
I'll investigate that later.

regards,

At Sat, 14 Mar 2015 23:05:24 +1300, David Rowley <dgrowleyml(at)gmail(dot)com> wrote in <CAApHDvoh-EKF51QQyNoJUe0eHYMZw6OzJjjgYP63Cmw7QfebjA(at)mail(dot)gmail(dot)com>
dgrowleyml> On 14 March 2015 at 14:51, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
dgrowleyml>
dgrowleyml> > On 13 March 2015 at 20:34, Kyotaro HORIGUCHI <
dgrowleyml> > horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
dgrowleyml> >
dgrowleyml> >> Unfortunately I can't decide this to be 'ready for commiter' for
dgrowleyml> >>
dgrowleyml> > now. I think we should have this on smaller footprint, in a
dgrowleyml> >> method without separate exhauxtive searching. I also had very
dgrowleyml> >> similar problem in the past but I haven't find such a way for my
dgrowleyml> >> problem..
dgrowleyml> >>
dgrowleyml> >>
dgrowleyml> > I don't think it's ready yet either. I've just been going over a few
dgrowleyml> > things and looking at Tom's recent commit b557226 in pathnode.c I've got a
dgrowleyml> > feeling that this patch would need to re-factor some code that's been
dgrowleyml> > modified around the usage of relation_has_unique_index_for() as when this
dgrowleyml> > code is called, the semi joins have already been analysed to see if they're
dgrowleyml> > unique, so it could be just a case of ripping all of that out
dgrowleyml> > of create_unique_path() and just putting a check to say rel->is_unique_join
dgrowleyml> > in there. But if I do that then I'm not quite sure if
dgrowleyml> > SpecialJoinInfo->semi_rhs_exprs and SpecialJoinInfo->semi_operators would
dgrowleyml> > still be needed at all. These were only added in b557226. Changing this
dgrowleyml> > would help reduce the extra planning time when the query contains
dgrowleyml> > semi-joins. To be quite honest, this type of analysis belongs in
dgrowleyml> > analyzejoin.c anyway. I'm tempted to hack at this part some more, but I'd
dgrowleyml> > rather Tom had a quick glance at what I'm trying to do here first.
dgrowleyml> >
dgrowleyml> >
dgrowleyml>
dgrowleyml> I decided to hack away any change the code Tom added in b557226. I've
dgrowleyml> changed it so that create_unique_path() now simply just uses if
dgrowleyml> (rel->is_unique_join), instead off all the calls to
dgrowleyml> relation_has_unique_index_for() and query_is_distinct_for(). This vastly
dgrowleyml> simplifies that code. One small change is that Tom's checks for uniqueness
dgrowleyml> on semi joins included checks for volatile functions, this check didn't
dgrowleyml> exist in the original join removal code, so I've left it out. We'll never
dgrowleyml> match a expression with a volatile function to a unique index as indexes
dgrowleyml> don't allow volatile function expressions anyway. So as I understand it
dgrowleyml> this only serves as a fast path out if the join condition has a volatile
dgrowleyml> function... But I'd assume that check is not all that cheap.
dgrowleyml>
dgrowleyml> I ended up making query_supports_distinctness() and query_is_distinct_for()
dgrowleyml> static in analyzejoins.c as they're not used in any other files.
dgrowleyml> relation_has_unique_index_for() is also now only used in analyzejoins.c,
dgrowleyml> but I've not moved it into that file yet as I don't want to bloat the
dgrowleyml> patch. I just added a comment to say it needs moved.
dgrowleyml>
dgrowleyml> I've also added a small amount of code to query_is_distinct_for() which
dgrowleyml> allows subqueries such as (select 1 a offset 0) to be marked as unique. I
dgrowleyml> thought it was a little silly that these were not being detected as unique,
dgrowleyml> so I fixed it. This has the side effect of allowing left join removals for
dgrowleyml> queries such as: select t1.* from t1 left join (select 1 a offset 0) a on
dgrowleyml> t1.id=a.a;
dgrowleyml>
dgrowleyml> Updated patch attached.
dgrowleyml>
dgrowleyml> Regards
dgrowleyml>
dgrowleyml> David Rowley

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dmitry Voronin 2015-03-16 08:31:29 Question about TEMP tables
Previous Message Michael Paquier 2015-03-16 08:17:23 Re: Re: Install shared libs in lib/ and bin/ with MSVC (Was: install libpq.dll in bin directory on Windows / Cygwin)