Re: Performance improvement for joins where outer side is unique

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: dgrowleyml(at)gmail(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Performance improvement for joins where outer side is unique
Date: 2015-03-13 07:34:20
Message-ID: 20150313.163420.247574547.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Hello. The performance drag was not so large after all.

For all that, I agree that the opition that this kind of separate
multiple-nested loops on relations, joins or ECs and so on for
searching something should be avoided. I personally feel that
additional time to such an extent (around 1%) would be tolerable
if it affected a wide range of queries or it brought more obvious
gain.

Unfortunately I can't decide this to be 'ready for commiter' for
now. I think we should have this on smaller footprint, in a
method without separate exhauxtive searching. I also had very
similar problem in the past but I haven't find such a way for my
problem..

At Wed, 11 Mar 2015 01:32:24 +1300, David Rowley <dgrowleyml(at)gmail(dot)com> wrote in <CAApHDvpEXAjs6mV2ro4=3qbzpx=pLrteinX0J2YHq6wrp85pPw(at)mail(dot)gmail(dot)com>
> > explain select t1.a, t10.b from t1 join t2 on (t1.b = t2.a) join t3 on
> > (t2.b = t3.a) join t4 on (t3.b = t4.a) join t5 on (t4.b = t5.a) join t6 on
> > (t5.b = t6.a) join t7 on (t6.b = t7.a) join t8 on (t7.b = t8.a) join t9 on
> > (t8.b = t9.a) join t10 on (t9.b = t10.a);
> >
> > The head takes 3ms for planning and the patched version takes
> > around 5ms while pure execution time is 1ms.. I think it is a too
> > long extra time.
> >
> >
> This smells quite fishy to me. I'd be quite surprised if your machine took
> an extra 2 ms to do this.

You're right. Sorry. I was amazed by the numbers..

I took again the times for both master and patched on master
(some conflict arised). Configured with no options so compiled
with -O2 and no assertions. Measured the planning time for the
test query 10 times and calcualted the average.

patched: 1.883ms (stddev 0.034)
master: 1.861ms (stddev 0.042)

About 0.02ms, 1% extra time looks to be taken by the extra
processing.

regards,

> I've run what I think is the same test on my 5 year old i5 laptop and
> attached the .sql file which I used to generate the same schema as you've
> described.
>
> I've also attached the results of the explain analyze "Planning Time:"
> output from patched and unpatched using your test case.
>
> I was unable to notice any difference in plan times between both versions.
> In fact master came out slower, which is likely just the noise in the
> results.
>
> Just to see how long mark_unique_joins() takes with your test case I
> changed query_planner() to call mark_unique_joins() 1 million times:
>
> {
> int x;
> for (x = 0; x < 1000000;x++)
> mark_unique_joins(root, joinlist);
> }
>
>
> I also got rid of the fast path test which bails out if the join is already
> marked as unique.
>
> /* check if we've already marked this join as unique on a previous call */
> /*if (idxrel->is_unique_join)
> return true;
> */
>
> On my machine after making these changes, it takes 800 ms to plan the
> query. So it seems that's around 800 nano seconds for a single call to
> mark_unique_joins().
>
> Perhaps you've accidentally compiled the patched version with debug asserts?
>
> Are you able to retest this?

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2015-03-13 08:10:52 Re: Proposal : REINDEX xxx VERBOSE
Previous Message Amit Langote 2015-03-13 04:52:36 Re: Parallel Seq Scan