Re: Performance improvement for joins where outer side is unique

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance improvement for joins where outer side is unique
Date: 2015-03-10 12:32:24
Message-ID: CAApHDvpEXAjs6mV2ro4=3qbzpx=pLrteinX0J2YHq6wrp85pPw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On 10 March 2015 at 19:19, Kyotaro HORIGUCHI <
horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:

>
> From the another point of view, the patch looks a bit large for
> the gain (for me). Addition to it, it loops by many levels.
>
> [mark_unique_joins()]
> foreach (joinlist)
> [eclassjoin_is_unique_join()]
> foreach(joinlist)
> foreach(root->eq_classes)
> foreach(root->ec_members)
>
> The innermost loop could be roughly said to iterate about 10^3*2
> times for a join of 10 tables all of which have index and no
> eclass is shared among them. From my expreince, we must have the
> same effect by far less iteration levels.
>
> This is caueed the query like this, as a bad case.
>
> create table t1 (a int, b int);
> create index i_t1 (a int);
> create table t2 (like t1 including indexes);
> ....
> create table t10 (.....);
> insert into t1 (select a, a * 10 from generate_series(0, 100) a);
> insert into t2 (select a * 10, a * 20 from generate_series(0, 100) a);
> ...
> insert into t10 (select a * 90, a * 100 from generate_series(0, 100) a);
>
> explain select t1.a, t10.b from t1 join t2 on (t1.b = t2.a) join t3 on
> (t2.b = t3.a) join t4 on (t3.b = t4.a) join t5 on (t4.b = t5.a) join t6 on
> (t5.b = t6.a) join t7 on (t6.b = t7.a) join t8 on (t7.b = t8.a) join t9 on
> (t8.b = t9.a) join t10 on (t9.b = t10.a);
>
> The head takes 3ms for planning and the patched version takes
> around 5ms while pure execution time is 1ms.. I think it is a too
> long extra time.
>
>
This smells quite fishy to me. I'd be quite surprised if your machine took
an extra 2 ms to do this.

I've run what I think is the same test on my 5 year old i5 laptop and
attached the .sql file which I used to generate the same schema as you've
described.

I've also attached the results of the explain analyze "Planning Time:"
output from patched and unpatched using your test case.

I was unable to notice any difference in plan times between both versions.
In fact master came out slower, which is likely just the noise in the
results.

Just to see how long mark_unique_joins() takes with your test case I
changed query_planner() to call mark_unique_joins() 1 million times:

{
int x;
for (x = 0; x < 1000000;x++)
mark_unique_joins(root, joinlist);
}

I also got rid of the fast path test which bails out if the join is already
marked as unique.

/* check if we've already marked this join as unique on a previous call */
/*if (idxrel->is_unique_join)
return true;
*/

On my machine after making these changes, it takes 800 ms to plan the
query. So it seems that's around 800 nano seconds for a single call to
mark_unique_joins().

Perhaps you've accidentally compiled the patched version with debug asserts?

Are you able to retest this?

Regards

David Rowley

Attachment Content-Type Size
unijoin_plan_benchmark.sql text/plain 1.4 KB
unijoin_plan_benchmark_results.xlsx application/vnd.openxmlformats-officedocument.spreadsheetml.sheet 8.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2015-03-10 12:52:54 Re: CATUPDATE confusion?
Previous Message Alvaro Herrera 2015-03-10 12:27:40 Re: patch : Allow toast tables to be moved to a different tablespace