Re: Performance improvement for joins where outer side is unique

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance improvement for joins where outer side is unique
Date: 2015-03-28 07:35:00
Message-ID: CAApHDvrvPqgjcko3wUcv2+Sp5UxkVU6HYL54nNyuqQkmrCZ-=g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On 25 March 2015 at 01:11, Kyotaro HORIGUCHI <
horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:

> Hi, thanks for the new patch.
>
> I made an additional shrink from your last one. Do you have a
> look on the attached?
>
>
Thanks, for looking again.

I'm not too sure I like the idea of relying on join removals to mark the
is_unique_join property.
By explicitly doing it in mark_unique_joins we have the nice side effect of
not having to re-check a relations unique properties after removing another
relation, providing the relation was already marked as unique on the first
pass.

> At Sun, 22 Mar 2015 19:42:21 +1300, David Rowley <dgrowleyml(at)gmail(dot)com>
> wrote in <
> CAApHDvrKwMmTwkXfn4uazYZA9jQL1c7UwBjBtuwFR69rqLVKfA(at)mail(dot)gmail(dot)com>
> > On 20 March 2015 at 21:11, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> > >
> > > I can continue working on your patch if you like? Or are you planning
> to
> > > go further with it?
>
> It's fine that you continue to work on this.
>
> # Sorry for the hardly baked patch which had left many things alone:(
>
> > I've been working on this more over the weekend and I've re-factored
> things
> > to allow LEFT JOINs to be properly marked as unique.
> > I've also made changes to re-add support for detecting the uniqueness of
> > sub-queries.
>
> I don't see the point of calling mark_unique_joins for every
> iteration on join_info_list in remove_useless_joins.
>
>
I've fixed this in the attached. I must have forgotten to put the test for
LEFT JOINs here as I was still thinking that I might make a change to the
code that converts unique semi joins to inner joins so that it just checks
is_unique_join instead of calling relation_has_unique_index_for().

> The loop already iteraltes on whole the join_info_list so
> mark_unique_join as an individual function is needless.
>
> Finally, simply marking uniqueness of join in join_is_removable
> seems to be enough, inhibiting early bailing out by the checks on
> attribute usage and placeholder let it work as expected.
>
> Reducing changes to this extent, I can easily see what is added
> to planning computations. It consists of mainly two factors.
>
> - Unique-join chekcs for every candidate inner joins in
> add_paths_to_joinrel.
>
> - Uniqueness check of mergejoinable clause in join-removability
> check for every left join, some of which would be skipped
> by other checks before.
>
>
> > Also, I've added modified the costing for hash and nested loop joins to
> > reduce the cost for unique inner joins to cost the join the same as it
> does
> > for SEMI joins. This has tipped the scales on a few plans in the
> regression
> > tests.
>
> I've forgotten it, but quite important.
>
>
I've fixed quite a fundamental bug in my previous costing change. The fix
for this makes it so I have to pass the unique_inner bool down to the
costing functions. This also changes how the JoinPaths are marked as
unique_inner. This is now done when the JoinPath is created, instead of
updating it afterwards like it was done previously.

> Also, please see attached unijoin_analysis.patch. This just adds some code
> > which spouts out notices when join nodes are initialised which states if
> > the join is unique or not. Running the regression tests with this patch
> in
> > places gives:
> >
> > Unique Inner: Yes == 753 hits
> > Unique Inner: No == 1430 hits
> >
> > So it seems we can increase the speed of about 1 third of joins by about
> > 10%.
> > A quick scan of the "No"s seems to show quite a few cases which do not
> look
> > that real world like. e.g cartesian join.
>
> I don't have an idea how many queries in the reality hit this but
> I suppose it's not a few.
>
> > It would be great if someone could run some PostgreSQL application with
> > these 2 patches applied, and then grep the logs for the Unique Inner
> > results... Just to get a better idea of how many joins in a real world
> case
> > will benefit from this patch.
>
> Wow. I think the second patch should be DEBUGx, not NOTICE:)
>
>
I didn't give that much thought, but you're probably right. It was just a
quick test and demo to try to raise some more interest in this. :-)

Regards

David Rowley

Attachment Content-Type Size
unijoin_2015-03-28_d1923fb.patch application/octet-stream 61.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dean Rasheed 2015-03-28 08:58:35 Re: Rounding to even for numeric data type
Previous Message Andrew Gierth 2015-03-28 05:16:17 Re: Rounding to even for numeric data type