Re: Performance improvement for joins where outer side is unique

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance improvement for joins where outer side is unique
Date: 2016-10-31 05:37:09
Message-ID: CAKJS1f96jZYcv69rN-N_d+SpFqB4-4yBt0p_+fyPY=ESmj7Jzg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 8 April 2016 at 06:49, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
> Just had a thought about this, which should have crystallized a long
> time ago perhaps. Where I'd originally imagined you were going with
> this idea is to do what the thread title actually says, and check for
> joins in which the *outer* side is unique. I can't see that that's
> of any value for nestloop or hash joins, but for merge joins, knowing
> that the outer side is unique could be extremely valuable because
> we could skip doing mark/restore backups on the inner side, hugely
> reducing the cost when the inner side has many duplicates. Now I'm
> not asking for the initially-committed version of the patch to do that,
> but I think we need to design it to be readily extensible to do so.

I've rebased the changes I made to address this back in April to current master.

The changes get rid of the changing JOIN_INNER to JOIN_SEMI conversion
and revert back to the original "inner_unique" marking of the joins.
In addition to this I've also added "outer_unique", which is only made
use of in merge join to control if the outer side needs to enable mark
and restore or not.

However, having said that, I'm not sure why we'd need outer_unique
available so we'd know that we could skip mark/restore. I think
inner_unique is enough for this purpose. Take the comment from
nodeMergejoin.c:

* outer: (0 ^1 1 2 5 5 5 6 6 7) current tuple: 1
* inner: (1 ^3 5 5 5 5 6) current tuple: 3
...
*
* Consider the above relations and suppose that the executor has
* just joined the first outer "5" with the last inner "5". The
* next step is of course to join the second outer "5" with all
* the inner "5's". This requires repositioning the inner "cursor"
* to point at the first inner "5". This is done by "marking" the
* first inner 5 so we can restore the "cursor" to it before joining
* with the second outer 5. The access method interface provides
* routines to mark and restore to a tuple.

If only one inner "5" can exist (unique_inner), then isn't the inner
side already in the correct place, and no restore is required?

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
unique_joins_2016-10-31.patch application/octet-stream 83.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2016-10-31 05:58:16 Re: add more NLS to bin
Previous Message Michael Paquier 2016-10-31 05:31:34 Re: commit fest manager for CF 2016-11?