Re: Performance improvement for joins where outer side is unique

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance improvement for joins where outer side is unique
Date: 2016-04-07 18:49:38
Message-ID: 11488.1460054978@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
> [ unique_joins_2016-04-07.patch ]

Just had a thought about this, which should have crystallized a long
time ago perhaps. Where I'd originally imagined you were going with
this idea is to do what the thread title actually says, and check for
joins in which the *outer* side is unique. I can't see that that's
of any value for nestloop or hash joins, but for merge joins, knowing
that the outer side is unique could be extremely valuable because
we could skip doing mark/restore backups on the inner side, hugely
reducing the cost when the inner side has many duplicates. Now I'm
not asking for the initially-committed version of the patch to do that,
but I think we need to design it to be readily extensible to do so.

The problem with this is that it blows the current factorization around
add_paths_to_joinrel out of the water. What we'd want is for the caller
(make_join_rel) to determine uniqueness on both sides, and pass that info
down to each of its two calls of add_paths_to_joinrel; otherwise we have
to do double the work because each run of add_paths_to_joinrel will have
to make those same two determinations.

This probably also means that encoding the uniqueness into JoinType is
a lost cause. Expanding JOIN_INNER into four variants depending on
whether either or both sides are known unique, and ditto for JOIN_LEFT,
doesn't seem attractive at all. I suspect we want to go back to your
original design with a separate bool flag (well, two bools now, but
anyway separate from JoinType). Or maybe the variant JoinTypes still
are better, since they'd fit into switch() tests more naturally, but
it's a lot more debatable as to whether that notation is a win.

I apologize for leading you down the wrong path on the notational aspect,
but sometimes the right answer isn't clear till you've tried all the
possibilities.

Anyway, while refactoring the make_join_rel/add_paths_to_joinrel division
of labor wouldn't be such a big deal in itself, I don't want to commit a
change to JoinType only to undo it later; that would be too much churn.
So I think we need to resolve this question before we can move forward.
I don't know if you have time to look at this now --- my clock says it's
already Friday morning in New Zealand.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2016-04-07 19:02:14 Re: Performance improvement for joins where outer side is unique
Previous Message Petr Jelinek 2016-04-07 18:33:16 Re: Timeline following for logical slots