From: | "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Performance improvement for joins where outer side is unique |
Date: | 2016-03-12 19:37:03 |
Message-ID: | CAKFQuwb9wfArZxQzrUe3JJA+snmucj3Q6nY-VMv7yD4f7O4gdA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Saturday, March 12, 2016, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "David G. Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com <javascript:;>> writes:
> > Don't the semantics of a SEMI JOIN also state that the output columns
> only
> > come from the outer relation? i.e., the inner relation doesn't contribute
> > either rows or columns to the final result? Or is that simply
> > an implementation artifact of the fact that the only current way to
> perform
> > a semi-join explicitly is via exists/in?
>
> I think it's an artifact. What nodes.h actually says about it is you get
> the values of one randomly-selected matching inner row, which seems like
> a fine definition for the purposes we plan to put it to.
>
>
But is it a definition that actually materializes anywhere presently?
I'm not sure what we consider an authoritative source but relational
algebra does define the results of semi and anti joins as only containing
rows from main relation.
https://en.m.wikipedia.org/wiki/Relational_algebra
Given that this is largely internals (aside from the plan explanations
themselves) I guess we can punt for now but calling an inner or outer join
a semijoin in this case relies on a non-standard definition of semijoin -
namely that it is an optimized variation of the other joins instead of a
join type in its own right. This is complicated further in that we
do implement a true semijoin (using exists) while we allow for an anti join
to be non-standard if expressed using "left join ... is null" instead of
via "not exists".
Calling these optimizations outer/inner+semi/anti preserves the ability to
distinguish these versions from the standard definitions. I do like ithe
idea of it being exposed and encapsulated as a distinct join type instead
of being an attribute.
David J.
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2016-03-12 19:46:25 | Re: [COMMITTERS] pgsql: Only try to push down foreign joins if the user mapping OIDs mat |
Previous Message | Tom Lane | 2016-03-12 18:53:06 | Re: pl/pgSQL, get diagnostics and big data |