From: | Antonin Houska <antonin(dot)houska(at)gmail(dot)com> |
---|---|
To: | Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Reference to parent query from ANY sublink |
Date: | 2013-12-10 20:37:50 |
Message-ID: | 52A77B9E.1020500@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 12/06/2013 03:33 PM, Kevin Grittner wrote:
> Antonin Houska <antonin(dot)houska(at)gmail(dot)com> wrote:
>
>> SELECT *
>> FROM tab1 a
>> LEFT JOIN
>> tab2 b
>> ON a.i = ANY (
>> SELECT k
>> FROM tab3 c
>> WHERE k = a.i);
>
> This query works with k in any or all tables, but the semantics
> certainly vary depending on where k happens to be. It would help a
> lot if you showed SQL statements to create and populate the tables
> involved and/or if you qualified all referenced column names with
> the table alias to avoid ambiguity.
I used the DDLs attached (tables.ddl) for this query too, not only for
the queries in quaries.sql. Yes, if I had mentioned it and/or qualified
the 'k' column reference, it wouldn't have broken anything.
> If I assume that the k reference is supposed to be a column in
> tab3, what you have is a query where you always get all rows from
> tab1, and for each row from tab1 you either match it to all rows
> from tab2 or no rows from tab2 depending on whether the tab1 row
> has a match in tab3.
I concede this particular query is not useful. But the important thing
to consider here is which side of the LEFT JOIN the subquery references.
>> SELECT *
>> FROM tab1 a
>> LEFT JOIN
>> (
>> SELECT *
>> tab2 b
>> SEMI JOIN
>> ( SELECT k
>> FROM tab3 c
>> WHERE k = a.i
>> ) AS ANY_subquery
>> ON a.i = ANY_subquery.k
>> ) AS SJ_subquery
>> ON true;
>
> It is hard to see what you intend here, since this is not valid
> syntax.
This is what I - after having read the related source code - imagine to
happen internally when the ANY predicate of the first query is being
processed. In fact it should become something like this (also internal
stuff)
SELECT *
FROM tab1 a
LEFT JOIN
(
tab2 b
SEMI JOIN
( SELECT k
FROM tab3 c
WHERE k = a.i
) AS ANY_subquery
ON a.i = ANY_subquery.k
)
ON true;
that is, SEMI JOIN node inserted into the tree rather than a subquery
(SJ_subquery). I posted the construct with SJ_subquery to show how I
thought about the problem: I thought it's safe (even though not
necessarily beautiful) to wrap the SEMI JOIN into the SJ_subquery and
let the existing infrastructure decide whether it's legal to turn it
into a join node. I concluded that the subquery's references to the tab1
ensure that SJ_subquery won't be flattened, so the patch does nothing if
such a reference exists.
> PostgreSQL supports semi-joins; but that is an implementation detail
> for the EXISTS or IN syntax.
... and for ANY, see subselect.c:convert_ANY_sublink_to_join()
> Could you clarify your intent?
To get rid of a subplan in some cases that require it so far: when the
subquery references table exactly 1 level higher (i.e. the immediate
parent query).
(I got the idea while reading the source code, as opposed to query
tuning.)
// Antonin Houska (Tony)
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2013-12-10 20:39:06 | Re: stats for network traffic WIP |
Previous Message | Heikki Linnakangas | 2013-12-10 20:19:14 | Re: ANALYZE sampling is too good |