Re: Reference to parent query from ANY sublink

From: Antonin Houska <antonin(dot)houska(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reference to parent query from ANY sublink
Date: 2013-12-10 20:37:50
Message-ID: 52A77B9E.1020500@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/06/2013 03:33 PM, Kevin Grittner wrote:
> Antonin Houska <antonin(dot)houska(at)gmail(dot)com> wrote:
>
>> SELECT *
>> FROM tab1 a
>> LEFT JOIN
>> tab2 b
>> ON a.i = ANY (
>> SELECT k
>> FROM tab3 c
>> WHERE k = a.i);
>
> This query works with k in any or all tables, but the semantics
> certainly vary depending on where k happens to be. It would help a
> lot if you showed SQL statements to create and populate the tables
> involved and/or if you qualified all referenced column names with
> the table alias to avoid ambiguity.

I used the DDLs attached (tables.ddl) for this query too, not only for
the queries in quaries.sql. Yes, if I had mentioned it and/or qualified
the 'k' column reference, it wouldn't have broken anything.

> If I assume that the k reference is supposed to be a column in
> tab3, what you have is a query where you always get all rows from
> tab1, and for each row from tab1 you either match it to all rows
> from tab2 or no rows from tab2 depending on whether the tab1 row
> has a match in tab3.

I concede this particular query is not useful. But the important thing
to consider here is which side of the LEFT JOIN the subquery references.

>> SELECT *
>> FROM tab1 a
>> LEFT JOIN
>> (
>> SELECT *
>> tab2 b
>> SEMI JOIN
>> ( SELECT k
>> FROM tab3 c
>> WHERE k = a.i
>> ) AS ANY_subquery
>> ON a.i = ANY_subquery.k
>> ) AS SJ_subquery
>> ON true;
>
> It is hard to see what you intend here, since this is not valid
> syntax.

This is what I - after having read the related source code - imagine to
happen internally when the ANY predicate of the first query is being
processed. In fact it should become something like this (also internal
stuff)

SELECT *
FROM tab1 a
LEFT JOIN
(
tab2 b
SEMI JOIN
( SELECT k
FROM tab3 c
WHERE k = a.i
) AS ANY_subquery
ON a.i = ANY_subquery.k
)
ON true;

that is, SEMI JOIN node inserted into the tree rather than a subquery
(SJ_subquery). I posted the construct with SJ_subquery to show how I
thought about the problem: I thought it's safe (even though not
necessarily beautiful) to wrap the SEMI JOIN into the SJ_subquery and
let the existing infrastructure decide whether it's legal to turn it
into a join node. I concluded that the subquery's references to the tab1
ensure that SJ_subquery won't be flattened, so the patch does nothing if
such a reference exists.

> PostgreSQL supports semi-joins; but that is an implementation detail
> for the EXISTS or IN syntax.

... and for ANY, see subselect.c:convert_ANY_sublink_to_join()

> Could you clarify your intent?

To get rid of a subplan in some cases that require it so far: when the
subquery references table exactly 1 level higher (i.e. the immediate
parent query).

(I got the idea while reading the source code, as opposed to query
tuning.)

// Antonin Houska (Tony)

> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2013-12-10 20:39:06 Re: stats for network traffic WIP
Previous Message Heikki Linnakangas 2013-12-10 20:19:14 Re: ANALYZE sampling is too good