Re: BUG #16212: subquery block allows to overwrite table alias defined earlier

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: luza(dot)mbox(at)gmail(dot)com
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16212: subquery block allows to overwrite table alias defined earlier
Date: 2020-01-17 00:11:13
Message-ID: 17596.1579219873@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> Short example to illustrate the issue:

> SELECT
> a,
> b.p,
> c.p
> FROM
> (VALUES (1)) a
> JOIN LATERAL (
> SELECT p FROM (VALUES (2)) p
> ) b ON TRUE
> JOIN LATERAL (
> SELECT p FROM (VALUES (3)) p
> ) c ON TRUE
> ;

Hm. What's happening here is that when transformColumnRef() tries
to resolve the last "p", it first looks it up as a column reference,
and only if that fails does it try to find a whole-row match.
In this example, the column-reference interpretation succeeds
(finding b.p), and it just runs with that instead of noticing
that a whole-row match is also possible.

I'm loath to change that, really. In the first place, referencing
a whole-row var without using ".*" is a semi-deprecated feature.
In the second place, to handle this we'd basically need to double
the lookup effort for every unqualified column reference, just
on the off chance that we could find a closer interpretation of
the name as a whole-row reference. And in the third place, it
seems likely that we'd break more existing queries than we'd fix.

The most reasonable way to avoid this problem is just not to use
conflicting table aliases in the first place; you're much more likely
to confuse yourself than the machine with such a coding style.

If you really want to do it just as above, the trick is to write
"p.*" without having SELECT interpret that as something to expand
into separate column references. I think you have to do that
like this:

... JOIN LATERAL (
SELECT (p.*)::record AS p FROM (VALUES (3)) p
) c ON TRUE

> Another example:
> SELECT
> a,
> b.a
> FROM
> (VALUES (1)) a
> JOIN (
> SELECT a FROM (VALUES (2)) a
> ) b ON TRUE
> ;
> Expected result ((1), (2))
> Actual result ((2), (2)).

This is just a faulty expectation. If you try that as

SELECT
*
FROM
(VALUES (1)) a
JOIN (
SELECT a FROM (VALUES (2)) a
) b ON TRUE
;

you'll see that what the JOIN is producing is

column1 | a
---------+-----
1 | (2)
(1 row)

and so either "a" or "b.a" is going to reference the same thing.
Again, if you were to write "a.*" in the outer SELECT list,
you'd get a different result:

SELECT
a.*
FROM
(VALUES (1)) a
JOIN (
SELECT a FROM (VALUES (2)) a
) b ON TRUE
;
column1
---------
1
(1 row)

Even if I thought that was a bug, there's no way we'd change it
now; there are way too many queries it would break.

Another point applying to both these examples is that since whole-row
vars aren't in the SQL standard, resolving a name as a whole-row
reference when a column reference is also possible would almost
certainly result in failing to interpret some SQL-compliant queries
per spec.

regards, tom lane

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Paquier 2020-01-17 00:12:16 Re: BUG #16208: background worker "logical replication worker" was terminated by signal 11: Segmentation
Previous Message Michael Paquier 2020-01-16 23:45:14 Re: BUG #16205: background worker "logical replication worker" (PID 25218) was terminated by signal 11: Segmentation