Re: Table name scope (was Re: Outer joins aren't working with views)

From: Thomas Lockhart <lockhart(at)alumni(dot)caltech(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Table name scope (was Re: Outer joins aren't working with views)
Date: 2000-12-17 06:32:09
Message-ID: 3A3C5DE9.5C46E02F@alumni.caltech.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

> I disagree on that. The table's real/alias name is certainly supposed
> to be accessible, and I see nothing in the spec that says that only some
> of its columns are accessible via qualification.

Date and Darwen disagree circa 1997, and I believe that SQL99 does not
radically alter the spec in this regard. All of my interpretations below
are based on D&D, not the draft spec we have available (though I look to
that to support their interpretation, which imho it does).

> What the spec does say
> is that the *output* of the join has only one copy of the joined column.
> In other words, given table A with columns ID and CA, and table B with
> columns ID and CB, I believe the correct behavior is
>
> SELECT * FROM (A NATURAL JOIN B) J produces ID, CA, CB

Yes.

> SELECT J.* FROM (A NATURAL JOIN B) J produces ID, CA, CB

Yes.

> SELECT A.* FROM (A NATURAL JOIN B) J produces ID, CA

No, since there is a range variable J, no columns explicitly qualified
with A or B are visible. If the range variable J is omitted, then the
result will produce only CA. See one of the D&D cases I include below.

> SELECT B.* FROM (A NATURAL JOIN B) J produces ID, CB

Same as for the previous case. B.* is not visible since a range variable
is specified, and if J is not there then B.* produces CB only.

> If it's an outer join then J.ID is subtly different from A.ID and/or
> B.ID --- the spec defines the output column as COALESCE(A.ID,B.ID)
> (cf SQL92 7.5 <joined table>, syntax rule 6.d) to get rid of introduced
> nulls. BTW, our implementation simplifies that to A.ID for an inner or
> left join, or B.ID for a right join, and only uses the full COALESCE
> expression for a full join.

Right, the result is the same for these cases. The only issue is the
scoping on the name allowed for external reference.

> Anyway, I believe it's true that you can't get at A.ID or B.ID in
> this example except by qualifying the column name with the table name
> --- but I don't see where it says that you shouldn't be able to get
> at them at all. If that were true then the definition in 7.5.6.d
> wouldn't be legal, because that's exactly the syntax it uses to define
> the joined column.

7.7.7.d seems to define SLCC pretty clearly, without a table name
qualification. I think that this is consistant with D&D's
interpretation.

> > Date and Darwen, 4th ed, pp 142-144 discuss various aspects of join
> > scope and behavior. For NATURAL JOIN, the columns with common names
> > forming the join columns *lose* their underlying table name, since they
> > can't be traced back to a column from a specific table (the table of
> > origin is ambiguous).
> My reading is that the output columns are qualified with the JOIN
> clause's correlation name, if any (J in my example). If you didn't
> bother to stick a correlation name on the join clause, you couldn't
> refer to them with a qualified name.

Sure. But without a correlation name, you are not allowed to qualify
with the underlying table name for "join columns" from NATURAL or JOIN
ON joins. See below...

> In an example like
>
> SELECT * FROM (A NATURAL LEFT JOIN (B NATURAL FULL JOIN C));
>
> supposing that all three tables have a column ID, then the output ID
> column of the B/C join has no qualified name, and it would indeed be
> impossible to refer to it from the SELECT list. The only IDs accessible
> from the SELECT list are the also-qualified-name-less output of the
> left join and A.ID, B.ID, C.ID, none of which are quite the same as
> the output of the full join. Perhaps what Date and Darwen are talking
> about is cases like this?

No, they are talking about simpler cases, and very clearly they disagree
with the current behavior of the PostgreSQL parser. Now, it may be that
SQL99 has changed the scoping rules for these cases, but instead I would
look for support for Date and Darwen's interpretation in the spec,
rather than reading the spec from first principles. Date and Darwen can
explain it in a couple of pages, and give examples, where the spec is
just way too convoluted for a clear reading istm.

Anyway, here are two cases discussed by D&D -- note that table sp has
columns (sno, pno, qty) and table s has columns (sno, sname, status,
city, primary):

(p142, after a discussion of other cases)
"One very counterintuitive consequence of this unorthodox scoping rule
is illustrated by the following example: The result of the expression

select distinct sp.* from sp natural join s;

will include columns PNO and QTY but *not* column SNO, because --
believe it or not -- there is no column "SP.SNO" in the result of the
join expression (indeed specifying SP.SNO in the SELECT clause would be
a syntax error)."

The emphasis is D&D's, not mine ;) For natural joins, or other joins
where two columns are subsumed into one (anything with a USING clause?)
the scoping rules are clear, at least to D&D: it is not possible to
reference one of these columns by qualifying with the name of an
underlying table.

Another case (p143-144, following some simpler cases which show how
scoping progresses through more deeply nested joins):
"Now let us modify the example once again to introduce an explicit range
variable TC for the overall result:

( ( T1 JOIN T2 ON cond-1 ) AS TA
JOIN
( T3 JOIN T4 ON cond-2 ) AS TB
ON cond-3 ) AS TC

The rules are now as follows:

cond-1 can reference T1 and T2 but not T3, T4, TA, TB, or TC
cond-2 can reference T3 and T4 but not T1, T2, TA, TB, or TC
cond-3 can reference TA and TB but not T1, T2, T3, T4, or TC

and (once again) if the overall expression appears as the operand of a
FROM clause, then the associated SELECT clause, WHERE clause, etc. can
reference TC but not T1, T2, T3, T4, TA, or TB."

So the two D&D cases cited above illustrate the "with range variables"
and "without range variables" expected behavior. Comments?

- Thomas

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2000-12-17 06:44:50 Re: Table name scope (was Re: Outer joins aren't working with views)
Previous Message Tom Lane 2000-12-16 17:46:47 Re: Table name scope (was Re: Outer joins aren't working with views)

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2000-12-17 06:44:50 Re: Table name scope (was Re: Outer joins aren't working with views)
Previous Message Tom Lane 2000-12-17 02:46:50 Re: [HACKERS] 7.1 features list