Re: Allow an alias to be attached directly to a JOIN ... USING

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Georgios Kokolatos <gkokolatos(at)protonmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Allow an alias to be attached directly to a JOIN ... USING
Date: 2021-03-22 23:18:11
Message-ID: 785329.1616455091@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> writes:
> I think Tom's input on the guts of this patch would be most valuable,
> since it intersects a lot with the parse namespace refactoring he did.

I really didn't like the way you'd done that :-(. My primary complaint
is that any one ParseNamespaceItem can describe only one table alias,
but here we have the potential for two aliases associated with the same
join:

select * from (t1 join t2 using(a) as tu) tx;

Admittedly that's not hugely useful since tx hides the tu alias, but
it should behave in a sane fashion. (BTW, after reading the SQL spec
again along the way to reviewing this, I am wondering if hiding the
lower aliases is really what we want; though it may be decades too late
to change that.)

However, ParseNamespaceItem as it stands needs some help for this.
It has a wired-in assumption that p_rte->eref describes the table
and column aliases exposed by the nsitem. 0001 below fixes this by
creating a separate p_names field in an nsitem. (There are some
comments in 0001 referencing JOIN USING aliases, but no actual code
for the feature.) That saves one indirection in common code paths,
so it's possibly a win on its own. Then 0002 is your patch rebased
onto that infrastructure, and with some cleanup of my own.

One thing I ran into is that a whole-row Var for the JOIN USING
alias did the wrong thing. It should have only the common columns,
but we were getting all the join columns in examples such as the
row_to_json() test case I added. This is difficult to fix given
the existing whole-row Var infrastructure, unless we want to make a
separate RTE for the JOIN USING alias, which I think is overkill.
What I did about this was to make transformWholeRowRef produce a
ROW() construct --- which is something that a whole-row Var for a
join would be turned into by the planner anyway. I think this is
semantically OK since the USING construct has already nailed down
the number and types of the join's common columns; there's no
prospect of those changing underneath a stored view query. It's
slightly ugly because the ROW() construct will be visible in a
decompiled view instead of "tu.*" like you wrote originally,
but I'm willing to live with that.

Speaking of decompiled views, I feel like ruleutils.c could do with
a little more work to teach it that these aliases are available.
Right now, it resorts to ugly workarounds:

regression=# create table t1 (a int, b int, c int);
CREATE TABLE
regression=# create table t2 (a int, x int, y int);
CREATE TABLE
regression=# create view vvv as select tj.a, t1.b from t1 full join t2 using(a) as tj, t1 as tx;
CREATE VIEW
regression=# \d+ vvv
View "public.vvv"
Column | Type | Collation | Nullable | Default | Storage | Description
--------+---------+-----------+----------+---------+---------+-------------
a | integer | | | | plain |
b | integer | | | | plain |
View definition:
SELECT a,
t1.b
FROM t1
FULL JOIN t2 USING (a) AS tj,
t1 tx(a_1, b, c);

That's not wrong, but it could likely be done better if ruleutils
realized it could use the tj alias to reference the column, instead
of having to force unqualified "a" to be a globally unique name.

I ran out of steam to look into that, though, and it's probably
something that could be improved later.

One other cosmetic thing is that this:

regression=# select tu.* from (t1 join t2 using(a) as tu) tx;
ERROR: missing FROM-clause entry for table "tu"
LINE 1: select tu.* from (t1 join t2 using(a) as tu) tx;
^

is a relatively dumb error message, compared to

regression=# select t1.* from (t1 join t2 using(a) as tu) tx;
ERROR: invalid reference to FROM-clause entry for table "t1"
LINE 1: select t1.* from (t1 join t2 using(a) as tu) tx;
^
HINT: There is an entry for table "t1", but it cannot be referenced from this part of the query.

I didn't look into why that isn't working, but maybe errorMissingRTE
needs to trawl all of the ParseNamespaceItems not just the RTEs.

Anyway, since these remaining gripes are cosmetic, I'll mark this RFC.

regards, tom lane

Attachment Content-Type Size
v5-0001-add-separate-alias-field.patch text/x-diff 9.7 KB
v5-0002-add-join-using-aliases.patch text/x-diff 27.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jan Wieck 2021-03-22 23:18:33 Re: pg_upgrade failing for 200+ million Large Objects
Previous Message Andres Freund 2021-03-22 23:17:37 Re: shared-memory based stats collector