Hi,
After running regression, I ran EXPLAIN on one of the queries in regression
(test create_misc) and got following output
regression=# explain verbose select * into table ramp from road where name
~ '.*Ramp';
QUERY
PLAN
------------------------------------------------------------------------------------
Result (cost=0.00..154.00 rows=841 width=67)
Output: public.road.name, public.road.thepath
-> Append (cost=0.00..154.00 rows=841 width=67)
-> Seq Scan on public.road (cost=0.00..135.05 rows=418 width=67)
Output: public.road.name, public.road.thepath
Filter: (public.road.name ~ '.*Ramp'::text)
-> Seq Scan on public.ihighway road (cost=0.00..14.99 rows=367
width=67)
^^^^^
Output: public.road.name, public.road.thepath
^^^^^^^^^^, ^^^^^^
Filter: (public.road.name ~ '.*Ramp'::text)
^^^^^^^^^^^
-> Seq Scan on public.shighway road (cost=0.00..3.96 rows=56
width=67)
Output: public.road.name, public.road.thepath
Filter: (public.road.name ~ '.*Ramp'::text)
(12 rows)
regression=# \d+ road
Table "public.road"
Column | Type | Modifiers | Storage | Stats target | Description
---------+------+-----------+----------+--------------+-------------
name | text | | extended | |
thepath | path | | extended | |
Indexes:
"rix" btree (name)
Child tables: ihighway,
shighway
Has OIDs: no
Table "road" has children "ihighway" and "shighway" as seen in the \d+
output above. The EXPLAIN output of Seq Scan node on children has
"public.road" as prefix for variables. "public.road" could imply the parent
table "road" and thus can cause confusion, as to what's been referreed, the
columns of parent table or child table. In the EXPLAIN output children
tables have "road" as alias (as against "public.road"). The alias comes
from RangeTblEntry->eref->aliasname. It might be better to have "road" as
prefix in the variable names over "public.road".
The reason why this happens is the code in get_variable()
3865 /* Exceptions occur only if the RTE is alias-less */
3866 if (rte->alias == NULL)
3867 {
3868 if (rte->rtekind == RTE_RELATION)
3869 {
3870 /*
3871 * It's possible that use of the bare refname would find
another
3872 * more-closely-nested RTE, or be ambiguous, in which case
we need
3873 * to specify the schemaname to avoid these errors.
3874 */
3875 if (find_rte_by_refname(rte->eref->aliasname, context) !=
rte)
3876 schemaname =
get_namespace_name(get_rel_namespace(rte->relid));
3877 }
If there is no alias, we find out the schema name and later add it to the
prefix. In the inherited table case, we are actually creating a "kind of"
alias for the children table and thus we should not find out the schema
name and add it to the prefix. This case has been taken care of in
get_from_clause_item(),
6505 else if (rte->rtekind == RTE_RELATION &&
6506 strcmp(rte->eref->aliasname,
get_relation_name(rte->relid)) != 0)
6507 {
6508 /*
6509 * Apparently the rel has been renamed since the rule was
made.
6510 * Emit a fake alias clause so that variable references
will still
6511 * work. This is not a 100% solution but should work in
most
6512 * reasonable situations.
6513 */
6514 appendStringInfo(buf, " %s",
6515 quote_identifier(rte->eref->aliasname));
6516 gavealias = true;
6517 }
I see similar code in ExplainTargetRel()
1778 if (objectname == NULL ||
1779 strcmp(rte->eref->aliasname, objectname) != 0)
1780 appendStringInfo(es->str, " %s",
1781 quote_identifier(rte->eref->aliasname));
Based on this, here is patch to not add schemaname in the prefix for a
variable.
I have run make check. All except inherit.sql passed. The expected output
change is included in the patch.
--
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Enterprise Postgres Company
On Wed, Jan 11, 2012 at 5:13 PM, Ashutosh Bapat <
ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> Hi,
> After running regression, I ran EXPLAIN on one of the queries in
> regression (test create_misc) and got following output
> regression=# explain verbose select * into table ramp from road where name
> ~ '.*Ramp';
> QUERY
> PLAN
>
> ------------------------------------------------------------------------------------
> Result (cost=0.00..154.00 rows=841 width=67)
> Output: public.road.name, public.road.thepath
> -> Append (cost=0.00..154.00 rows=841 width=67)
> -> Seq Scan on public.road (cost=0.00..135.05 rows=418 width=67)
> Output: public.road.name, public.road.thepath
> Filter: (public.road.name ~ '.*Ramp'::text)
> -> Seq Scan on public.ihighway road (cost=0.00..14.99 rows=367
> width=67)
> ^^^^^
> Output: public.road.name, public.road.thepath
> ^^^^^^^^^^, ^^^^^^
> Filter: (public.road.name ~ '.*Ramp'::text)
> ^^^^^^^^^^^
> -> Seq Scan on public.shighway road (cost=0.00..3.96 rows=56
> width=67)
> Output: public.road.name, public.road.thepath
> Filter: (public.road.name ~ '.*Ramp'::text)
> (12 rows)
>
> regression=# \d+ road
> Table "public.road"
> Column | Type | Modifiers | Storage | Stats target | Description
> ---------+------+-----------+----------+--------------+-------------
> name | text | | extended | |
> thepath | path | | extended | |
> Indexes:
> "rix" btree (name)
> Child tables: ihighway,
> shighway
> Has OIDs: no
>
> Table "road" has children "ihighway" and "shighway" as seen in the \d+
> output above. The EXPLAIN output of Seq Scan node on children has
> "public.road" as prefix for variables. "public.road" could imply the parent
> table "road" and thus can cause confusion, as to what's been referreed, the
> columns of parent table or child table. In the EXPLAIN output children
> tables have "road" as alias (as against "public.road"). The alias comes
> from RangeTblEntry->eref->aliasname. It might be better to have "road" as
> prefix in the variable names over "public.road".
>
> The reason why this happens is the code in get_variable()
> 3865 /* Exceptions occur only if the RTE is alias-less */
> 3866 if (rte->alias == NULL)
> 3867 {
> 3868 if (rte->rtekind == RTE_RELATION)
> 3869 {
> 3870 /*
> 3871 * It's possible that use of the bare refname would find
> another
> 3872 * more-closely-nested RTE, or be ambiguous, in which
> case we need
> 3873 * to specify the schemaname to avoid these errors.
> 3874 */
> 3875 if (find_rte_by_refname(rte->eref->aliasname, context) !=
> rte)
> 3876 schemaname =
> get_namespace_name(get_rel_namespace(rte->relid));
> 3877 }
>
> If there is no alias, we find out the schema name and later add it to the
> prefix. In the inherited table case, we are actually creating a "kind of"
> alias for the children table and thus we should not find out the schema
> name and add it to the prefix. This case has been taken care of in
> get_from_clause_item(),
> 6505 else if (rte->rtekind == RTE_RELATION &&
> 6506 strcmp(rte->eref->aliasname,
> get_relation_name(rte->relid)) != 0)
> 6507 {
> 6508 /*
> 6509 * Apparently the rel has been renamed since the rule was
> made.
> 6510 * Emit a fake alias clause so that variable references
> will still
> 6511 * work. This is not a 100% solution but should work in
> most
> 6512 * reasonable situations.
> 6513 */
> 6514 appendStringInfo(buf, " %s",
> 6515 quote_identifier(rte->eref->aliasname));
> 6516 gavealias = true;
> 6517 }
>
> I see similar code in ExplainTargetRel()
> 1778 if (objectname == NULL ||
> 1779 strcmp(rte->eref->aliasname, objectname) != 0)
> 1780 appendStringInfo(es->str, " %s",
> 1781 quote_identifier(rte->eref->aliasname));
>
> Based on this, here is patch to not add schemaname in the prefix for a
> variable.
>
> I have run make check. All except inherit.sql passed. The expected output
> change is included in the patch.
>
> --
> Best Wishes,
> Ashutosh Bapat
> EntepriseDB Corporation
> The Enterprise Postgres Company
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>
A table can inherit from one or more parent table. So in that case,
qualifying schema/table name
helps in finding out where the column is coming from.
Regards,
Chetan
--
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Website: www.enterprisedb.com
EnterpriseDB Blog : http://blogs.enterprisedb.com
Follow us on Twitter : http://www.twitter.com/enterprisedb
On Wed, Jan 11, 2012 at 5:25 PM, Chetan Suttraway <
chetan(dot)suttraway(at)enterprisedb(dot)com> wrote:
>
>
> On Wed, Jan 11, 2012 at 5:13 PM, Ashutosh Bapat <
> ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>
>> Hi,
>> After running regression, I ran EXPLAIN on one of the queries in
>> regression (test create_misc) and got following output
>> regression=# explain verbose select * into table ramp from road where
>> name ~ '.*Ramp';
>> QUERY
>> PLAN
>>
>> ------------------------------------------------------------------------------------
>> Result (cost=0.00..154.00 rows=841 width=67)
>> Output: public.road.name, public.road.thepath
>> -> Append (cost=0.00..154.00 rows=841 width=67)
>> -> Seq Scan on public.road (cost=0.00..135.05 rows=418
>> width=67)
>> Output: public.road.name, public.road.thepath
>> Filter: (public.road.name ~ '.*Ramp'::text)
>> -> Seq Scan on public.ihighway road (cost=0.00..14.99 rows=367
>> width=67)
>> ^^^^^
>> Output: public.road.name, public.road.thepath
>> ^^^^^^^^^^, ^^^^^^
>> Filter: (public.road.name ~ '.*Ramp'::text)
>> ^^^^^^^^^^^
>> -> Seq Scan on public.shighway road (cost=0.00..3.96 rows=56
>> width=67)
>> Output: public.road.name, public.road.thepath
>> Filter: (public.road.name ~ '.*Ramp'::text)
>> (12 rows)
>>
>> regression=# \d+ road
>> Table "public.road"
>> Column | Type | Modifiers | Storage | Stats target | Description
>> ---------+------+-----------+----------+--------------+-------------
>> name | text | | extended | |
>> thepath | path | | extended | |
>> Indexes:
>> "rix" btree (name)
>> Child tables: ihighway,
>> shighway
>> Has OIDs: no
>>
>> Table "road" has children "ihighway" and "shighway" as seen in the \d+
>> output above. The EXPLAIN output of Seq Scan node on children has
>> "public.road" as prefix for variables. "public.road" could imply the parent
>> table "road" and thus can cause confusion, as to what's been referreed, the
>> columns of parent table or child table. In the EXPLAIN output children
>> tables have "road" as alias (as against "public.road"). The alias comes
>> from RangeTblEntry->eref->aliasname. It might be better to have "road" as
>> prefix in the variable names over "public.road".
>>
>> The reason why this happens is the code in get_variable()
>> 3865 /* Exceptions occur only if the RTE is alias-less */
>> 3866 if (rte->alias == NULL)
>> 3867 {
>> 3868 if (rte->rtekind == RTE_RELATION)
>> 3869 {
>> 3870 /*
>> 3871 * It's possible that use of the bare refname would find
>> another
>> 3872 * more-closely-nested RTE, or be ambiguous, in which
>> case we need
>> 3873 * to specify the schemaname to avoid these errors.
>> 3874 */
>> 3875 if (find_rte_by_refname(rte->eref->aliasname, context)
>> != rte)
>> 3876 schemaname =
>> get_namespace_name(get_rel_namespace(rte->relid));
>> 3877 }
>>
>> If there is no alias, we find out the schema name and later add it to the
>> prefix. In the inherited table case, we are actually creating a "kind of"
>> alias for the children table and thus we should not find out the schema
>> name and add it to the prefix. This case has been taken care of in
>> get_from_clause_item(),
>> 6505 else if (rte->rtekind == RTE_RELATION &&
>> 6506 strcmp(rte->eref->aliasname,
>> get_relation_name(rte->relid)) != 0)
>> 6507 {
>> 6508 /*
>> 6509 * Apparently the rel has been renamed since the rule
>> was made.
>> 6510 * Emit a fake alias clause so that variable references
>> will still
>> 6511 * work. This is not a 100% solution but should work in
>> most
>> 6512 * reasonable situations.
>> 6513 */
>> 6514 appendStringInfo(buf, " %s",
>> 6515 quote_identifier(rte->eref->aliasname));
>> 6516 gavealias = true;
>> 6517 }
>>
>> I see similar code in ExplainTargetRel()
>> 1778 if (objectname == NULL ||
>> 1779 strcmp(rte->eref->aliasname, objectname) != 0)
>> 1780 appendStringInfo(es->str, " %s",
>> 1781 quote_identifier(rte->eref->aliasname));
>>
>> Based on this, here is patch to not add schemaname in the prefix for a
>> variable.
>>
>> I have run make check. All except inherit.sql passed. The expected output
>> change is included in the patch.
>>
>> --
>> Best Wishes,
>> Ashutosh Bapat
>> EntepriseDB Corporation
>> The Enterprise Postgres Company
>>
>>
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>>
>>
> A table can inherit from one or more parent table. So in that case,
> qualifying schema/table name
> helps in finding out where the column is coming from.
>
Do you have any example of this case? From the code it does not look like
that. Even if a table is inherited from more than one parent, the aliasname
is set to the name of the parent on which the scan is executed, it will
have only one alias. The case you are talking about can happen in case more
than one of these parents is included in the query. But, in such case there
will be multiple RTEs for same child table each coming from corresponding
parent, and thus will have corresponding aliasname.
>
> Regards,
> Chetan
>
> --
> EnterpriseDB Corporation
> The Enterprise PostgreSQL Company
>
> Website: www.enterprisedb.com
> EnterpriseDB Blog : http://blogs.enterprisedb.com
> Follow us on Twitter : http://www.twitter.com/enterprisedb
>
>
>
>
>
--
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Enterprise Postgres Company
On Wed, Jan 11, 2012 at 6:43 AM, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote: > Hi, > After running regression, I ran EXPLAIN on one of the queries in regression > (test create_misc) and got following output > regression=# explain verbose select * into table ramp from road where name ~ > '.*Ramp'; > QUERY > PLAN > ------------------------------------------------------------------------------------ > Result (cost=0.00..154.00 rows=841 width=67) > Output: public.road.name, public.road.thepath > -> Append (cost=0.00..154.00 rows=841 width=67) > -> Seq Scan on public.road (cost=0.00..135.05 rows=418 width=67) > Output: public.road.name, public.road.thepath > Filter: (public.road.name ~ '.*Ramp'::text) > -> Seq Scan on public.ihighway road (cost=0.00..14.99 rows=367 > width=67) > ^^^^^ > Output: public.road.name, public.road.thepath > ^^^^^^^^^^, ^^^^^^ > Filter: (public.road.name ~ '.*Ramp'::text) > ^^^^^^^^^^^ > -> Seq Scan on public.shighway road (cost=0.00..3.96 rows=56 > width=67) > Output: public.road.name, public.road.thepath > Filter: (public.road.name ~ '.*Ramp'::text) > (12 rows) > > regression=# \d+ road > Table "public.road" > Column | Type | Modifiers | Storage | Stats target | Description > ---------+------+-----------+----------+--------------+------------- > name | text | | extended | | > thepath | path | | extended | | > Indexes: > "rix" btree (name) > Child tables: ihighway, > shighway > Has OIDs: no > > Table "road" has children "ihighway" and "shighway" as seen in the \d+ > output above. The EXPLAIN output of Seq Scan node on children has > "public.road" as prefix for variables. "public.road" could imply the parent > table "road" and thus can cause confusion, as to what's been referreed, the > columns of parent table or child table. In the EXPLAIN output children > tables have "road" as alias (as against "public.road"). The alias comes from > RangeTblEntry->eref->aliasname. It might be better to have "road" as prefix > in the variable names over "public.road". It's a feature, not a bug, that we schema-qualify names when VERBOSE is specified. That was done on purpose for the benefit of external tools that might need this information to disambiguate which object is being referenced. Table *aliases*, of course, should not be schema-qualified, but I don't think that's what we're doing. You could make it more clear by including an alias in the query, like this: explain verbose select * into table ramp from road hwy where name ~ '.*Ramp'; -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> It's a feature, not a bug, that we schema-qualify names when VERBOSE
> is specified. That was done on purpose for the benefit of external
> tools that might need this information to disambiguate which object is
> being referenced.
> Table *aliases*, of course, should not be schema-qualified, but I
> don't think that's what we're doing. You could make it more clear by
> including an alias in the query, like this:
> explain verbose select * into table ramp from road hwy where name ~ '.*Ramp';
I think you are both focusing on the wrong thing. There is a lot of
squishiness in what EXPLAIN prints out, since SQL notation is not always
well suited to what an execution plan actually does. But this code has
a hard and fast requirement that it dump view definitions correctly,
else pg_dump doesn't work. And after looking at this I think Ashutosh
has in fact found a bug. Consider this example:
regression=# create schema s1;
CREATE SCHEMA
regression=# create schema s2;
CREATE SCHEMA
regression=# create table s1.t1 (f1 int);
CREATE TABLE
regression=# create table s2.t1 (f1 int);
CREATE TABLE
regression=# create view v1 as
regression-# select * from s1.t1 where exists (
regression(# select 1 from s2.t1 where s2.t1.f1 = s1.t1.f1
regression(# );
CREATE VIEW
regression=# \d+ v1
View "public.v1"
Column | Type | Modifiers | Storage | Description
--------+---------+-----------+---------+-------------
f1 | integer | | plain |
View definition:
SELECT t1.f1
FROM s1.t1
WHERE (EXISTS ( SELECT 1
FROM s2.t1
WHERE t1.f1 = s1.t1.f1));
regression=# alter table s2.t1 rename to tx;
ALTER TABLE
regression=# \d+ v1
View "public.v1"
Column | Type | Modifiers | Storage | Description
--------+---------+-----------+---------+-------------
f1 | integer | | plain |
View definition:
SELECT t1.f1
FROM s1.t1
WHERE (EXISTS ( SELECT 1
FROM s2.tx t1
WHERE t1.f1 = s1.t1.f1));
Both of the above displays of the view are formally correct, in that the
variables will be taken to refer to the correct upper or lower RTE.
But let's change that back and rename the other table:
regression=# alter table s2.tx rename to t1;
ALTER TABLE
regression=# alter table s1.t1 rename to tx;
ALTER TABLE
regression=# \d+ v1
View "public.v1"
Column | Type | Modifiers | Storage | Description
--------+---------+-----------+---------+-------------
f1 | integer | | plain |
View definition:
SELECT t1.f1
FROM s1.tx t1
WHERE (EXISTS ( SELECT 1
FROM s2.t1
WHERE t1.f1 = s1.t1.f1));
This is just plain wrong, as you'll see if you try to execute that
query:
regression=# SELECT t1.f1
regression-# FROM s1.tx t1
regression-# WHERE (EXISTS ( SELECT 1
regression(# FROM s2.t1
regression(# WHERE t1.f1 = s1.t1.f1));
ERROR: invalid reference to FROM-clause entry for table "t1"
LINE 5: WHERE t1.f1 = s1.t1.f1));
^
HINT: There is an entry for table "t1", but it cannot be referenced
from this part of the query.
(The HINT is a bit confused here, but the query is certainly invalid.)
So what we have here is a potential failure to dump and reload view
definitions, which is a lot more critical in my book than whether
EXPLAIN's output is confusing.
If we stick with the existing rule for attaching a fake alias to renamed
RTEs, I think that Ashutosh's patch or something like it is probably
appropriate, because the variable-printing code ought to be in step with
the RTE-printing code. Unfortunately, I think the hack to attach a fake
alias to renamed RTEs creates some issues of its own. Consider
select * from s1.t1
where exists (select 1 from s2.t2 t1 where t1.f1 = s1.t1.f1);
If s1.t1 is now renamed to s1.tx, it is still possible to express
the same semantics:
select * from s1.tx
where exists (select 1 from s2.t2 t1 where t1.f1 = s1.tx.f1);
But when we attach a fake alias, it's broken:
select * from s1.tx t1
where exists (select 1 from s2.t2 t1 where t1.f1 = ?.f1);
There is no way to reference the outer RTE anymore from the subquery,
because the conflicting lower alias masks it.
We may be between a rock and a hard place though, because it's not that
hard to demonstrate cases where not adding a fake alias breaks it too:
select * from s1.t1 tx
where exists (select 1 from s2.t1 where s2.t1.f1 = tx.f1);
If s2.t1 is renamed to s2.tx, there's no longer any way to reference the
upper alias tx, unless you alias the lower RTE to some different name.
I think that when we put in the fake-alias behavior, we made a value
judgment that this type of situation was more common than the other,
but I'm not really sure why.
Maybe what we need to do instead is create totally-made-up, unique
aliases when something like this happens.
regards, tom lane
Thanks Tom for giving a stronger case. I found the problem whille looking at inherited tables, and didn't think beyond inherited tables. Since inherited tables are expanded when subquery planner is invoked, I thought the problem will occur only in Explain output as we won't generate queries, that can be used elsewhere after/during planning. So, as I understand we have two problems here 1. Prefixing schemaname to the fake alises if there is another RTE with same name. There may not be a relation with that name (fake alias name given) in the schema chosen as prefix. 2. Fake aliases themselves can be conflicting. If I understand correctly, if we solve the second problem, first problem will not occur. Is that correct? On Sat, Jan 28, 2012 at 8:08 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote: > Robert Haas <robertmhaas(at)gmail(dot)com> writes: > > It's a feature, not a bug, that we schema-qualify names when VERBOSE > > is specified. That was done on purpose for the benefit of external > > tools that might need this information to disambiguate which object is > > being referenced. > > > Table *aliases*, of course, should not be schema-qualified, but I > > don't think that's what we're doing. You could make it more clear by > > including an alias in the query, like this: > > > explain verbose select * into table ramp from road hwy where name ~ > '.*Ramp'; > > I think you are both focusing on the wrong thing. There is a lot of > squishiness in what EXPLAIN prints out, since SQL notation is not always > well suited to what an execution plan actually does. But this code has > a hard and fast requirement that it dump view definitions correctly, > else pg_dump doesn't work. And after looking at this I think Ashutosh > has in fact found a bug. Consider this example: > > regression=# create schema s1; > CREATE SCHEMA > regression=# create schema s2; > CREATE SCHEMA > regression=# create table s1.t1 (f1 int); > CREATE TABLE > regression=# create table s2.t1 (f1 int); > CREATE TABLE > regression=# create view v1 as > regression-# select * from s1.t1 where exists ( > regression(# select 1 from s2.t1 where s2.t1.f1 = s1.t1.f1 > regression(# ); > CREATE VIEW > regression=# \d+ v1 > View "public.v1" > Column | Type | Modifiers | Storage | Description > --------+---------+-----------+---------+------------- > f1 | integer | | plain | > View definition: > SELECT t1.f1 > FROM s1.t1 > WHERE (EXISTS ( SELECT 1 > FROM s2.t1 > WHERE t1.f1 = s1.t1.f1)); > > regression=# alter table s2.t1 rename to tx; > ALTER TABLE > regression=# \d+ v1 > View "public.v1" > Column | Type | Modifiers | Storage | Description > --------+---------+-----------+---------+------------- > f1 | integer | | plain | > View definition: > SELECT t1.f1 > FROM s1.t1 > WHERE (EXISTS ( SELECT 1 > FROM s2.tx t1 > WHERE t1.f1 = s1.t1.f1)); > > Both of the above displays of the view are formally correct, in that the > variables will be taken to refer to the correct upper or lower RTE. > But let's change that back and rename the other table: > > regression=# alter table s2.tx rename to t1; > ALTER TABLE > regression=# alter table s1.t1 rename to tx; > ALTER TABLE > regression=# \d+ v1 > View "public.v1" > Column | Type | Modifiers | Storage | Description > --------+---------+-----------+---------+------------- > f1 | integer | | plain | > View definition: > SELECT t1.f1 > FROM s1.tx t1 > WHERE (EXISTS ( SELECT 1 > FROM s2.t1 > WHERE t1.f1 = s1.t1.f1)); > > This is just plain wrong, as you'll see if you try to execute that > query: > > regression=# SELECT t1.f1 > regression-# FROM s1.tx t1 > regression-# WHERE (EXISTS ( SELECT 1 > regression(# FROM s2.t1 > regression(# WHERE t1.f1 = s1.t1.f1)); > ERROR: invalid reference to FROM-clause entry for table "t1" > LINE 5: WHERE t1.f1 = s1.t1.f1)); > ^ > HINT: There is an entry for table "t1", but it cannot be referenced > from this part of the query. > > (The HINT is a bit confused here, but the query is certainly invalid.) > > So what we have here is a potential failure to dump and reload view > definitions, which is a lot more critical in my book than whether > EXPLAIN's output is confusing. > > If we stick with the existing rule for attaching a fake alias to renamed > RTEs, I think that Ashutosh's patch or something like it is probably > appropriate, because the variable-printing code ought to be in step with > the RTE-printing code. Unfortunately, I think the hack to attach a fake > alias to renamed RTEs creates some issues of its own. Consider > > select * from s1.t1 > where exists (select 1 from s2.t2 t1 where t1.f1 = s1.t1.f1); > > If s1.t1 is now renamed to s1.tx, it is still possible to express > the same semantics: > > select * from s1.tx > where exists (select 1 from s2.t2 t1 where t1.f1 = s1.tx.f1); > > But when we attach a fake alias, it's broken: > > select * from s1.tx t1 > where exists (select 1 from s2.t2 t1 where t1.f1 = ?.f1); > > There is no way to reference the outer RTE anymore from the subquery, > because the conflicting lower alias masks it. > > We may be between a rock and a hard place though, because it's not that > hard to demonstrate cases where not adding a fake alias breaks it too: > > select * from s1.t1 tx > where exists (select 1 from s2.t1 where s2.t1.f1 = tx.f1); > > If s2.t1 is renamed to s2.tx, there's no longer any way to reference the > upper alias tx, unless you alias the lower RTE to some different name. > I think that when we put in the fake-alias behavior, we made a value > judgment that this type of situation was more common than the other, > but I'm not really sure why. > > Maybe what we need to do instead is create totally-made-up, unique > aliases when something like this happens. > > regards, tom lane > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company
Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> writes: > So, as I understand we have two problems here > 1. Prefixing schemaname to the fake alises if there is another RTE with > same name. There may not be a relation with that name (fake alias name > given) in the schema chosen as prefix. > 2. Fake aliases themselves can be conflicting. Well, the issue is more that a fake alias might unintentionally collide with a regular alias elsewhere in the query. There's no guard against that in the current behavior, and ISTM there needs to be. The one possibly-simplifying thing about this whole issue is that we needn't cater for references that couldn't have been made in the original query. For instance, if the inner and outer queries both have explicit aliases "tx", it's impossible for the inner query to have referred to any columns of the outer "tx" --- so we don't have to try to make it possible in the dumped form. > If I understand correctly, if we solve the second problem, first problem > will not occur. Is that correct? I don't believe that the problem has anything to do with the names of other tables that might happen to exist in the database. It's a matter of what RTE names/aliases are exposed for variable references in different parts of the query. regards, tom lane
I don't believe that the problem has anything to do with the names of > other tables that might happen to exist in the database. It's a matter > of what RTE names/aliases are exposed for variable references in > different parts of the query. > > Names of other tables come into picture when we schema qualify the fake aliases in the generated query. See examples in first post. > regards, tom lane > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company
Looking at the code, it seems that the fake aliases (eref) for relations (may be views as well) are not generated per say, but they do not get changed when the relation name changes OR in case of inherited tables, they do not get changed when the inheritance is expanded (expand_inherited_rtentry). So, there is not question of generating them so as to not collide with other aliases in the query. However I did not find answers to these questions 1. What is the use of eref in case of relation when the relation name itself can be provided by the reloid? 2. Can we use schema qualified relation name in get_from_clause_item() and get_variable() instead of use eref->aliasname. I have noticed that the logic in get_rte_attribute_name() gives preference to the column names in catalog tables over eref->colnames. Anyone? On Mon, Jan 30, 2012 at 10:26 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote: > Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> writes: > > So, as I understand we have two problems here > > 1. Prefixing schemaname to the fake alises if there is another RTE with > > same name. There may not be a relation with that name (fake alias name > > given) in the schema chosen as prefix. > > 2. Fake aliases themselves can be conflicting. > > Well, the issue is more that a fake alias might unintentionally collide > with a regular alias elsewhere in the query. There's no guard against > that in the current behavior, and ISTM there needs to be. > > The one possibly-simplifying thing about this whole issue is that we > needn't cater for references that couldn't have been made in the > original query. For instance, if the inner and outer queries both have > explicit aliases "tx", it's impossible for the inner query to have > referred to any columns of the outer "tx" --- so we don't have to try to > make it possible in the dumped form. > > > If I understand correctly, if we solve the second problem, first problem > > will not occur. Is that correct? > > I don't believe that the problem has anything to do with the names of > other tables that might happen to exist in the database. It's a matter > of what RTE names/aliases are exposed for variable references in > different parts of the query. > > regards, tom lane > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company
Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> writes: > Looking at the code, it seems that the fake aliases (eref) for relations > (may be views as well) are not generated per say, but they do not get > changed when the relation name changes OR in case of inherited tables, they > do not get changed when the inheritance is expanded > (expand_inherited_rtentry). So, there is not question of generating them so > as to not collide with other aliases in the query. Well, what I was considering was exactly generating new aliases that don't collide with anything else in the query. The fact that the code doesn't do that now doesn't mean we can't make it do that. > However I did not find answers to these questions > 1. What is the use of eref in case of relation when the relation name > itself can be provided by the reloid? eref is stored mainly so that parsing code doesn't have to repeatedly look up what the effective RTE name is. The alias field is meant to represent whether there was an AS clause or not, and if so exactly what it said. So eref is a derived result whereas alias is essentially raw grammar output. Because of the possibility that the relation gets renamed, it's probably best if we don't rely on eref anymore after initial parsing of a query, ie ruleutils.c probably shouldn't use it. (Too lazy to go check right now if that's already true, but it seems like a good goal to pursue if we're going to change this code.) > 2. Can we use schema qualified relation name in get_from_clause_item() and > get_variable() instead of use eref->aliasname. No. If there is an alias, it is flat wrong to use the relation name instead, with or without schema name. You might want to go study the SQL spec a bit in this area. > I have noticed that the > logic in get_rte_attribute_name() gives preference to the column names in > catalog tables over eref->colnames. Hm. What it should probably do is look at alias first, and if the alias field doesn't specify a column name, then go to the catalogs to get the current name. Thinking about this some more, it seems like there are ways for a user to shoot himself in the foot pretty much irretrievably. Consider CREATE TABLE t (x int); CREATE VIEW v AS SELECT y FROM t t(y); ALTER TABLE t ADD COLUMN y int; On dump and reload, we'll have CREATE TABLE t (x int, y int); CREATE VIEW v AS SELECT y FROM t t(y); and now the CREATE VIEW will fail, complaining (correctly) that the column reference "y" is ambiguous. Should ruleutils be expected to take it upon itself to prevent that? We could conceive of "fixing" it by inventing column aliases out of whole cloth: CREATE VIEW v AS SELECT y FROM t t(y, the_other_y); but that seems a little much, not to mention that such a view definition would be horribly confusing to work with. On the other hand it isn't all that far beyond what I had in mind of inventing relation aliases to cure relation-name conflicts. Should we take the existence of such cases as evidence that we shouldn't try hard in this area? It seems reasonable to me to try to handle relation renames but draw the line at disambiguating column names. But others might find that distinction artificial. regards, tom lane
On Wed, Feb 1, 2012 at 12:23 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote: > On the other hand it isn't > all that far beyond what I had in mind of inventing relation aliases > to cure relation-name conflicts. Should we take the existence of such > cases as evidence that we shouldn't try hard in this area? It seems > reasonable to me to try to handle relation renames but draw the line > at disambiguating column names. But others might find that distinction > artificial. I sure do. I mean, in Oracle, if you rename a table or column involved in a view, then the view breaks. Blammo! The reference is by object name, not by some internal identifier a la OID. If you put back an object with the correct name (either the original one or a different one), you can re-enable the view. We've decide that we don't want that behavior: instead, our references are to the object itself rather than to the name of the object. Renaming the object doesn't change what the reference points to. But given that position, it seems to me that we ought to be willing to work pretty hard to make sure that when we dump-and-reload the database, things stay sane. Otherwise, we're sort of in this unsatisfying in-between place where references are *mostly* by internal identifier but everyone once in a while it falls apart and name collisions can break everything. Yech! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Feb 1, 2012 at 10:53 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote: > Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> writes: > > Looking at the code, it seems that the fake aliases (eref) for relations > > (may be views as well) are not generated per say, but they do not get > > changed when the relation name changes OR in case of inherited tables, > they > > do not get changed when the inheritance is expanded > > (expand_inherited_rtentry). So, there is not question of generating them > so > > as to not collide with other aliases in the query. > > Well, what I was considering was exactly generating new aliases that > don't collide with anything else in the query. The fact that the code > doesn't do that now doesn't mean we can't make it do that. > > > However I did not find answers to these questions > > 1. What is the use of eref in case of relation when the relation name > > itself can be provided by the reloid? > > eref is stored mainly so that parsing code doesn't have to repeatedly > look up what the effective RTE name is. The alias field is meant to > represent whether there was an AS clause or not, and if so exactly what > it said. So eref is a derived result whereas alias is essentially raw > grammar output. Because of the possibility that the relation gets > renamed, it's probably best if we don't rely on eref anymore after > initial parsing of a query, ie ruleutils.c probably shouldn't use it. > (Too lazy to go check right now if that's already true, but it seems > like a good goal to pursue if we're going to change this code.) > > > 2. Can we use schema qualified relation name in get_from_clause_item() > and > > get_variable() instead of use eref->aliasname. > > No. If there is an alias, it is flat wrong to use the relation name > instead, with or without schema name. You might want to go study the > SQL spec a bit in this area. > To clarify matters a bit, item 2 is in conjunction with item 1. Aliases, if provided, are output irrespective of whether we get the relation name from eref or catalogs. ruleutils should just ignore eref (for RTE_RELATION only) and get the relation name from given OID. > > > I have noticed that the > > logic in get_rte_attribute_name() gives preference to the column names in > > catalog tables over eref->colnames. > > Hm. What it should probably do is look at alias first, and if the alias > field doesn't specify a column name, then go to the catalogs to get the > current name. > It does give preference to aliases today. I compared preferences of colnames in eref and that obtained from catalogs. > > > Thinking about this some more, it seems like there are ways for a user > to shoot himself in the foot pretty much irretrievably. Consider > > CREATE TABLE t (x int); > CREATE VIEW v AS SELECT y FROM t t(y); > ALTER TABLE t ADD COLUMN y int; > > On dump and reload, we'll have > > CREATE TABLE t (x int, y int); > CREATE VIEW v AS SELECT y FROM t t(y); > > and now the CREATE VIEW will fail, complaining (correctly) that the > column reference "y" is ambiguous. Should ruleutils be expected to take > it upon itself to prevent that? We could conceive of "fixing" it by > inventing column aliases out of whole cloth: > > CREATE VIEW v AS SELECT y FROM t t(y, the_other_y); > > but that seems a little much, not to mention that such a view definition > would be horribly confusing to work with. On the other hand it isn't > all that far beyond what I had in mind of inventing relation aliases > to cure relation-name conflicts. Should we take the existence of such > cases as evidence that we shouldn't try hard in this area? It seems > reasonable to me to try to handle relation renames but draw the line > at disambiguating column names. But others might find that distinction > artificial. > I agree. The example of the colnames was only to show that the preference alias > relation information from catalogs > eref exists somewhere in the code. > > regards, tom lane > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company
On Wed, Feb 1, 2012 at 11:01 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote: > On Wed, Feb 1, 2012 at 12:23 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote: > > On the other hand it isn't > > all that far beyond what I had in mind of inventing relation aliases > > to cure relation-name conflicts. Should we take the existence of such > > cases as evidence that we shouldn't try hard in this area? It seems > > reasonable to me to try to handle relation renames but draw the line > > at disambiguating column names. But others might find that distinction > > artificial. > > I sure do. > > I mean, in Oracle, if you rename a table or column involved in a view, > then the view breaks. Blammo! The reference is by object name, not > by some internal identifier a la OID. If you put back an object with > the correct name (either the original one or a different one), you can > re-enable the view. > > We've decide that we don't want that behavior: instead, our references > are to the object itself rather than to the name of the object. > Renaming the object doesn't change what the reference points to. But > given that position, it seems to me that we ought to be willing to > work pretty hard to make sure that when we dump-and-reload the > database, things stay sane. Otherwise, we're sort of in this > unsatisfying in-between place where references are *mostly* by > internal identifier but everyone once in a while it falls apart and > name collisions can break everything. Yech! > > For me the relation names problem and column aliases problems are two independent problems. While the first one looks easy to fix, the other problem may be hard to solve. We can solve the first problem and things will be "better" than what we have today. If you agree, I will provide a patch to fix the relation names problems by ignoring the eref (for RTE_RELATION only) in ruleutils. > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company
I got interested in this problem again now that we have a user complaint about it (bug #7553). Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> writes: > On Wed, Feb 1, 2012 at 11:01 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote: >> On Wed, Feb 1, 2012 at 12:23 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote: >>> ... It seems >>> reasonable to me to try to handle relation renames but draw the line >>> at disambiguating column names. But others might find that distinction >>> artificial. >> I sure do. > For me the relation names problem and column aliases problems are two > independent problems. I think they are independent problems, and I also think that people are far less likely to trip over column-name problems in practice. Columns of a table are not independent objects and so people aren't so likely to think they can just rename them freely. Moreover, if you rename columns that are used in views, you can get breakage of things like USING or NATURAL joins, and that is something we *cannot* provide a workaround for --- it's a failure inherent in the language definition. As far as the relation-rename problem goes, I propose that what we should do is have ruleutils.c invent nonconflicting fake aliases for each RTE in the query tree. This would allow getting rid of some of the dubious heuristics in get_variable: it should just print the chosen alias and be done. (It still has to do something different for unnamed joins, but we can leave that part alone I think.) We can do this as follows: 1. If there's a user-assigned alias, use that. (It's possible this is not unique within the query, but that's okay because any actual variable reference must be to the most closely nested such RTE.) 2. Otherwise, if the relation's current name doesn't conflict with any previously-assigned alias, use that. 3. Otherwise, append something (underscore and some digits probably) to the relation's current name to construct a string not matching any previously-assigned alias. This might result in printouts that are a bit uglier than the old way in such cases, but anybody who's offended can select their own aliases. regards, tom lane
I wrote: > I got interested in this problem again now that we have a user complaint > about it (bug #7553). > ... > As far as the relation-rename problem goes, I propose that what we > should do is have ruleutils.c invent nonconflicting fake aliases for > each RTE in the query tree. This would allow getting rid of some of the > dubious heuristics in get_variable: it should just print the chosen > alias and be done. (It still has to do something different for unnamed > joins, but we can leave that part alone I think.) Attached is a draft patch for this. It fixes the view-dumping problems that I exhibited in http://archives.postgresql.org/message-id/29791(dot)1327718297(at)sss(dot)pgh(dot)pa(dot)us as well as nicely cleaning up Ashutosh's original complaint at http://archives.postgresql.org/pgsql-hackers/2012-01/msg00505.php There are quite a few more changes in the regression-test plan printouts than was originally discussed, but they seem to be generally for the better IMO: for instance there is no longer any problem with different RTEs being printed with identical names in EXPLAIN. One thing I found while working on this is that some of the inconsistency is not really EXPLAIN's fault but the planner's: the planner does not take any trouble to avoid duplicate RTE aliases when it manufactures additional RTEs, which it does in at least two places (inheritance expansion and min/max aggregate optimization). In my first version of the patch I was getting EXPLAIN printouts like this for inheritance append-plans: Nested Loop -> Limit -> Seq Scan on int4_tbl -> Append ! -> Index Scan using patest0i on patest0 patest0_1 Index Cond: (id = int4_tbl.f1) ! -> Index Scan using patest1i on patest1 Index Cond: (id = int4_tbl.f1) ! -> Index Scan using patest2i on patest2 Index Cond: (id = int4_tbl.f1) That happened because the original inheritance-root RTE got the "patest0" alias, and then the inheritance-child RTE for the parent relation got stuck with "patest0_1". This isn't terribly desirable since the inheritance-root RTE isn't actually visible anywhere in the EXPLAIN printout, so giving it the preferred name isn't ideal. In the attached I've hacked around this by causing the planner to assign new aliases to RTEs that it replaces in this way (see planagg.c and prepunion.c diffs). This seems like a bit of a kluge, but it doesn't take much code. An alternative that I'm considering is to have EXPLAIN make a pre-pass over the plan tree to identify which RTEs will actually be referenced, and then consider only those RTEs while assigning aliases. This would be a great deal more code though, and code which would require maintenance every time we add plan node types etc. So I'm not sure it's really a better answer. Thoughts? regards, tom lane
I wrote: > ... In the attached I've hacked around this by causing the planner to > assign new aliases to RTEs that it replaces in this way (see planagg.c > and prepunion.c diffs). This seems like a bit of a kluge, but it > doesn't take much code. An alternative that I'm considering is to > have EXPLAIN make a pre-pass over the plan tree to identify which > RTEs will actually be referenced, and then consider only those RTEs > while assigning aliases. This would be a great deal more code though, > and code which would require maintenance every time we add plan node > types etc. So I'm not sure it's really a better answer. Thoughts? Attached is a second draft that does it like that. This adds about 130 lines to explain.c compared to the other way, but on reflection it's probably a better solution compared to trying to kluge things in the planner. The change in the select_views results shows that there's at least one other case of duplicated RTE names that I'd not covered with the two planner kluges. I think the next question is whether we want to back-patch this. Although the problem with incorrect view dumping is arguably a data integrity issue (cf bug #7553), few enough people have hit it that I'm not sure it's worth taking risks for. I'd feel better about this code once it'd got through a beta test cycle. Comments? regards, tom lane