Re: UNION and pg_restore

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bryan Lee Nuse <nuse(at)uga(dot)edu>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: UNION and pg_restore
Date: 2012-12-21 22:37:00
Message-ID: 10880.1356129420@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Bryan Lee Nuse <nuse(at)uga(dot)edu> writes:
> My question is, then, how is it that the query embodied in "view_1" below executes fine, but cannot seem to be restored? Is this telling me my query is dumb? If so, any advice on how to easily derive "view_1" from "tab_1" and "tab_2" below, without baffling pg_restore, would be welcome.

The core issue is that you've got

SELECT * FROM (SELECT (-999), (-999)) b;

If you execute this on its own, you'll get

?column? | ?column?
----------+----------
-999 | -999
(1 row)

that is, both columns have been assigned the arbitrary name "?column?".
That's okay in itself, because the "*" is expanded positionally and
doesn't care whether the columns have distinct names or not. However,
when the view definition is dumped out, you have

> View definition:
> SELECT tab_1.id_1, tab_1.id_2, tab_1.id_3, tab_1.data_1, tab_1.data_2
> FROM tab_1
> UNION
> SELECT a.id_1, a.id_2, a.id_3, b."?column?" AS data_1, b."?column?" AS data_2
> FROM ( SELECT tab_2.id_1, tab_2.id_2, tab_2.id_3
> FROM tab_2) a
> CROSS JOIN ( SELECT (-999), (-999)) b;

and now there are ambiguous references to b."?column?" in there.

The short answer to this is that you ought to take some care that the
columns of the sub-select have distinct names. You could do it in
at least two ways:

(SELECT (-999), (-999)) b(x,y)
(SELECT (-999) AS x, (-999) AS y) b

Now you're doubtless wondering why Postgres doesn't dodge this ambiguity
for you. One way would be to dump out the view using the "*" notation,
but it turns out that that is contrary to SQL standard: "*" must be
expanded at parse time. (The place in the standard that mandates this
is a little note under ALTER TABLE ADD COLUMN that says that adding a
column doesn't affect the meaning of any previously defined views.
Hence "SELECT * FROM foo" has to be interpreted as expanding the "*"
immediately, lest its meaning change due to subsequent addition of a
column to foo.)

The other thing we might do is to generate distinct names for unnamed
columns. That would probably be a good idea and maybe we'll do it
sooner or later, but it's not required by (recent versions of) the SQL
standard. It would in any case not eliminate the risk entirely, because
this is perfectly legal SQL:

SELECT * FROM (SELECT (-999) AS x, (-999) AS x) b

but once the "*" has been expanded there is actually no spec-compliant
representation of this query. So there's always going to be some "don't
do that" aspect here.

There are some other fun aspects of trying to ensure that dumped views
can be reloaded. In particular, the possibility that columns of
underlying tables can be renamed introduces all sorts of hazards :-(.
There's been some discussion of how we might make that safer, but a
bulletproof solution seems to require introducing nonstandard syntax
for things like JOIN USING. People haven't been too excited about that.

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Chris Angelico 2012-12-21 22:39:04 Re: Coalesce bug ?
Previous Message Adrian Klaver 2012-12-21 22:31:06 Re: Coalesce bug ?