Re: Assignment of valid collation for SET operations on queries with UNKNOWN types.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Rahila Syed <rahilasyed90(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Assignment of valid collation for SET operations on queries with UNKNOWN types.
Date: 2017-01-08 01:55:42
Message-ID: 11695.1483840542@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> writes:
> On Tue, Jan 3, 2017 at 5:57 PM, Rahila Syed <rahilasyed90(at)gmail(dot)com> wrote:
>> Are you suggesting extending the patch to include coercing from unknown to
>> text for all possible cases where a column of unknown type is being created?

> I guess, that's what Tom is suggesting.

Yes; I think the point is we should change the semantics we assume for
"SELECT 'foo'", not only for views but across the board. There are plenty
of non-view-related cases where it doesn't behave well, for example

regression=# select * from
(select 'foo' from generate_series(1,3)) ss order by 1;
ERROR: failed to find conversion function from unknown to text

Furthermore, it seems like a seriously bad idea for "SELECT 'foo'" to mean
something different in the context of CREATE VIEW than it means elsewhere.

The trick is to not break cases where we've already hacked things to allow
external influence on the resolved types of SELECT-list items. AFAICT,
these cases are just INSERT/SELECT and set operations (eg unions):

regression=# create table target (f1 int);
CREATE TABLE
regression=# explain verbose insert into target select '42' from generate_series(1,3);
QUERY PLAN

--------------------------------------------------------------------------------
---------
Insert on public.target (cost=0.00..10.00 rows=1000 width=4)
-> Function Scan on pg_catalog.generate_series (cost=0.00..10.00 rows=1000
width=4)
Output: 42
Function Call: generate_series(1, 3)
(4 rows)

regression=# explain verbose select '42' union all select 43;
QUERY PLAN
------------------------------------------------
Append (cost=0.00..0.04 rows=2 width=4)
-> Result (cost=0.00..0.01 rows=1 width=4)
Output: 42
-> Result (cost=0.00..0.01 rows=1 width=4)
Output: 43
(5 rows)

In both of the above cases, we're getting an integer constant not a
text or "unknown" constant, because the type information gets imported
from outside the "SELECT".

I spent some time fooling with this today and came up with the attached
patch. I think this is basically the direction we should go in, but
there are various loose ends:

1. I adjusted a couple of existing regression tests whose results are
affected, but didn't do anything towards adding new tests showing the
desirable results of this change (as per the examples in this thread).

2. I didn't do anything about docs, either, though maybe no change
is needed. One user-visible change from this is that queries should
never return any "unknown"-type columns to clients anymore. But I
think that is not documented now, so maybe there's nothing to change.

3. We need to look at whether pg_upgrade is affected. I think we
might be able to let it just ignore the issue for views, since they'd
get properly updated during the dump and reload anyway. But if someone
had an actual table (or matview) with an "unknown" column, that would
be a pg_upgrade hazard, because "unknown" doesn't store like "text".

4. If "unknown" were marked as a pseudotype not base type in pg_type
(ie, typtype "p" not "b") then the special case for it in
CheckAttributeType could go away completely. I'm not really sure
why it's "b" today --- maybe specifically because of this point that
it's currently possible to create tables with unknown columns? But
I've not looked at what all the implications of changing that might
be, and anyway it could be left for a followon patch.

5. I noticed that the "resolveUnknown" arguments of transformSortClause
and other functions in parse_clause.c could go away: there's really no
reason not to just treat them as "true" always. But that could be
separate cleanup as well.

Anybody want to hack on those loose ends?

regards, tom lane

Attachment Content-Type Size
no-more-UNKNOWN-output-columns-1.patch text/x-diff 16.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Nasby 2017-01-08 02:27:15 Re: merging some features from plpgsql2 project
Previous Message Jim Nasby 2017-01-08 01:52:13 Allow controlling location of tmp_install