Re: Assignment of valid collation for SET operations on queries with UNKNOWN types.

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Assignment of valid collation for SET operations on queries with UNKNOWN types.
Date: 2017-01-18 05:40:12
Message-ID: CAFjFpRcuGvwFAESRHFe1wi+b98g-BYZVqAwgUi2jwzrC=Q9Sxw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 18, 2017 at 10:55 AM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Sun, Jan 8, 2017 at 10:55 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> writes:
>>> On Tue, Jan 3, 2017 at 5:57 PM, Rahila Syed <rahilasyed90(at)gmail(dot)com> wrote:
>>>> Are you suggesting extending the patch to include coercing from unknown to
>>>> text for all possible cases where a column of unknown type is being created?
>>
>>> I guess, that's what Tom is suggesting.
>>
>> Yes; I think the point is we should change the semantics we assume for
>> "SELECT 'foo'", not only for views but across the board. There are plenty
>> of non-view-related cases where it doesn't behave well, for example
>>
>> regression=# select * from
>> (select 'foo' from generate_series(1,3)) ss order by 1;
>> ERROR: failed to find conversion function from unknown to text
>>
>> Furthermore, it seems like a seriously bad idea for "SELECT 'foo'" to mean
>> something different in the context of CREATE VIEW than it means elsewhere.
>
> And this offers the same semantics for any DDL using SELECT's target
> list to build the list of column's types, which is consistent.
>
>> The trick is to not break cases where we've already hacked things to allow
>> external influence on the resolved types of SELECT-list items. AFAICT,
>> these cases are just INSERT/SELECT and set operations (eg unions):
>>
>> regression=# create table target (f1 int);
>> CREATE TABLE
>> regression=# explain verbose insert into target select '42' from generate_series(1,3);
>> QUERY PLAN
>>
>> --------------------------------------------------------------------------------
>> ---------
>> Insert on public.target (cost=0.00..10.00 rows=1000 width=4)
>> -> Function Scan on pg_catalog.generate_series (cost=0.00..10.00 rows=1000
>> width=4)
>> Output: 42
>> Function Call: generate_series(1, 3)
>> (4 rows)
>>
>> regression=# explain verbose select '42' union all select 43;
>> QUERY PLAN
>> ------------------------------------------------
>> Append (cost=0.00..0.04 rows=2 width=4)
>> -> Result (cost=0.00..0.01 rows=1 width=4)
>> Output: 42
>> -> Result (cost=0.00..0.01 rows=1 width=4)
>> Output: 43
>> (5 rows)
>>
>> In both of the above cases, we're getting an integer constant not a
>> text or "unknown" constant, because the type information gets imported
>> from outside the "SELECT".
>>
>> I spent some time fooling with this today and came up with the attached
>> patch. I think this is basically the direction we should go in, but
>> there are various loose ends:
>>
>> 1. I adjusted a couple of existing regression tests whose results are
>> affected, but didn't do anything towards adding new tests showing the
>> desirable results of this change (as per the examples in this thread).
>>
>> 2. I didn't do anything about docs, either, though maybe no change
>> is needed. One user-visible change from this is that queries should
>> never return any "unknown"-type columns to clients anymore. But I
>> think that is not documented now, so maybe there's nothing to change.
>>
>> 3. We need to look at whether pg_upgrade is affected. I think we
>> might be able to let it just ignore the issue for views, since they'd
>> get properly updated during the dump and reload anyway. But if someone
>> had an actual table (or matview) with an "unknown" column, that would
>> be a pg_upgrade hazard, because "unknown" doesn't store like "text".
>>
>> 4. If "unknown" were marked as a pseudotype not base type in pg_type
>> (ie, typtype "p" not "b") then the special case for it in
>> CheckAttributeType could go away completely. I'm not really sure
>> why it's "b" today --- maybe specifically because of this point that
>> it's currently possible to create tables with unknown columns? But
>> I've not looked at what all the implications of changing that might
>> be, and anyway it could be left for a followon patch.
>>
>> 5. I noticed that the "resolveUnknown" arguments of transformSortClause
>> and other functions in parse_clause.c could go away: there's really no
>> reason not to just treat them as "true" always. But that could be
>> separate cleanup as well.
>>
>> Anybody want to hack on those loose ends?
>
> Ashutosh, Rahila, do you have plans to review things here?

I won't be able to work on creating patches for TODOs listed by Tom.
But I can review if someone volunteers to produce the patches.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-01-18 05:41:03 Re: [WIP] RE: DECLARE STATEMENT setting up a connection in ECPG
Previous Message Michael Paquier 2017-01-18 05:30:38 Re: Password identifiers, protocol aging and SCRAM protocol