Re: Function Column Expansion Causes Inserts To Fail

From: "David Johnston" <polobo(at)yahoo(dot)com>
To: "'Merlin Moncure'" <mmoncure(at)gmail(dot)com>
Cc: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Function Column Expansion Causes Inserts To Fail
Date: 2011-05-31 16:57:51
Message-ID: 00c301cc1fb3$e3c61b80$ab525280$@yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

See my thoughts below. Other user's opinions (or a pointer to where this
topic has been previously discussed) are greatly welcomed.

> -----Original Message-----
> From: Merlin Moncure [mailto:mmoncure(at)gmail(dot)com]
> Sent: Tuesday, May 31, 2011 11:56 AM
> To: David Johnston
> Cc: Tom Lane; pgsql-general(at)postgresql(dot)org
> Subject: Re: [GENERAL] Function Column Expansion Causes Inserts To Fail
>
> The basic issue is that:
> select (func()).*, if the return type has fields, 'a', 'b', gets expanded
to:
>
> select (func()).a, (func()).b;
>
> This is a *huge* gotcha with type returning functions -- in many cases
people
> only notice the problem indirectly through slow performance.
> I've griped about this many times but it's not clear if there's a solution
other
> than to document and advise workarounds. Typically the safest way to deal
> with this is through use of CTE:
>
> > SELECT createpkrecord(sub)
> > FROM (SELECT sub FROM generate_series(1, 10) sub ) src;
>
> becomes
>
> with list as (SELECT createpkrecord(sub) as c FROM generate_series(1, 10)
> sub ) select (c).* from list;
>
> merlin

Thank you for the technical detail on how ().* gets expanded by the engine.
I still believe it would make sense to disallow VOLATILE functions to have
the duplicate behavior performed by default - with probably an override
during function creation. Backwards compatibility could introduce notices
and have server configurations to restore prior behavior. The rewriter
should know that the composite/record type in the select list is a function
as opposed to an actual type. Even if you document the behavior unless you
make the runtime engine comply as well this is very subtle (and invisible)
behavior for the end-user.

I may be overreacting here but calling a function multiple times when the
query only says to call it once is unexpected behavior and arguably results
in an incompatible query relative to what was expected. I can see how such
behavior can be desirable and benign but the user should have to explicitly
request/allow such behavior as opposed to having it given to them by
default. That way, at least during the decision making process of turning
on the feature the user can be reasonable expected to view the relevant
documentation to learn why such explicit permission is required.

Now, for actual types this is obviously not an issue. If you could output
the function result into an actual type and the simply duplicate the type
with the relevant column specification that would obviously avoid the entire
problem. I am guessing, from your response, that this is not that easy of a
solution to implement. Now, if the rewriter could generate something like
the following:

SELECT aux1, aux2, (createpkrecord(sub)).* FROM generate_series(1,10) sub

[REWRITE] (remove the ().* construct and add AS resultfunction1; make the
resultant query a sub-query and copy all the non-function columns to the
parent and also expand functionresult1.* as necessary

SELECT aux1, aux2, functionresult1.col1, functionresult1.col2
FROM (SELECT aux1, aux2, createpkrecord(sub) AS functionresult1 FROM
generate_series(1,10) sub)

That would be semantically and functionally equivalent; and obviously the
resultant query works since that it the function workaround and is also what
happens when you use a CTE.

Again, I don't mind using the more verbose syntax but I'd rather have the
system disallow the broken syntax outright since it changing the declared
behavior of the query. It's hard for me to make a risk/cost/benefit
analysis on the issue but from a pure theory stand-point IMHO the behavior
is contrary to the promise of the database engine to not change the meaning
of the query that it is given. Making it work as expected would be nice but
otherwise steps should be taken to stop multiple function calls unless the
function says it is safe to do so.

Documentation should not be a substitute for bug fixing. If you really want
to say "don't do that" it is better if you do so through the database engine
itself and not the mailing list.

David J.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David Johnston 2011-05-31 17:42:30 Re: "postgresql-9.0-801.jdbc4.jar" always cause "org.postgresql.util.PSQLException: Cannot commit when autoCommit is enabled" Exception
Previous Message Emi Lu 2011-05-31 16:51:35 "postgresql-9.0-801.jdbc4.jar" always cause "org.postgresql.util.PSQLException: Cannot commit when autoCommit is enabled" Exception