Re: Function Column Expansion Causes Inserts To Fail

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: David Johnston <polobo(at)yahoo(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: Function Column Expansion Causes Inserts To Fail
Date: 2011-05-31 19:15:11
Message-ID: BANLkTinRHbTzLEj+Z-Admrupa7jMPRRJ9g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, May 31, 2011 at 11:57 AM, David Johnston <polobo(at)yahoo(dot)com> wrote:
> See my thoughts below.  Other user's opinions (or a pointer to where this
> topic has been previously discussed) are greatly welcomed.
> Thank you for the technical detail on how  ().* gets expanded by the engine.
> I still believe it would make sense to disallow VOLATILE functions to have
> the duplicate behavior performed by default - with probably an override
> during function creation.  Backwards compatibility could introduce notices
> and have server configurations to restore prior behavior.  The rewriter
> should know that the composite/record type in the select list is a function
> as opposed to an actual type.  Even if  you document the behavior unless you
> make the runtime engine comply as well this is very subtle (and invisible)
> behavior for the end-user.

There have been multiple complaints about this in the archives. In
the old days, you would have to rewrite your query to use the 'select
* from func()' form (which isn't always so easy) or use a subquery and
the 'offset 0' hack. Running in to this problem has actually become
more common as our type system has gotten fancier and plpgsql got the
ability to be called with the column list, aka select func(), syntax.

The community has had to endure multiple sanctimonious rants about
this by yours truly. Unfortunately complaints are cheap relative to
the hard work and consensus building it would require to fix this
problem. Here's a kinda sorta related thread (read the whole thread)
about it where I was trying to work a solution in somehow:

http://archives.postgresql.org/pgsql-hackers/2010-05/msg00333.php

> I may be overreacting here but calling a function multiple times when the
> query only says to call it once is unexpected behavior and arguably results
> in an incompatible query relative to what was expected.  I can see how such
> behavior can be desirable and benign but the user should have to explicitly
> request/allow such behavior as opposed to having it given to them by
> default.  That way, at least during the decision making process of turning
> on the feature the user can be reasonable expected to view the relevant
> documentation to learn why such explicit permission is required.

The idea that you should be able to control how the function 'hooks
in' to the outer query in terms of inputs and outputs is almost
certainly not going to fly. In this particular case I think you're
best off trying to prove that the current behavior is not
intentionally relied on by anybody and should be changed. If you
could do that, as noted in the email above, you might be able to argue
that (func()).* is a special case and takes on a unique meaning, as
would, possibly, select (foo).* from foo;

> Now, for actual types this is obviously not an issue.  If you could output
> the function result into an actual type and the simply duplicate the type
> with the relevant column specification that would obviously avoid the entire
> problem.  I am guessing, from your response, that this is not that easy of a
> solution to implement.  Now, if the rewriter could generate something like
> the following:
>
> SELECT aux1, aux2, (createpkrecord(sub)).* FROM generate_series(1,10) sub
>
> [REWRITE] (remove the ().* construct and add AS resultfunction1; make the
> resultant query a sub-query and copy all the non-function columns to the
> parent and also expand functionresult1.* as necessary

IMO, That's not gonna fly either -- you are oversimplifying. Suppose
you wrapped the above query into a create view statement, what would
it look like?

> SELECT aux1, aux2, functionresult1.col1, functionresult1.col2
> FROM (SELECT aux1, aux2, createpkrecord(sub) AS functionresult1 FROM
> generate_series(1,10) sub)

This is also a problem: postgresql can (and does) inline the subquery
(unless you use offset 0), defeating the intended purpose of what you
are trying to do. So this is not the same as a CTE, which has a rigid
mechanism of order of execution (although I wonder even relying on
that is future-proof).

> Documentation should not be a substitute for bug fixing.  If you really want
> to say "don't do that" it is better if you do so through the database engine
> itself and not the mailing list.

Well, surprising behaviors != bug. The current behavior actually
works very well for all other contexts than functions (for example
when creating views). In lieu of a sneaky fix like I mentioned
above, maybe you could get some buy-in on a warning though.

merlin

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2011-05-31 19:59:53 Re: Some clarification about TIMESTAMP
Previous Message Scott Marlowe 2011-05-31 19:13:14 Re: Some clarification about TIMESTAMP