Collations versus record-returning functions

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>, Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Collations versus record-returning functions
Date: 2011-03-19 02:40:53
Message-ID: 28909.1300502453@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I'm making pretty good progress on the task of splitting input and
output collations for expression nodes. There remains one case in the
regression tests that is giving a non-expected result. It involves this
function:

CREATE FUNCTION dup (f1 anyelement, f2 out anyelement, f3 out anyarray)
AS 'select $1, array[$1,$1]' LANGUAGE sql;

which is used like this:

SELECT a, (dup(b)).* FROM collate_test3 ORDER BY 2;

where the column collate_test3.b is declared with a nondefault
collation. In HEAD, the output of dup() inherits that collation from
the b Var, and then so does the field selected from it, and so the ORDER
BY sorts according to the column collation. With my patch as it stands,
you get sorting by the default collation. The reason is that the output
of dup() is of type RECORD, which is a noncollatable type according to
pg_type, so the function result is labeled noncollatable even though it
has collatable inputs. The FieldSelect node then has to choose the
default collation since there's nothing else available.

ISTM there are basically three things we might do about this:

(1) Decide that the patch's behavior is correct and what's embodied in
the regression expected file is wrong.

(2) Allow collations to propagate up through nodes that deliver
noncollatable outputs.

(3) Decide that composite types are collatable.

I don't much care for #2, mainly because it seems likely to cause a lot
of unnecessary collation conflicts. It has a lot of flavor of action
at a distance in most cases, too, even though it might seem natural
in this particular example.

#3 has some appeal but it also seems fraught with definitional issues,
for instance:

* Something like
row('a' collate "C", 'b' collate "en_US")
doesn't have a resolvable collation, but I don't think we want it
to throw an error.

* If composites are collatable, one would logically expect a table's
rowtype to inherit the common collation of its columns (if there is
one). This is problematic because then the rowtype collation can change
after ALTER TABLE ADD COLUMN/DROP COLUMN/ALTER COLUMN TYPE, which is
something we can't support even if it seemed like a good idea.

Also, I fail to see any support for #3 in the SQL standard. So far as
I can find, the standard only considers that string types are collatable.
By the same token, it doesn't look to me like the standard is expecting
#2 to happen, since only string types can carry collation.

Thoughts?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-03-19 03:00:19 Re: [COMMITTERS] pgsql: Document the all-balls IPv6 address.
Previous Message Robert Haas 2011-03-19 02:28:33 Re: Sync Rep v19