Re: Open issues for collations

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Martijn van Oosterhout <kleptog(at)svana(dot)org>
Subject: Re: Open issues for collations
Date: 2011-04-08 17:57:21
Message-ID: 15270.1302285441@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Reading through this thread...
> On Sat, Mar 26, 2011 at 12:36 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> ** Selecting a field from a record-returning function's output.
>> Currently, we'll use the field's declared collation; except that
>> if the field has default collation, we'll replace that with the common
>> collation of the function's inputs, if any. Is either part of that
>> sane? Do we need to make this work for functions invoked with other
>> syntax than a plain function call, eg operator or cast syntax?

> There were a couple of different ideas about which way we ought to go
> with this, but I'm happy to defer to what Tom and Martijn hashed out:

> MO> That seems all a bit weird. I spent some time reading through the SQL
> MO> spec to see if I could come up with a few ideas about what they thought
> MO> relevant. I think the gist of it is that I think the result row should
> MO> have for each column its declared collation in all cases.

> TL> That interpretation would be fine with me. It would let us get rid of
> TL> the special-case code at lines 307-324 of parse_collate.c, which I put
> TL> in only because there are cases in the collate.linux.utf8.sql regression
> TL> test that fail without it. But I'm perfectly happy to conclude that
> TL> those test cases are mistaken.

> I'm not sure whether that's been done, though, or whether we're even
> going to do it.

I looked a bit more closely at this, and I think I finally get the point
of what those regression test cases involving the dup() function are
about. Consider a trivial polymorphic function such as

create function dummy(anyelement) returns anyelement as
'select $1' language sql;

When applied to a textual argument, this is a function taking and
returning string, and so collation does (and should, I think) propagate
through it. Thus in

select dummy(x) from tbl order by 1;

you will get ordering by the declared collation of tbl.x, whatever that
is. But now consider

create function dup(in anyelement, a out anyelement, b out anyelement)
as 'select $1, $1' language sql;

select (dup(x)).a from tbl order by 1;

It's not unreasonable to think that this should also order by tbl.x's
collation --- if collation propagates through dummy(), why not through
dup()? And in fact those regression test cases are expecting that it
does propagate in such a case.

Now the discussion that we had earlier in this thread was implicitly
assuming that we were talking about FieldSelect from a known composite
type. If dup() were declared to return a named composite type, then
using the collation that is declared for that type's "a" column seems
reasonable. But when you're dealing with an anonymous record type,
which is what dup() actually returns here, there is no such declaration;
and what's more, the fact that there's a record type at all is just an
implementation detail to most users.

If we take out the kluge in parse_collate.c's handling of FieldSelects,
then what we will get in this example is ordering by the database
default collation. We can justify that on a narrow language-lawyering
basis by saying "dup() returns a composite type, which has no collation,
therefore collation does not propagate through from its arguments to
any column you might select from its result". But it's going to feel
a bit surprising to anyone who thinks of this in terms of OUT arguments
rather than an anonymous composite type.
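
As an illustration only, here is a sketch of two ways a user could get a
definite ordering whichever way this is decided. It assumes tbl.x is a
text column; the names dup_pair and dup_named are hypothetical, and "C"
is used merely because that collation exists on every platform.

-- hypothetical named composite type with declared column collations
create type dup_pair as (a text collate "C", b text collate "C");

-- same body as dup(), but returning the named type
create function dup_named(text) returns dup_pair
as 'select $1, $1' language sql;

-- under the declared-collation rule, this orders by "C"
select (dup_named(x)).a from tbl order by 1;

-- an explicit COLLATE clause overrides any derived collation
select (dup(x)).a from tbl order by (dup(x)).a collate "C";

The first query relies on the type's declaration; the second does not
depend on collation propagation at all.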

I'm inclined to think that we should take out the kluge and rest on the
language-lawyering viewpoint, because otherwise there are going to be
umpteen other corner cases where people are going to expect collation to
propagate and it's not going to work without very major kluging.

Comments?

regards, tom lane
