Re: Insufficient description in collation mismatch error

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thom Brown <thom(at)linux(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Insufficient description in collation mismatch error
Date: 2011-04-20 11:19:57
Message-ID: BANLkTikk=8LeXj3SxwVCUADvYJ8MWUczZw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 20, 2011 at 1:37 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> postgres=# SELECT things, count(*) FROM stuff GROUP BY things COLLATE "C";
>> ERROR:  column "stuff.things" must appear in the GROUP BY clause or be
>> used in an aggregate function
>> LINE 1: SELECT things, count(*) FROM stuff GROUP BY things COLLATE "...
>
>> Firstly, does it even make sense for a GROUP BY clause to accept COLLATE?
>
> Probably, or at least I'm hesitant to hard-wire a restriction against
> it.  The question is isomorphic to whether you believe that different
> collations can have different equality semantics.

Well the answer to his question is isomorphic to that. But the
question of whether the original query should be isn't. The query only
makes sense to be an error if different collations can have different
output representations -- which I believe is a definite no.

Now the problem gets more complicated because if the above query works
then you should expect to be able to do:

SELECT * FROM morestuff WHERE things IN (SELECT things from stuff
GROUP BY things COLLATE x)

If the GROUP BY could change the meaning of equality for things then
it's hard to figure what meaning should be used for the IN clause. If
it's the default meaning for things and that's different than x then
the IN clause is going to produce a non-deterministic set of results.

Possibly a user would expect the collation on the GROUP BY clause to
dictate the collation on the select list and vice versa. But that's a
pretty far-reaching action-at-a-distance. Or possibly we should just
allow a mismatch but set the collation to "indeterminate" or something
so it can't be used in an outer query without an explicit collation
clause. Still that seems pretty arbitrary.

> regression=# select f1 from int4_tbl group by abs(f1);
> ERROR:  column "int4_tbl.f1" must appear in the GROUP BY clause or be used in an aggregate function
> LINE 1: select f1 from int4_tbl group by abs(f1);

I tihnk that's not a great example because from the user's point of
view it's clear that there could be multiple "f1" values for a single
"abs(f1)" value. In the case of collation there could be multiple
different sort positions in one collation for a single thing thing in
a different collation but people probably think of them as the same
"thing".

It might be more analogous to

select f1 from int4_tbl group by f1::numeric;

--
greg

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2011-04-20 11:24:25 Re: time-delayed standbys
Previous Message Shigeru Hanada 2011-04-20 10:10:26 Re: Foreign table permissions and cloning