Re: How to pass around collation information

From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: How to pass around collation information
Date: 2010-05-28 19:53:39
Message-ID: m2ljb3ls58.fsf@hi-media.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> On fre, 2010-05-28 at 20:22 +0300, Heikki Linnakangas wrote:
>> USING <operator> syntax). The behavior is exactly what we want, it's
>> just completely inpractical, so we need something to do the same in a
>> less cumbersome way.

For an example, here is something I did to better understand the system
a while ago. Of course I never got to use it for real:

http://pgsql.tapoueh.org/btree_fr_ops/

> Well, maybe we should step back a little and work out what sort of
> feature we actually want, if any. The feature I'm thinking of is what
> people might call "per-column locale", and the SQL standard defines
> that. It would look like this:
>
> CREATE TABLE test (
> a varchar COLLATE de,
> b varchar COLLATE fr
> );
>
> SELECT * FROM test WHERE a > 'baz' ORDER BY b;
>
> So while it's true that the collation is used by the operations (> and
> ORDER BY), the information which collation to use comes with the data
> values. It's basically saying, a is in language "de", so sort it like
> that unless told otherwise. There is also an override syntax available,
> like this:
>
> SELECT * FROM test WHERE a COLLATE en > 'baz' ORDER BY b COLLATE sv;
>
> But here again the collation is attached to a data value, and only from
> there it is passed to the operator. What is actually happening is
>
> SELECT * FROM test WHERE (a COLLATE en) > 'baz' ORDER BY (b COLLATE sv);
>
>
> What you appear to be describing is a "per-operation locale", which also
> sounds valid, but it would be a different thing. It might be thought of
> as this:
>
> SELECT * FROM test WHERE a (> COLLATE en) 'baz' ORDER BY COLLATE sv b;
>
> with some suitable global default.
>
>
> So which one of these should it be?

My understanding is that what we do is per-operation locale. The locale
information bears no semantic when not attached to some operation on
strings, like sorting or comparing.

So what you're showing here I think is how to attach a collation label
to every string in the system, at the catalog level or dynamically at
the query level.

Now this collation label will only be used whenever you want to use a
collation aware function or operator. Those functions need to get the
labels for their implementation to have the expected meaning.

So we need both to attach collations to all known strings (defaulting to
the current database collation I guess), as you showed at the SQL level,
and to pass this information down to the functions operating on those
strings.

A confusing example on this grounds would be the following, which I hope
the standard disallow:

SELECT * FROM test WHERE a COLLATE en > b COLLATE sv;

Regards,
--
dim

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jan Urbański 2010-05-28 20:04:07 Re: tsvector pg_stats seems quite a bit off.
Previous Message Dimitri Fontaine 2010-05-28 19:38:56 Re: functional call named notation clashes with SQL feature