Re: Fixing row comparison semantics

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fixing row comparison semantics
Date: 2005-12-25 13:10:07
Message-ID: 20051225131005.GA23081@svana.org
Lists: pgsql-hackers

On Sat, Dec 24, 2005 at 09:38:23AM -0500, Tom Lane wrote:
> Are you suggesting that COLLATE will impose comparison semantics on
> all datatypes including non-string types? If so, I'd be interested
> to know what you have in mind. If not, claiming that it makes the
> issue go away is nonsensical.

Well, yes, on all data types. It needs to be done for string types and
it would be nice for user-defined data types, so you may as well do it
for all types. It avoids adding special cases, which is a good thing,
IMHO.

Every data type has at least two collations, ascending and descending.
So instead of all the current stuff with reverse operator classes,
you'll just be able to declare your index as:

CREATE INDEX blah ON foo (a, b COLLATE DESC);

And that index could then be used for queries with ORDER BY a, b DESC.
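
A minimal C sketch of the "DESC is just another collation" idea, assuming a collation is nothing more than a base btree comparison plus a direction flag. `int4_cmp`, `Collation`, `row_cmp` and `sort_rows` are hypothetical names for illustration, not backend functions; the rows are sorted with `qsort` to mimic the order an index on (a, b COLLATE DESC) would hold.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical btree-style support function for int: returns -1, 0 or 1. */
static int int4_cmp(int x, int y)
{
    return (x > y) - (x < y);
}

/* Hypothetical collation: the base order plus a direction. */
typedef struct {
    int (*cmp)(int, int);
    int descending;                 /* 0 = ASC, 1 = DESC */
} Collation;

static int collate(const Collation *c, int x, int y)
{
    int r = c->cmp(x, y);
    /* int4_cmp only returns -1/0/1, so negating cannot overflow. */
    return c->descending ? -r : r;
}

typedef struct { int a; int b; } Row;

/* Per-column collations for an index on (a, b COLLATE DESC). */
static const Collation index_collations[2] = {
    { int4_cmp, 0 },                /* a: ascending  */
    { int4_cmp, 1 },                /* b: descending */
};

/* qsort callback: compare column a first, then b, each with its collation. */
static int row_cmp(const void *lp, const void *rp)
{
    const Row *l = lp;
    const Row *r = rp;
    int c = collate(&index_collations[0], l->a, r->a);

    return c != 0 ? c : collate(&index_collations[1], l->b, r->b);
}

/* Sorts rows into the order ORDER BY a, b DESC would produce. */
static void sort_rows(Row *rows, size_t n)
{
    qsort(rows, n, sizeof rows[0], row_cmp);
}
```

Sorting {(1,1), (2,5), (1,3), (2,2)} with sort_rows() gives (1,3), (1,1), (2,5), (2,2). The comparison is written once; the descending variant needs no separate reverse operator class.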

String data types are just the obvious example of types that have many
different collations and they do have the most possibilities. But I
think that user-defined collations would be a powerful idea. All a user
needs to do is create a btree operator class that describes the basic
order, and they can then use it as a collation anywhere they like.

I hope you are not thinking of restricting collations to just string
types, because the special cases would be dreadful. Doing it this way
just means that most places dealing with order only need to worry about
the collation (e.g. pathkeys) and not the implementation.

Technically speaking, NULLS FIRST/LAST are also a form of collation but
I'm not going to touch those until I can at least replicate current
functionality, and they are not relevant for row comparisons anyway.
The relationship between collations and operator classes is
many-to-one: I can see situations where you would have 20 collations
using a single operator class.
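
To make the many-to-one relationship concrete, here is a hedged C sketch in which four collations all point at one btree comparison. It also folds NULLS FIRST/LAST into the collation, which the message says is technically possible but deliberately postponed. `NullableInt`, `Collation`, `find_collation` and `collation_cmp` are hypothetical names, not existing backend structures.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical nullable int value, standing in for a column value. */
typedef struct { int isnull; int value; } NullableInt;

/* The single "operator class": a btree-style three-way comparison. */
static int int4_btree_cmp(int x, int y)
{
    return (x > y) - (x < y);
}

/* Hypothetical collation: base order, direction and NULL placement. */
typedef struct {
    const char *name;
    int (*opclass_cmp)(int, int);   /* shared by many collations */
    int descending;                 /* 0 = ASC, 1 = DESC */
    int nulls_first;                /* 0 = NULLS LAST, 1 = NULLS FIRST */
} Collation;

/* Four collations, one operator class (many-to-one). */
static const Collation collations[] = {
    { "int4_asc_nulls_last",   int4_btree_cmp, 0, 0 },
    { "int4_asc_nulls_first",  int4_btree_cmp, 0, 1 },
    { "int4_desc_nulls_last",  int4_btree_cmp, 1, 0 },
    { "int4_desc_nulls_first", int4_btree_cmp, 1, 1 },
};

/* Looks a collation up by name; returns NULL if unknown. */
static const Collation *find_collation(const char *name)
{
    for (size_t i = 0; i < sizeof collations / sizeof collations[0]; i++)
        if (strcmp(collations[i].name, name) == 0)
            return &collations[i];
    return NULL;
}

/* Compare two values under a collation; returns <0, 0 or >0. */
static int collation_cmp(const Collation *c, NullableInt x, NullableInt y)
{
    if (x.isnull || y.isnull) {
        if (x.isnull && y.isnull)
            return 0;
        /* NULL placement is independent of the sort direction here. */
        int null_pos = c->nulls_first ? -1 : 1;
        return x.isnull ? null_pos : -null_pos;
    }

    int r = c->opclass_cmp(x.value, y.value);
    return c->descending ? -r : r;
}
```

Adding a fifth or a twentieth collation means adding a row to the table, not writing another comparison function.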

Locale specific ordering is really just a subset of collation. At the
moment it just uses the xlocale support present in glibc/MacOS X/Win32
but my hope is that it will be pluggable in the sense that you'll be
able to say:

CREATE LOCALE hungarian AS 'hu_HU' USING glibc;
CREATE LOCALE serbian_us AS 'sr_Latn_YU_REVISED@currency=USD' USING icu;

(The latter being: Serbian (Latin, Yugoslavia, Revised Orthography,
Currency=US Dollar). Example taken from the ICU website.)

Then you can use these in column declarations and have them
automatically use that locale for comparisons. It isn't as hard as it
looks, but it does touch a lot of different places in the backend.
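
A rough C sketch of what a glibc-backed CREATE LOCALE entry might do, using the POSIX.1-2008 per-thread locale calls (`newlocale`, `strcoll_l`, `freelocale`) that the xlocale support refers to. `SqlLocale` and the `sql_locale_*` functions are hypothetical; an ICU provider would wrap `ucol_open()`/`ucol_strcoll()` instead and is not shown, nor is the Win32 equivalent.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L     /* newlocale()/strcoll_l() are POSIX.1-2008 */
#endif
#include <locale.h>
#include <string.h>

/* Hypothetical catalog entry created by something like
 *   CREATE LOCALE hungarian AS 'hu_HU' USING glibc;  */
typedef struct {
    const char *name;               /* SQL-level name, e.g. "hungarian" */
    locale_t    loc;                /* OS locale object used for collation */
} SqlLocale;

/* Returns 1 on success, 0 if the OS does not know the locale. */
static int sql_locale_create(SqlLocale *l, const char *name, const char *os_locale)
{
    l->name = name;
    l->loc = newlocale(LC_COLLATE_MASK, os_locale, (locale_t) 0);
    return l->loc != (locale_t) 0;
}

/* Compare two strings using this locale's collation rules only;
 * the process-wide setlocale() state is not consulted. */
static int sql_locale_strcmp(const SqlLocale *l, const char *a, const char *b)
{
    return strcoll_l(a, b, l->loc);
}

/* Releases the OS locale object; safe to call twice. */
static void sql_locale_drop(SqlLocale *l)
{
    if (l->loc != (locale_t) 0)
        freelocale(l->loc);
    l->loc = (locale_t) 0;
}
```

Because each column can carry its own `locale_t`, two columns with different locales can be compared in the same query without calling `setlocale()`. With the "C" locale this degenerates to byte order; an installed locale such as hu_HU would apply its own rules.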

If you want technical details I can do that too (the summary on
pg-patches a while ago is now wildly out of date). Currently I'm trying
to get up to speed on pathkeys and indexes before the tree drifts too
far...

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
