Skip site navigation (1) Skip section navigation (2)

Proof of concept COLLATE support with patch

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Proof of concept COLLATE support with patch
Date: 2005-09-02 13:04:21
Message-ID: 20050902130420.GA15466@svana.org (view raw or flat)
Thread:
Lists: pgsql-hackers
Supports any glibc platform and possibly Win32.

Adds:
  SELECT ... ORDER BY expr COLLATE 'locale'
  CREATE INDEX locale_index ON table(expr COLLATE 'locale')
  Index scan used when COLLATE order permits

This is just a proof of concept patch. I didn't send it to -patches
because as Tom pointed out, there's no hope of it getting in due to
platform dependant behaviour.

This patch does not use setlocale and is completely orthoganal to any
locale support already in the backend.

As it turns out, meaningful locale support only needs a handful of
support functions to work. These are listed at the bottom. My patch
only uses the first two, but the third will be needed at some stage.
The use of the last one depends on how the backend ends up support
locales. Both glibc and wine32 have locale sensetive versions of many
functions including:

toupper_l, tolower_l, strfmon_l, strtoul_l, strtof_l, strftime_l, is*_l

A windows function list is at:
http://msdn2.microsoft.com/library/wyzd2bce(en-us,vs.80).aspx

Patch available here:
http://svana.org/kleptog/pgsql/collate1.patch

Implementation notes follow and table of functions is at the bottom.

I hope this helps whenever someone gets around to full COLLATE support.

Have a nice day,

Notes:
   * It works by replacing (expr COLLATE 'locale') with
           pg_strxfrm(expr, pg_findlocale(locale))
     in the parsetree.

     pg_findlocale returns an opaque pointer to the locale. It is
     STRICT IMMUTABLE and is optimised away in the final query.

     pg_strxfrm takes the string and the locale and returns a bytea. 
     bytea comparison uses memcmp so is safe from other locale effects
     in the backend.

   * Use of COLLATE for an index will probably double the diskspace
     required for that index due to the strxfrm.

   * I had to add the functions to pg_proc.h because CREATE FUNCTION
     couldn't find them. So they have OIDs I made up. You may need to
     initdb, I'm not sure.

     You can compile pg_xlocale.c as an shared object and load them
     that way too if you want to avoid the initdb.

   * Internally they are defined as taking and returning "internal".
     CREATE FUNCTION doesn't like that so specify opaque or oid
     instead. The declarations are:

  create function pg_findlocale(text) returns oid as 'pg_findlocale' language internal strict immutable;
  create function pg_strxfrm(text,oid) returns bytea as 'pg_strxfrm' language internal strict immutable;

   * The clause ORDER BY 1 COLLATE 'en_AU' breaks, it treats the 1 like
     a constant. I couldn't quickly work out how to reference the
     columns the right way. Long term that code should be in the
     sorting code anyway.

   * The locale needs to be in quotes, otherwise the parser converts it
     to lower-case. Locale names are case-sensetive on many systems.

   * There is a text function strcoll_l for testing collation:

  create function pg_strcoll_l(text,text,text) returns int4 as 'pg_strcoll_l' language internal strict immutable;

   * Yes this is the easy way out, implementing the inheritence of the
     COLLATE attribute will be much more invasive. This gives most
     people what they want though.

   * Although these functions are documented on Windows, they are not
     for glibc, so it is an unstable insterface.

Function Needed                 glibc                 Win32
---------------------------------------------------------------------
Function returing opaque        newlocale             _create_locale
pointer to locale data

strxfrm with locale parameter   strxfrm_l             _strxfrm_l

Method finding encoding for     nl_langinfo_l         ???
locale

strcoll with locale parameter   strcoll_l             _strcoll_l

-- 
Martijn van Oosterhout   <kleptog(at)svana(dot)org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Responses

pgsql-hackers by date

Next:From: Merlin MoncureDate: 2005-09-02 14:02:36
Subject: Re: Call for 7.5 feature completion
Previous:From: Koichi SuzukiDate: 2005-09-02 10:36:42
Subject: Re: A couple of patches for PostgreSQL 64bit support

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group