Skip site navigation (1) Skip section navigation (2)

Re: Proposal - Collation at database level

From: Radek Strnad <radek(dot)strnad(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal - Collation at database level
Date: 2008-05-30 13:06:38
Message-ID: 483FFBDE.9040504@gmail.com (view raw or flat)
Thread:
Lists: pgsql-hackers
Zdenek Kotala wrote:
> Radek Strnad napsal(a):
>
> <snip>
>
>>
>> I'm thinking of dividing the problem into two parts - in beginning
>> pg_collation will contain two functions. One will have hard-coded rules
>> for these basic collations (SQL_CHARACTER, GRAPHIC_IRV, LATIN1, ISO8BIT,
>> UCS_BASIC). It will compare each string character bitwise and guarantee
>> that the implementation will meet the SQL standard implemented in
>> PostgreSQL.
>> Second one will allow the user to use installed system locales. The set
>> of these collations will obviously vary between systems. Catalogs will
>> contain encoding and collation for calling the system locale function.
>> This will allow us to use collations such as en_US.utf8, cs_CZ.iso88592
>> etc. if they will be availible.
>>
>> We will also need to change the way how strings are compared. Regarding
>> the set database collation the right function will be used.
>> http://doxygen.postgresql.org/varlena_8c.html#4c7af81f110f9be0bd8eb2bd99525675 
>>
>>
>> This design will make possible switch to ICU or any other implementation
>> quite simple and will not cause any major rewriting of what I'm coding
>> right now.
>
>
> Collation function is main point here. How you mentioned one will be 
> only wrapper about strcmp and second one about strcoll. (maybe you 
> need four - char/wchar) Which function will be used it is defined in 
> pg_collation catalog by CREATE COLLATION command. But you need specify 
> name of locale for system locales. It means you need attribute for 
> storing locale name.
>
You're right. I've extended pg_collation for system locale columns. In 
the first stage we actually don't need any other catalogs such as 
encoding, etc. and we can build this functionality only on following 
pg_collation catalog. Used collation function (system or built-in) will 
be decided  on existing collation name.

CATALOG(pg_collations, ###)
{
    NameData    colname;        /* collation name */
    Oid        colschema;        /* collation schema */
    NameData   colcharset;    /*  character set specification */
    Oid         colexistingcollation; /* existing collation */
    bool        colpadattribute;    /* pad attribute */
    bool        colcasesensitive;    /* case sensitive */
    bool        colaccent;        /* accent sensitive */
    NameData    colsyslccollate;    /* lc_collate */
    NameData    colsyslcctype; /* lc_ctype */
    regproc        colfunc;        /* used collation function */
} FormData_pg_collations;


>> FormData_pg_collations;
> It would be good to send list of new and modified SQL commands (like 
> CREATE COLLATION) for wide discussion.
>
CREATE COLLATION <collation name> FOR <character set specification> FROM 
<existing collation name> [ <pad characteristic> ] [ <case sensitive> ] 
[ <accent sensitive> ] [ LC_COLLATE <lc_collate> ] [ LC_CTYPE <lc_ctype> ]

<pad characteristic> := NO PAD | PAD SPACE
<case sensitive> := CASE SENSITIVE | CASE INSENSITIVE
<accent sensitive> := ACCENT SENSITIVE | ACCENT INSENSITIVE

Since you can specify order by in select clause there's no need for 
adding ascending and descending type  of collation. They will allways be 
ascending.

DROP COLLATION <collation name>

CREATE DATABASE ... [ COLLATE <collation name> ] ...

ALTER DATABASE ... [ COLLATE <collation name> ] ...

    Any thoughts?

          Radek



In response to

Responses

pgsql-hackers by date

Next:From: marcoduarteDate: 2008-05-30 13:26:35
Subject: Sugestion: xpath
Previous:From: Pavan DeolaseeDate: 2008-05-30 11:19:03
Subject: Re: Avoiding second heap scan in VACUUM

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group