Re: Proposal - Collation at database level

From: Radek Strnad <radek(dot)strnad(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal - Collation at database level
Date: 2008-05-30 13:06:38
Message-ID: 483FFBDE.9040504@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Zdenek Kotala wrote:
> Radek Strnad napsal(a):
>
> <snip>
>
>>
>> I'm thinking of dividing the problem into two parts - in beginning
>> pg_collation will contain two functions. One will have hard-coded rules
>> for these basic collations (SQL_CHARACTER, GRAPHIC_IRV, LATIN1, ISO8BIT,
>> UCS_BASIC). It will compare each string character bitwise and guarantee
>> that the implementation will meet the SQL standard implemented in
>> PostgreSQL.
>> Second one will allow the user to use installed system locales. The set
>> of these collations will obviously vary between systems. Catalogs will
>> contain encoding and collation for calling the system locale function.
>> This will allow us to use collations such as en_US.utf8, cs_CZ.iso88592
>> etc. if they will be availible.
>>
>> We will also need to change the way how strings are compared. Regarding
>> the set database collation the right function will be used.
>> http://doxygen.postgresql.org/varlena_8c.html#4c7af81f110f9be0bd8eb2bd99525675
>>
>>
>> This design will make possible switch to ICU or any other implementation
>> quite simple and will not cause any major rewriting of what I'm coding
>> right now.
>
>
> Collation function is main point here. How you mentioned one will be
> only wrapper about strcmp and second one about strcoll. (maybe you
> need four - char/wchar) Which function will be used it is defined in
> pg_collation catalog by CREATE COLLATION command. But you need specify
> name of locale for system locales. It means you need attribute for
> storing locale name.
>
You're right. I've extended pg_collation for system locale columns. In
the first stage we actually don't need any other catalogs such as
encoding, etc. and we can build this functionality only on following
pg_collation catalog. Used collation function (system or built-in) will
be decided on existing collation name.

CATALOG(pg_collations, ###)
{
NameData colname; /* collation name */
Oid colschema; /* collation schema */
NameData colcharset; /* character set specification */
Oid colexistingcollation; /* existing collation */
bool colpadattribute; /* pad attribute */
bool colcasesensitive; /* case sensitive */
bool colaccent; /* accent sensitive */
NameData colsyslccollate; /* lc_collate */
NameData colsyslcctype; /* lc_ctype */
regproc colfunc; /* used collation function */
} FormData_pg_collations;

>> FormData_pg_collations;
> It would be good to send list of new and modified SQL commands (like
> CREATE COLLATION) for wide discussion.
>
CREATE COLLATION <collation name> FOR <character set specification> FROM
<existing collation name> [ <pad characteristic> ] [ <case sensitive> ]
[ <accent sensitive> ] [ LC_COLLATE <lc_collate> ] [ LC_CTYPE <lc_ctype> ]

<pad characteristic> := NO PAD | PAD SPACE
<case sensitive> := CASE SENSITIVE | CASE INSENSITIVE
<accent sensitive> := ACCENT SENSITIVE | ACCENT INSENSITIVE

Since you can specify order by in select clause there's no need for
adding ascending and descending type of collation. They will allways be
ascending.

DROP COLLATION <collation name>

CREATE DATABASE ... [ COLLATE <collation name> ] ...

ALTER DATABASE ... [ COLLATE <collation name> ] ...

Any thoughts?

Radek

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message marcoduarte 2008-05-30 13:26:35 Sugestion: xpath
Previous Message Pavan Deolasee 2008-05-30 11:19:03 Re: Avoiding second heap scan in VACUUM