Re: Per-column collation, work in progress

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Per-column collation, work in progress
Date: 2010-09-26 13:37:02
Message-ID: AANLkTimb3+_E7=3o9u6_GEV7V3w=FmRMroSgw60SNWrJ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Sep 26, 2010 at 1:15 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> Is there any reason why you prohibit a different encodings per one
> database? Actually people expect from collate per column a possibility
> to store a two or more different encodings per one database.

These are two completely separate problems that only look related. The
main difference is that while collation is a property of the
comparison or sort you're performing encoding is actually a property
of the string itself. It doesn't make sense to specify a different
encoding than what the string actually contains.

You could actually do what you want now by using bytea columns and
convert_to/convert_from and it wouldn't be much easier if the support
were built into text since you would still have to keep track of the
encoding it's in and the encoding you want. We could have a
encoded_text data type which includes both the encoding and the string
and which any comparison function automatically handles conversion
based on the encoding of the collation requested -- but I wouldn't
want that to be the default text datatype. It would impose a lot of
overhead on the basic text operations and magnify the effects of
choosing the wrong collation.

--
greg

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2010-09-26 13:39:19 Re: Stalled post to pgsql-committers
Previous Message Pavel Stehule 2010-09-26 12:15:25 Re: Per-column collation, work in progress