Re: Fixed length data types issue

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: andrew(at)supernews(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fixed length data types issue
Date: 2006-09-07 12:41:02
Message-ID: 20060907124102.GL10093@svana.org
Lists: pgsql-hackers

On Thu, Sep 07, 2006 at 01:27:01PM +0100, Gregory Stark wrote:
> ... If you look again at the columns in my example you'll
> see none of them are appropriate targets for i18n anyways. They're all codes
> and even numbers.

Which raises the question of why they don't store the numbers in numeric
columns. That'll take far less space than any string.
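To put a rough number on the space argument, here is a minimal sketch. It assumes a hypothetical 8-digit code and compares its ASCII text form with a fixed 4-byte integer. It does not reproduce PostgreSQL's actual on-disk layout (which adds headers and padding); it only shows the raw payload sizes.

```python
import struct

code = 12345678  # hypothetical 8-digit code, chosen for illustration

# Stored as text: one byte per digit in ASCII.
as_text = str(code).encode("ascii")

# Stored as a fixed-width 32-bit signed integer.
as_int32 = struct.pack("<i", code)

text_size = len(as_text)
int_size = len(as_int32)

print(f"text payload: {text_size} bytes, int32 payload: {int_size} bytes")
```

The gap grows with the number of digits, up to the range limit of the integer type.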

> In other words if you're actually storing localized text then you almost
> certainly will be using a text or varchar and probably won't even have a
> maximum size. The use case for CHAR(n) is when you have fixed length
> statically defined strings that are always the same length. It doesn't make
> sense to store these in UTF8.

It makes sense to store them as numbers, or perhaps an enum.

> Currently Postgres has a limitation that you can only have one encoding per
> database and one locale per cluster. Personally I'm of the opinion that the
> only correct choice for that is "C" and all localization should be handled in
> the client and with pg_strxfrm. Putting the whole database into non-C locales
> guarantees that the columns that should not be localized will have broken
> semantics and there's no way to work around things in the other direction.

Quite. So if someone would code up SQL COLLATE support and integrate
ICU, everyone would be happy and we could all go home.

BTW, requiring localisation to happen in the client is silly. SQL
provides the ORDER BY clause for strings, and it makes no sense to have
the client re-sort them just because they're not using the C locale. The
point of a database is to make your life easier, right?
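As an illustration of sorting in the database rather than in the client, here is a small sketch using SQLite through Python's sqlite3 module, not PostgreSQL. The collation name demo_ci and the case-folding comparison are assumptions made for the example; a real locale-aware collation (for instance one backed by ICU) would be considerably more involved.

```python
import sqlite3

def casefold_collation(a: str, b: str) -> int:
    # Hypothetical stand-in for a locale-aware collation:
    # compare case-insensitively, return -1, 0, or 1.
    ka, kb = a.casefold(), b.casefold()
    return (ka > kb) - (ka < kb)

conn = sqlite3.connect(":memory:")
conn.create_collation("demo_ci", casefold_collation)
conn.execute("CREATE TABLE items (name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?)",
    [("banana",), ("Zebra",), ("apple",)],
)

# Default (binary) ordering, comparable to what the C locale gives.
binary_order = [r[0] for r in conn.execute(
    "SELECT name FROM items ORDER BY name")]

# Ordering chosen per query with COLLATE; the client does no re-sorting.
collated_order = [r[0] for r in conn.execute(
    "SELECT name FROM items ORDER BY name COLLATE demo_ci")]

print(binary_order)
print(collated_order)
conn.close()
```

The same table yields both orderings, so columns that should not be localized keep their binary semantics while text columns can be sorted as users expect.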

> Perhaps given the current situation what we should have is a cvarchar and
> cchar data types that are like varchar and char but guaranteed to always be
> interpreted in the C locale with ASCII encoding.

I think bytea gives you that, pretty much.
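A quick sketch of what byte-wise comparison means in practice, using Python bytes as a stand-in for bytea values (the sample strings are made up): the ordering follows raw byte values, so uppercase ASCII sorts before lowercase, which is the same result the C locale gives for ASCII text.

```python
# Sample values are hypothetical; bytes objects stand in for bytea.
values = [b"apple", b"Zebra", b"banana"]

# Python compares bytes lexicographically by byte value,
# with no locale or encoding rules applied.
bytewise_order = sorted(values)

print(bytewise_order)
```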

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
