Re: Character sets (Re: Re: Big 7.1 open items)

From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: peter_e(at)gmx(dot)net
Cc: lockhart(at)alumni(dot)caltech(dot)edu, t-ishii(at)sra(dot)co(dot)jp, pgsql-hackers(at)hub(dot)org
Subject: Re: Character sets (Re: Re: Big 7.1 open items)
Date: 2000-06-21 06:19:17
Message-ID: 20000621151917D.t-ishii@sra.co.jp
Lists: pgsql-hackers

> But how are you going to tell a genuine "type" from a character set? And
> you might have to have three types for each charset. There'd be a lot of
> redundancy and confusion regarding the input and output functions and
> other pg_type attributes. No doubt there's something to be learned from
> the type system, but character sets have different properties -- like
> characters(!), collation rules, encoding "translations" and what not.
> There is no doubt also need for different error handling. So I think that
> just dumping every character set into pg_type is not a good idea. That's
> almost equivalent to having separate types for char(6), char(7), etc.
>
> Instead, I'd suggest that character sets become separate objects. A
> character entity would carry around its character set in its header
> somehow. Consider a string concatenation function, being invoked with two
> arguments of the same exotic character set. Using the type system only
> you'd have to either provide a function signature for all combinations of
> character sets or you'd have to cast them up to SQL_TEXT, concatenate
> them and cast them back to the original charset. A smarter concatenation
> function instead might notice that both arguments are of the same
> character set and simply paste them together right there.
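
The concatenation idea quoted above can be sketched roughly as follows. This is only an illustration, not anything from the backend: the CharString class, its charset field, and the use of UTF-8 as a stand-in for SQL_TEXT are all assumptions made for the example, and it only covers stateless encodings, where raw byte concatenation is safe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharString:
    """Hypothetical string value that carries its character set in a header."""
    charset: str   # a Python codec name standing in for a charset object
    data: bytes    # the encoded bytes


# Assumption for this sketch: SQL_TEXT is modelled as UTF-8, i.e. a
# repertoire able to represent every other character set.
SQL_TEXT = "utf-8"


def to_sql_text(s: CharString) -> CharString:
    """Transcode a value into the common SQL_TEXT repertoire."""
    return CharString(SQL_TEXT, s.data.decode(s.charset).encode(SQL_TEXT))


def concat(a: CharString, b: CharString) -> CharString:
    """Concatenate two values, avoiding conversion when the charsets match."""
    if a.charset == b.charset:
        # Fast path: same character set, paste the bytes together.
        # (Only valid for stateless encodings such as EUC-JP or Latin-1.)
        return CharString(a.charset, a.data + b.data)
    # Slow path: cast both sides up to SQL_TEXT, then concatenate.
    ua, ub = to_sql_text(a), to_sql_text(b)
    return CharString(SQL_TEXT, ua.data + ub.data)


euc_a = CharString("euc_jp", "日本".encode("euc_jp"))
euc_b = CharString("euc_jp", "語".encode("euc_jp"))
latin = CharString("latin-1", "é".encode("latin-1"))

same = concat(euc_a, euc_b)    # stays in EUC-JP, no transcoding
mixed = concat(latin, euc_a)   # falls back to the SQL_TEXT stand-in
```

With a single concat() that inspects the header, no function signature is needed per pair of character sets, which is the redundancy the quoted text is worried about.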

Interesting idea. But what about collations? SQL allows a collation
different from the default one to be assigned to a character set on
the fly. Should we make collations separate objects as well?

> Here are a couple of "items" I keep wondering about:
>
> * To what extent would we be able to use the operating system's locale
> facilities? Besides the fact that some systems are deficient or broken one
> way or another, POSIX really doesn't provide much besides "given two
> strings, which one is greater", and then only on a per-process basis.
> We'd really need more than that, see also LIKE indexing issues, and indexing in
> general.

Correct. I'd suggest completely getting rid of the OS's locale.
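
To make the contrast concrete, here is a small sketch of the two approaches. The first half uses the real setlocale()/strcoll() interface, where the collation is a single process-wide setting. The second half uses a Collation class that is purely hypothetical (its name, fields, and the two sample collations are assumptions for illustration), showing collations as independent objects that can be used side by side.

```python
import functools
import locale

words = ["apple", "Banana", "cherry"]

# OS locale: one collation for the whole process. Switching it affects
# every later strcoll() call, so two collations cannot be active at once.
locale.setlocale(locale.LC_COLLATE, "C")   # "C" is the portable baseline
posix_order = sorted(words, key=functools.cmp_to_key(locale.strcoll))


class Collation:
    """Hypothetical collation object, independent of the process locale."""

    def __init__(self, name, key):
        self.name = name
        self.key = key   # maps a string to its sort key

    def sort(self, items):
        return sorted(items, key=self.key)


# Two illustrative collations over the same character set.
BINARY = Collation("binary", lambda s: s)
CASE_INSENSITIVE = Collation(
    "case_insensitive", lambda s: (s.casefold(), s)
)

binary_order = BINARY.sort(words)
folded_order = CASE_INSENSITIVE.sort(words)
```

Because each Collation carries its own key function, the same data can be ordered under different collations in one session without touching global state, and a sort key of this kind is also something an index could store.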

> * Client support: A lot of language environments provide pretty smooth
> Unicode support these days, e.g., Java, Perl 5.6, and I think that C99 has
> also made some strides. So while "we can store stuff in any character set
> you want" is great, it's really no good if it doesn't work transparently
> with the client interfaces. At least something to keep in mind.

Do you suggest that we should convert everything into Unicode and store
it in the DB?
--
Tatsuo Ishii
