From: | Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> |
---|---|
To: | peter_e(at)gmx(dot)net |
Cc: | lockhart(at)alumni(dot)caltech(dot)edu, t-ishii(at)sra(dot)co(dot)jp, pgsql-hackers(at)hub(dot)org |
Subject: | Re: Character sets (Re: Re: Big 7.1 open items) |
Date: | 2000-06-21 06:19:17 |
Message-ID: | 20000621151917D.t-ishii@sra.co.jp |
Lists: | pgsql-hackers |
> But how are you going to tell a genuine "type" from a character set? And
> you might have to have three types for each charset. There'd be a lot of
> redundancy and confusion regarding the input and output functions and
> other pg_type attributes. No doubt there's something to be learned from
> the type system, but character sets have different properties -- like
> characters(!), collation rules, encoding "translations" and what not.
> There is no doubt also a need for different error handling. So I think that
> just dumping every character set into pg_type is not a good idea. That's
> almost equivalent to having separate types for char(6), char(7), etc.
>
> Instead, I'd suggest that character sets become separate objects. A
> character entity would carry around its character set in its header
> somehow. Consider a string concatenation function, being invoked with two
> arguments of the same exotic character set. Using the type system only
> you'd have to either provide a function signature for all combinations of
> character sets or you'd have to cast them up to SQL_TEXT, concatenate
> them and cast them back to the original charset. A smarter concatenation
> function instead might notice that both arguments are of the same
> character set and simply paste them together right there.
Interesting idea. But what about collations? SQL allows a collation
other than the default one to be assigned to a character set on the
fly. Should we make collations separate objects as well?
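To make the two ideas above concrete, here is a rough sketch, not a proposal for the actual implementation. It assumes a string value that carries its character set in a header field, a concatenation function with a fast path for matching character sets, and a collation modelled as a separate object chosen per comparison. All names (`CharString`, `concat`, `compare`, `binary_order`, `case_insensitive`) are hypothetical, Python codec names stand in for character set names, and UTF-8 stands in for SQL_TEXT.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CharString:
    """Hypothetical string value that carries its character set in a header."""
    charset: str   # Python codec name, used here only for illustration
    data: bytes    # the encoded bytes


def concat(a: CharString, b: CharString) -> CharString:
    # Fast path: both arguments use the same character set,
    # so the bytes are simply pasted together.
    if a.charset == b.charset:
        return CharString(a.charset, a.data + b.data)
    # Slow path: go through a common representation
    # (UTF-8 here, standing in for SQL_TEXT).
    text = a.data.decode(a.charset) + b.data.decode(b.charset)
    return CharString("utf_8", text.encode("utf_8"))


# A collation as a separate object: a function that maps text to a sort key.
Collation = Callable[[str], str]


def compare(a: CharString, b: CharString, collation: Collation) -> int:
    """Return -1, 0 or 1 under the collation supplied for this comparison."""
    ka = collation(a.data.decode(a.charset))
    kb = collation(b.data.decode(b.charset))
    return (ka > kb) - (ka < kb)


def binary_order(s: str) -> str:
    # Compare by code point.
    return s


def case_insensitive(s: str) -> str:
    # Ignore case differences when comparing.
    return s.casefold()
```

With this shape, the same two values can be compared under different collations without touching the character set, for example `compare(x, y, binary_order)` versus `compare(x, y, case_insensitive)`.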
> Here are a couple of "items" I keep wondering about:
>
> * To what extent would we be able to use the operating system's locale
> facilities? Besides the fact that some systems are deficient or broken one
> way or another, POSIX really doesn't provide much besides "given two
> strings, which one is greater", and then only on a per-process basis.
> We'd really need more than that, see also LIKE indexing issues, and indexing in
> general.
Correct. I'd suggest getting rid of the OS's locale entirely.
> * Client support: A lot of language environments provide pretty smooth
> Unicode support these days, e.g., Java, Perl 5.6, and I think that C99 has
> also made some strides. So while "we can store stuff in any character set
> you want" is great, it's really no good if it doesn't work transparently
> with the client interfaces. At least something to keep in mind.
Do you suggest that we should convert everything into Unicode and store
it in the DB?
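For illustration only, this is roughly what "convert everything into Unicode" would mean at the client boundary, assuming UTF-8 as the storage form and a per-connection client encoding. The function names `to_storage` and `to_client` are hypothetical, and Python codec names stand in for encoding names.

```python
def to_storage(client_bytes: bytes, client_encoding: str) -> bytes:
    """Convert client-encoded bytes to the assumed UTF-8 storage form."""
    return client_bytes.decode(client_encoding).encode("utf_8")


def to_client(stored: bytes, client_encoding: str) -> bytes:
    """Convert stored UTF-8 bytes back to the client's encoding.

    Raises UnicodeEncodeError if the client encoding cannot represent
    a stored character.
    """
    return stored.decode("utf_8").encode(client_encoding)
```

The round trip is lossless only when the client encoding can represent every stored character; sending Japanese text to a Latin-1 client, for instance, fails in `to_client`.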
--
Tatsuo Ishii