Re: [HACKERS] Multibyte in autoconf

From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: peter_e(at)gmx(dot)net, e99re41(at)DoCS(dot)UU(dot)SE
Cc: t-ishii(at)sra(dot)co(dot)jp, tgl(at)sss(dot)pgh(dot)pa(dot)us, hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Multibyte in autoconf
Date: 1999-12-08 14:31:52
Message-ID: 19991208233152I.t-ishii@sra.co.jp
Lists: pgsql-hackers

> > > If no --pgencoding, you get default (non-multibyte) coding even
> > > if you compiled with --enable-mb.
> >
> > Not agreed. I think it would be better to give an error if no default
> > encoding is specified when configured with --enable-mb. Reasons:
> >
> > 1) Users tend to use only one encoding rather than switching among
> > databases with different encodings. Thus the user's main encoding
> > should be properly set as the default.
>
> Users also initdb only once, and that is the time to *choose* what they
> want. Then and only then. Once they're done with that they'll never have
> to worry about it again.
>
> > 2) If a non-multibyte encoding such as SQL_ASCII is accidentally set as
> > the default, and a multibyte user creates a database with no encoding
> > argument, the result would be a disaster.
>
> Huh, so if I compile my database with multibyte and then I choose
> to not have a default encoding in template1 but maybe I want to have the
> multibyte option available for some other database later on, that will be
> a disaster? Not so good.

First of all, it's not possible not to have a default encoding in
template1. Probably you mean you choose SQL_ASCII (encoding no. 0)
as the default encoding. Anyway, I'm going to give an example scenario
of the disaster.

1) initdb is run with no encoding argument (suppose that SQL_ASCII is
set as the default encoding in template1).

2) A user creates a database with no encoding argument, thinking
that the default encoding is EUC_JP.

3) He makes a table, then fills it with some Japanese data.

4) Later he pulls data from the table and finds that it is no longer
Japanese!
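
One way the scenario above can go wrong is sketched below. This is a
standalone Python illustration, not PostgreSQL code, and it assumes a
particular failure mode that the scenario does not spell out: a
byte-oriented operation (here, truncation to a fixed number of
"characters") applied to EUC_JP data by a system that believes every
character is one byte. The function names are hypothetical.

```python
# Hypothetical illustration only; none of this is PostgreSQL source code.

def store_as_bytes(text: str, client_encoding: str = "euc_jp") -> bytes:
    """Encode text the way a Japanese client would send it."""
    return text.encode(client_encoding)


def truncate_single_byte(raw: bytes, n_chars: int) -> bytes:
    """Truncate to n_chars, assuming one byte per character.

    This mimics a system that treats the data as a single-byte
    encoding and so counts bytes instead of characters.
    """
    return raw[:n_chars]


def read_back(raw: bytes, client_encoding: str = "euc_jp") -> str:
    """Decode on the client; undecodable bytes become U+FFFD."""
    return raw.decode(client_encoding, errors="replace")


original = "日本語"                       # three characters
raw = store_as_bytes(original)            # two bytes per character in EUC_JP
cut = truncate_single_byte(raw, 3)        # "three characters" = three bytes
result = read_back(cut)                   # second character was split in half

print(len(original), len(raw))
print(repr(result))
```

The byte count is twice the character count, so cutting at three bytes
splits the second character and the client can no longer decode it as
Japanese text.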

> What I'm also thinking of is the the package maintainer. They should be
> able to provide a "neutral" yet multibyte (and locale, and cyrillic)
> enabled package, and one should be able to use that even if one doesn't
> want to use the multibyte features right now or at all.

So you think a postgres package with multibyte/locale/cyrillic options
enabled is a good thing for everyone? At least I don't like the locale
option. It is not only useless for multibyte languages such as
Japanese, but it also makes text comparison slow. I wouldn't say locale
is useless for everyone, however. I admit it is useful for single-byte
encodings.

I think it would be very hard to make a unified ideal package for
everyone.

> Also, it should not be initdb's job to verify that the encodings are
> correct, supported, etc. The backend should find that out itself. That
> eliminates duplication of the same logic, which the backend can do better
> anyway.

Actually that duplication can be eliminated by using the same
code. I think the pg_id command will do the job.

BTW, I don't think the current implementation of multibyte is
complete yet. The next target would be NATIONAL CHARACTER support (not
sure it's for 7.0, though). I would also like to find a solution for
the locale problem I stated above.
--
Tatsuo Ishii
