Thanks for the reply. Why was this particular change made between 7.2 and
7.3? It seems to have moved away from the standard. I found the
Which generates the mappings. I found it references 3 files from unicode
The JIS0208.TXT has the line...
0x8160 0x2141 0x301C # WAVE DASH
The 1st column is SJIS, the 2nd is EUC minus 0x8080 (the JIS X 0208 code), and the 3rd is UTF-16.
Incidentally, those mapping files are marked obsolete, but I guess the old
mappings still hold.
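To make the column layout concrete, here is a minimal Python sketch that reads one data line of JIS0208.TXT, assuming three whitespace-separated hex columns (SJIS, JIS X 0208, Unicode) followed by a `#` comment as in the line quoted above. The helper name `parse_jis0208_line` is hypothetical. EUC-JP is recovered by adding 0x8080 back to the JIS column, and the UTF-8 bytes come from Python's own encoder.

```python
from typing import Tuple

def parse_jis0208_line(line: str) -> Tuple[int, int, int]:
    """Return (sjis, euc_jp, ucs) for one data line of JIS0208.TXT."""
    # Drop the trailing "# NAME" comment, then split on tabs or spaces.
    fields = line.split("#", 1)[0].split()
    sjis, jis, ucs = (int(f, 16) for f in fields[:3])
    # EUC-JP sets the high bit of both JIS X 0208 bytes, i.e. JIS + 0x8080.
    return sjis, jis + 0x8080, ucs

sjis, euc, ucs = parse_jis0208_line("0x8160\t0x2141\t0x301C\t# WAVE DASH")
utf8 = chr(ucs).encode("utf-8")
print(f"SJIS 0x{sjis:04X}  EUC-JP 0x{euc:04X}  UTF-16 0x{ucs:04X}  UTF-8 0x{utf8.hex().upper()}")
# prints: SJIS 0x8160  EUC-JP 0xA1C1  UTF-16 0x301C  UTF-8 0xE3809C
```

The result agrees with the SJIS 0x8160 / EUC-JP 0xA1C1 / UTF-8 0xE3809C values for WAVE DASH given elsewhere in this thread.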
I guess if I run the perl script it will generate a mapping file
different from what PostgreSQL is currently using. It might be interesting
to pull out the diffs and see what's right and what's wrong. I guess it's not run
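Pulling out the diffs could look something like the following sketch. It assumes both tables have already been loaded into dictionaries keyed by SJIS code; the function `diff_mappings` and the two-entry tables are hypothetical and illustrative only. The WAVE DASH entry reflects what this thread reports (JIS0208.TXT says U+301C, PostgreSQL currently uses U+FF5E).

```python
from typing import Dict, List, Optional, Tuple

def diff_mappings(generated: Dict[int, int],
                  current: Dict[int, int]) -> List[Tuple[int, Optional[int], Optional[int]]]:
    """Return (code, generated_value, current_value) wherever the two tables disagree."""
    diffs = []
    for code in sorted(generated.keys() | current.keys()):
        g, c = generated.get(code), current.get(code)
        if g != c:
            diffs.append((code, g, c))
    return diffs

def fmt(value: Optional[int]) -> str:
    return "missing" if value is None else f"U+{value:04X}"

# Illustrative tables (SJIS code -> Unicode code point), not the real files.
generated = {0x8140: 0x3000, 0x8160: 0x301C}  # what JIS0208.TXT says
current = {0x8140: 0x3000, 0x8160: 0xFF5E}    # what the thread reports PostgreSQL uses

for code, g, c in diff_mappings(generated, current):
    print(f"SJIS 0x{code:04X}: generated {fmt(g)}, current {fmt(c)}")
# prints: SJIS 0x8160: generated U+301C, current U+FF5E
```

Entries present in only one table show up as "missing" on the other side, which would also catch mappings that were dropped between releases.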
I can't see how the change would adversely affect the JDBC driver. It should only
improve the situation. Right now it's not possible to go from SJIS ->
database (UTF-8) -> Java (JDBC/UTF-16) -> SJIS for the WAVE DASH character
because the mapping in PostgreSQL is wrong. I'll cc the JDBC list and
maybe we'll find out if it's a real problem to change the mapping.
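The failing round trip can be modelled in a few lines. This is a simplified sketch under stated assumptions: the one-entry tables are hypothetical stand-ins for the current PostgreSQL mapping (SJIS 0x8160 to U+FF5E), the proposed one (to U+301C), and the Java 1.4.1 encoder (U+301C to SJIS 0x8160, with no entry for U+FF5E in this model). The UTF-8 and UTF-16 steps use Python's real codecs, since those are lossless.

```python
from typing import Dict, Optional

# Hypothetical one-entry tables modelling the behaviour described in this thread.
PG_SJIS_TO_UCS: Dict[int, int] = {0x8160: 0xFF5E}     # current: SJIS 0x8160 -> FULLWIDTH TILDE
FIXED_SJIS_TO_UCS: Dict[int, int] = {0x8160: 0x301C}  # proposed: SJIS 0x8160 -> WAVE DASH
JAVA_UCS_TO_SJIS: Dict[int, int] = {0x301C: 0x8160}   # Java 1.4.1: WAVE DASH -> SJIS 0x8160

def round_trip(sjis: int, db_table: Dict[int, int]) -> Optional[int]:
    """SJIS -> database (UTF-8) -> Java (UTF-16) -> SJIS; None if a step has no mapping."""
    ucs = db_table.get(sjis)
    if ucs is None:
        return None
    stored = chr(ucs).encode("utf-8")                    # what the database holds
    utf16 = stored.decode("utf-8").encode("utf-16-be")   # what JDBC hands to Java
    java_char = utf16.decode("utf-16-be")
    return JAVA_UCS_TO_SJIS.get(ord(java_char))

print(round_trip(0x8160, PG_SJIS_TO_UCS))               # None: no SJIS entry for U+FF5E in this model
print(round_trip(0x8160, FIXED_SJIS_TO_UCS) == 0x8160)  # True
```

Only the database-side table differs between the two calls, which is why changing the PostgreSQL mapping alone would be enough to fix the trip in this model.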
I think changing the mapping is the correct thing to do, judging from what I
see all around me in different tools like iconv, Java 1.4.1, UTF-8
terminals and every Unicode reference on the web.
What do you think?
On Wed, 2003-02-12 at 22:30, Tatsuo Ishii wrote:
> I think the problem you see is due to the mapping table changes
> between 7.2 and 7.3. It seems there are more changes other than
> u301c. Moreover according to the recent discussion in Japanese local
> mailing list, 7.3's JDBC driver now relies on the encoding conversion
> performed by the backend, i.e. the driver issues "set client_encoding =
> 'UNICODE'". This problem is very complex and I need time to find a good
> solution. I don't think simply backing out the changes to the mapping
> table solves the problem.
> > Hi all,
> > One Japanese character has been causing my head to swim lately. I've
> > finally tracked down the problem to both Java 1.3 and Postgresql.
> > The problem character is namely:
> > utf-16: 0x301C
> > utf-8: 0xE3809C
> > SJIS: 0x8160
> > EUC_JP: 0xA1C1
> > Otherwise known as the WAVE DASH character.
> > The confusion stems from a very similar character 0xFF5E (utf-16) or
> > 0xEFBD9E (utf-8) the FULLWIDTH TILDE.
> > Java has only recently (1.4.1) fixed its mappings so that 0x301C
> > maps correctly to both the SJIS and the EUC-JP character. Previously
> > (at least in 1.3.1) it mapped SJIS to 0xFF5E and EUC to 0x301C,
> > causing all sorts of trouble.
> > PostgreSQL at least picked one of the two characters, namely 0xFF5E, so
> > conversions in and out of the database to/from SJIS/EUC seemed to be
> > working. The problem comes when you try to view UTF-8 from the database,
> > or when you read the data into Java (UTF-16) and try converting to EUC or
> > SJIS from there.
> > Anyway, I think postgresql needs to be fixed for this character. In my
> > opinion what needs to be done is to change the mappings...
> > euc-jp -> utf-8    -> euc-jp
> > ======    ========    ======
> > 0xA1C1 -> 0xE3809C -> 0xA1C1
> >
> > sjis   -> utf-8    -> sjis
> > ======    ========    ======
> > 0x8160 -> 0xE3809C -> 0x8160
> > As for the current mapping of 0xEFBD9E (UTF-8), it
> > probably should be removed. Maybe you could keep the mapping back to the
> > SJIS/EUC characters to help backward compatibility, though. I'm not sure
> > what the correct approach is there.
> > If anyone can tell me how to edit the mappings under:
> > src/backend/utils/mb/Unicode/
> > and rebuild postgres to use them, then I can test this out locally.
> Just edit src/backend/utils/mb/Unicode/*.map and rebuild
> PostgreSQL. You will probably want to modify utf8_to_euc_jp.map and
> Tatsuo Ishii
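The proposed euc-jp/sjis to UTF-8 tables in the quoted message can be sanity-checked for round-trip safety with a sketch like this. The dictionaries are hypothetical excerpts, not the contents of the real .map files, and UTF-8 sequences are written as integers to match the notation used in the thread. The 0xEFBD9E entries model the optional one-way mapping suggested for backward compatibility; they do not break the local -> UTF-8 -> local round trip.

```python
from typing import Dict, List

# Hypothetical excerpts of the proposed tables; UTF-8 sequences written as integers.
EUC_JP_TO_UTF8 = {0xA1C1: 0xE3809C}
SJIS_TO_UTF8 = {0x8160: 0xE3809C}
# Reverse tables. The 0xEFBD9E entries are the optional one-way mappings that
# would keep old FULLWIDTH TILDE data convertible back to EUC-JP/SJIS.
UTF8_TO_EUC_JP = {0xE3809C: 0xA1C1, 0xEFBD9E: 0xA1C1}
UTF8_TO_SJIS = {0xE3809C: 0x8160, 0xEFBD9E: 0x8160}

def broken_round_trips(to_utf8: Dict[int, int], from_utf8: Dict[int, int]) -> List[int]:
    """Local codes whose local -> UTF-8 -> local conversion does not return the same code."""
    return [code for code, u in sorted(to_utf8.items()) if from_utf8.get(u) != code]

# Sanity checks: the integers really are the UTF-8 encodings of the two characters.
assert int.from_bytes("\u301c".encode("utf-8"), "big") == 0xE3809C  # WAVE DASH
assert int.from_bytes("\uff5e".encode("utf-8"), "big") == 0xEFBD9E  # FULLWIDTH TILDE

print(broken_round_trips(EUC_JP_TO_UTF8, UTF8_TO_EUC_JP))  # []
print(broken_round_trips(SJIS_TO_UTF8, UTF8_TO_SJIS))      # []
```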
Thomas O'Dowd <tom(at)nooper(dot)com>
Nooper.com Mobile Services Inc