Re: Invalid EUC_JP char seq bug?

From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: jc(at)mega-bucks(dot)co(dot)jp
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Invalid EUC_JP char seq bug?
Date: 2003-07-02 02:20:17
Message-ID: 20030702.112017.122620186.t-ishii@sra.co.jp
Lists: pgsql-bugs

> I am using PHP with PostgreSQL and I have been getting a few rare errors
> while trying to do selects on a table containing EUC_JP text.
>
> I thought it was a bug with PHP not recognizing a string as invalid
> EUC_JP characters and wrote up a bug report but the PHP developers
> assure me that the string that is generating the error is a valid EUC_JP
> string (I don't know anything about character encodings so I am taking
> them at their word and the fact that the string displays fine in my
> browser as EUC_JP leads me to suspect they might be right).
>
> The offending string is url encoded as such:
>
> words=%8f%ac%90%ec%96%be%93%fa%8d%81
>
> When I try and do a SELECT I get the following error:
>
> select id from products where name like '??????'
> Query failed: ERROR: Invalid EUC_JP character sequence found (0x8100)

Since you did not show us the exact query you sent to PostgreSQL, I assume
the query passed to PostgreSQL is:

select id from products where name like 'string';

where string is "0x8fac90ec96be93fa8d81".

If the string is supposed to be EUC_JP, it would be parsed as follows:

8f: single shift 3 (indicates that the following 2 bytes are a JIS 0212 character)
ac90: a JIS 0212 character
ec96: a JIS 0208 character
be93: a JIS 0208 character
fa8d: a JIS 0208 character
81: ???

The last 0x81 is invalid if the string is assumed to be EUC_JP.
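
The byte-by-byte parse above can be reproduced with a small Python sketch. This is only an illustration of the splitting rule described in this message, not PostgreSQL's actual code; the function name split_euc_jp is hypothetical, and the sketch checks lead bytes only, not whether each trailing byte is in a valid range.

```python
def split_euc_jp(data: bytes) -> list[bytes]:
    """Split bytes into EUC_JP-sized units by looking at each lead byte.

    Hypothetical helper for illustration; it does not validate trail bytes.
    """
    units = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            width = 1   # ASCII
        elif lead == 0x8F:
            width = 3   # single shift 3 + 2-byte JIS 0212 character
        else:
            width = 2   # 2-byte character (JIS 0208, or 0x8e + kana)
        # The slice is shorter than `width` if the input ends too early.
        units.append(data[i:i + width])
        i += width
    return units


query_bytes = bytes.fromhex("8fac90ec96be93fa8d81")
units = split_euc_jp(query_bytes)
for unit in units:
    print(unit.hex())

# The final unit has a lead byte but no second byte.
last_is_truncated = len(units[-1]) == 1 and units[-1][0] >= 0x80
```

Under these assumptions the string splits into 8fac90, ec96, be93, fa8d and a final lone 81, so the last unit is a 2-byte lead with nothing after it.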

> (Where did the 0x00 come from??)

It is the string's trailing '\0', read as the second byte after the lone 0x81.

> Can someone let me know if this truly is a bug in postgres?

No.

> Thanks,
>
> Jean-Christian Imbeault
>
>
> PS I have also had the error pop up with this string:
>
> search_words=%B7%F6%BA%7E
> select id from products where name like '??~'
> Query failed: ERROR: Invalid EUC_JP character sequence found (0xba7e)

This is definitely bad EUC_JP.
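
To see why, recall that both bytes of a 2-byte JIS 0208 character in EUC_JP must lie in the range 0xa1 to 0xfe, and 0x7e (ASCII '~') does not. A minimal Python sketch of that check follows; the function name first_bad_pair is hypothetical, and the sketch deliberately ignores the single shift bytes 0x8e and 0x8f, so it is not a full EUC_JP validator.

```python
def first_bad_pair(data: bytes):
    """Return the first invalid 2-byte unit, or None if none is found.

    Simplified for illustration: handles ASCII and 2-byte JIS 0208 only.
    """
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            i += 1          # ASCII byte, always fine
            continue
        pair = data[i:i + 2]
        # Both bytes must be present and within 0xa1..0xfe.
        if len(pair) < 2 or not all(0xA1 <= b <= 0xFE for b in pair):
            return pair
        i += 2
    return None


bad = first_bad_pair(bytes.fromhex("b7f6ba7e"))
print(bad.hex() if bad is not None else "no invalid pair found")
```

For the string above, b7f6 passes and ba7e is reported, which matches the sequence named in the error message.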
--
Tatsuo Ishii
