Re: Invalid EUC_JP char seq bug?

From: Jean-Christian Imbeault <jc(at)mega-bucks(dot)co(dot)jp>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Invalid EUC_JP char seq bug?
Date: 2003-07-02 02:42:30
Message-ID: 3F024696.3080305@mega-bucks.co.jp
Lists: pgsql-bugs

Tatsuo Ishii wrote:
>
> Since you did not show us exact query you send to PostgreSQL

I can't show the exact query because it is generated by PHP. I can
however show you the code that generates the query:

$words = $_GET["words"];
$sql = "select id from products where name like '$words'";
$conn = pg_connect("host=$DB_IP port=5432 dbname=$DB_NAME user=postgres");
$res = pg_query($conn, $sql);

The GET query string was:

words=%8f%ac%90%ec%96%be%93%fa%8d%81

I think that PHP does some internal translation of this before passing
it on though.

> I assume the query passed to PostgreSQL is:
>
> select id from products where name like 'string';

Yes.

> where string is "0x8fac90ec96be93fa8d81".

That I don't know.

> If the string is supposed to be an EUC_JP, it would be parsed as follows:
>
> 8f: single shift 3 (indicates that following 2 bytes are a JIS 0212 character

[snip ...]

Ah ... so it is not an EUC-JP string but an SJIS string. Postgres was
right. That answers my question. Thanks!
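
The parse described above can be reproduced outside the database. The following is a minimal sketch in Python, not PostgreSQL's actual validation code: the helper name first_invalid_euc_jp is hypothetical, and it checks only the byte ranges of the EUC-JP structure (ASCII, single shift 2, single shift 3, two-byte JIS X 0208), not whether each code point is assigned.

```python
from urllib.parse import unquote_to_bytes


def first_invalid_euc_jp(data: bytes):
    """Return the offset of the first structurally invalid EUC-JP sequence, or None.

    Hypothetical helper: only byte ranges are checked, not code point assignment.
    """
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                      # ASCII, no trail bytes
            trail_count, low, high = 0, 0, 0
        elif lead == 0x8E:                   # single shift 2: half-width katakana
            trail_count, low, high = 1, 0xA1, 0xDF
        elif lead == 0x8F:                   # single shift 3: JIS X 0212, two bytes
            trail_count, low, high = 2, 0xA1, 0xFE
        elif 0xA1 <= lead <= 0xFE:           # JIS X 0208, one trail byte
            trail_count, low, high = 1, 0xA1, 0xFE
        else:
            return i
        trail = data[i + 1 : i + 1 + trail_count]
        if len(trail) != trail_count or any(not low <= b <= high for b in trail):
            return i
        i += 1 + trail_count
    return None


raw = unquote_to_bytes("%8f%ac%90%ec%96%be%93%fa%8d%81")
print(raw.hex())                    # 8fac90ec96be93fa8d81
print(first_invalid_euc_jp(raw))    # 0: the 0x8f at offset 0 is followed by 0xac 0x90, and 0x90 is out of range
text = raw.decode("shift_jis")      # the same bytes decode without error as Shift_JIS
print(len(text), "characters when read as Shift_JIS")
```

Reporting the offset rather than a plain yes/no makes it easier to see which byte pair the server would complain about.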

>>PS I have also had the error pop up with this string:
>>
>>search_words=%B7%F6%BA%7E
>>select id from products where name like '??~'
>>Query failed: ERROR: Invalid EUC_JP character sequence found (0xba7e)
>
>
> This is definitely a bad EUC_JP.

According to a PHP developer in my bug report
(http://bugs.php.net/bug.php?id=24309&edit=2):

"URL decoded byte sequence of 'search_words=%B7%F6%BA%7E' is
B7E6+BA7E, which is correct EUC-JP character sequence. [snip] But, I
believe encoding detection of mbstring works fine in this case.
B7E6+BA7E is not correct byte sequence of SJIS, UTF-8, ISO2022-JP. It is
correct EUC-JP byte sequence."

I see that he wrote B7E6 instead of the correct B7F6. I resubmitted my
bug report to PHP and pointed this out. Hopefully the developer will see
that this sequence is incorrect EUC-JP and that PHP failed to detect this :)
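
Both spellings of the sequence can be checked independently of PHP and PostgreSQL. This is a rough sketch that uses Python's built-in euc_jp codec as a stand-in validator; that codec is an assumption of convenience and is not the code either project runs. The helper name euc_jp_error is hypothetical. In both cases the pair to watch is BA7E, the one named in the PostgreSQL error message, because 0x7E lies outside the A1..FE trail-byte range.

```python
from urllib.parse import unquote_to_bytes


def euc_jp_error(data: bytes):
    """Return the UnicodeDecodeError raised by Python's euc_jp codec, or None.

    Hypothetical helper; Python's codec stands in for a real EUC-JP validator.
    """
    try:
        data.decode("euc_jp")
    except UnicodeDecodeError as exc:
        return exc
    return None


reported = unquote_to_bytes("%B7%F6%BA%7E")   # bytes from the query string
as_quoted = bytes.fromhex("b7e6ba7e")          # bytes as written in the PHP comment

for label, data in (("B7F6+BA7E", reported), ("B7E6+BA7E", as_quoted)):
    err = euc_jp_error(data)
    print(label, "valid" if err is None else f"rejected: {err.reason}")
```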

I *knew* there was nothing wrong with Postgres ;)

Thanks!

Jean-Christian Imbeault

PS I posted to HACKERS a few weeks ago about another bug (a real one :)
in the EUC-JP translation having to do with the WAVE DASH. I'll repost
it here on the BUGS list. Could you let me know the status of that bug?
Thanks!
