Re: BUG #4890: Allow insert character has no equivalent in "LATIN2"

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Robert Świętochowski <robert(dot)swietochowski(at)akpa(dot)pl>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #4890: Allow insert character has no equivalent in "LATIN2"
Date: 2009-07-13 17:58:50
Message-ID: 1247507930.17862.111.camel@ayaki
Lists: pgsql-bugs

(Please reply to the list, not just to me)

I'm not sure about this so far. Regarding the specific issue you
mention, conversion between cp1250 and latin-2 (ISO-8859-2), the Unicode
mapping table at:

http://unicode.org/Public/MAPPINGS/ISO8859/8859-2.TXT

appears to agree - there's no PER MILLE SIGN in ISO-8859-2.
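
As a rough cross-check of that table, here is a minimal Python sketch. It
assumes Python's bundled "iso8859_2" and "cp1250" codecs match the
Unicode mapping files; they are not the conversion tables Pg itself
uses, and can_encode is a helper name made up for this example.

```python
# Assumption: Python's codecs mirror the Unicode mapping tables.
PER_MILLE = "\u2030"  # PER MILLE SIGN


def can_encode(char: str, codec: str) -> bool:
    """Return True if `char` has a byte representation in `codec`."""
    try:
        char.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode(PER_MILLE, "iso8859_2"))  # False: not in ISO-8859-2
print(can_encode(PER_MILLE, "cp1250"))     # True: present in Windows-1250
print(PER_MILLE.encode("cp1250").hex())    # 89
```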

With a UTF-8 database, Pg correctly doesn't accept PER MILLE as a valid
ISO-8859-2 char:

-- Connecting with unicode (utf-8) client
CREATE TABLE test (x text);
INSERT INTO test(x) VALUES ('‰');

SET client_encoding='iso-8859-2';
SELECT * from test;
ERROR: character 0xe280b0 of encoding "UTF8" has no equivalent in
"LATIN2"

If client_encoding is set to WIN1250 instead, Pg outputs the
appropriate byte. So it's doing the right thing in each individual case
where a UTF-8 DB is concerned.
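
The same round trip can be imitated outside the database. This is only
a sketch of the idea, assuming Python's codecs as a stand-in for the
server's UTF8-to-client conversion; convert_for_client is a
hypothetical helper, not a Pg function.

```python
def convert_for_client(stored_utf8: bytes, client_codec: str) -> bytes:
    """Decode bytes stored as UTF-8 and re-encode them for the client."""
    text = stored_utf8.decode("utf-8")
    return text.encode(client_codec)  # raises if there is no equivalent


stored = "\u2030".encode("utf-8")  # per-mille as held in a UTF-8 database
print(stored.hex())                               # e280b0
print(convert_for_client(stored, "cp1250").hex())  # 89

try:
    convert_for_client(stored, "iso8859_2")
except UnicodeEncodeError:
    # Analogous to the "has no equivalent" error in the session above.
    print("no equivalent in iso8859_2")
```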

Your problem, though, is that if you connect to a LATIN2 database with a
WIN1250 client and INSERT a string containing the per-mille glyph, Pg
accepts it and it should not. If it does, indeed, accept it, then I
agree that's a bug.

I haven't tested with a LATIN2 database as I'd have to re-initdb and the
machine I'm working on has semi-useful databases on it. What you're
saying makes sense, though, presuming your client really is sending
win1250 per-mille (byte 0x89).

I'd still like to know how you're setting your client encoding. You
can't just run "SET client_encoding='win1250'" - you must also tell the
client program, or the terminal it runs in, to use that encoding.
Otherwise, when you paste the per-mille character you'll see the right
glyph, but the server will interpret the bytes the client sends as
characters in the encoding you specified.

So, if you're using a utf-8 terminal, the terminal will send
0xe2 0x80 0xb0 for per-mille, which when interpreted as win1250 becomes
the three characters â€° , so that's what the server thinks you sent it.
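
A small Python sketch of that misreading, again assuming Python's
"cp1250" codec matches WIN1250 (as_server_sees is a name invented for
the example):

```python
def as_server_sees(typed: str, terminal_codec: str, declared_codec: str) -> str:
    """Encode text as the terminal does, then decode it as the declared
    client_encoding would have the server do."""
    return typed.encode(terminal_codec).decode(declared_codec)


seen = as_server_sees("\u2030", "utf-8", "cp1250")
print(len(seen))                           # 3
print([f"U+{ord(c):04X}" for c in seen])   # ['U+00E2', 'U+20AC', 'U+00B0']
```

One typed character turns into three: a-circumflex, the euro sign, and
the degree sign.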

In that case, though, you'd find that the euro symbol, which isn't
defined in latin-2, will cause an error:

ERROR: character 0xe282ac of encoding "UTF8" has no equivalent in
"LATIN2"

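To see which of the misread characters would trip that error, one can
test each against ISO-8859-2. A sketch under the same assumption that
Python's codecs stand in for Pg's conversion tables; unmappable is a
hypothetical helper.

```python
def unmappable(text: str, codec: str) -> list[str]:
    """Return the characters of `text` that `codec` cannot represent."""
    missing = []
    for char in text:
        try:
            char.encode(codec)
        except UnicodeEncodeError:
            missing.append(char)
    return missing


# UTF-8 bytes for per-mille, misread as Windows-1250.
misread = "\u2030".encode("utf-8").decode("cp1250")
bad = unmappable(misread, "iso8859_2")
print([c.encode("utf-8").hex() for c in bad])  # ['e282ac']
```

Only the euro sign fails; the other two characters exist in ISO-8859-2.
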
--
Craig Ringer
