Re: UTF-8 encoding problem w/ libpq

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: "ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu>
Cc: Martin Schäfer <Martin(dot)Schaefer(at)cadcorp(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UTF-8 encoding problem w/ libpq
Date: 2013-06-03 16:22:59
Message-ID: 51ACC2E3.9020309@vmware.com
Lists: pgsql-hackers

On 03.06.2013 18:27, ktm(at)rice(dot)edu wrote:
> On Mon, Jun 03, 2013 at 04:09:29PM +0100, Martin Schäfer wrote:
>>
>>>> If I change the strCreate query and add double quotes around the column
>>>> name, then the problem disappears. But the original name is already in
>>>> lowercase, so I think it should also work without quoting the column name.
>>>> Am I missing some setup in either the database or in the use of libpq?
>>>>
>>>> I’m using PostgreSQL 9.2.1, compiled by Visual C++ build 1600, 64-bit
>>>>
>>>> The database uses:
>>>> ENCODING = 'UTF8'
>>>> LC_COLLATE = 'English_United Kingdom.1252'
>>>> LC_CTYPE = 'English_United Kingdom.1252'
>>>>
>>>> Thanks for any help,
>>>>
>>>> Martin
>>>>
>>>
>>> Hi Martin,
>>>
>>> If you do not want the lowercase behavior, you must put double-quotes
>>> around the column name per the documentation:
>>>
>>> http://www.postgresql.org/docs/9.2/interactive/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
>>>
>>> section 4.1.1.
>>>
>>> Regards,
>>> Ken
>>
>> The original name 'id_äß' is already in lowercase. The backend should leave it unchanged IMO.
>
> Only in utf-8 which needs to be double-quoted for a column name as you have
> seen, otherwise the value will be lowercased per byte.

He *is* using UTF-8. Or trying to, anyway :-). The downcasing in the
backend is supposed to leave bytes with the high bit set alone, i.e. in
UTF-8 encoding, it's supposed to leave ä and ß alone.

I suspect that the conversion to UTF-8, before the string is sent to the
server, is not being done correctly. I'm not sure what's wrong there,
but I'd suggest printing the actual byte sequence sent to the server, to
check whether it is in fact valid UTF-8, i.e. replace the PQexec() line with
something like:

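// Keep ToUtf8()'s result in a named variable: calling c_str() on the
// temporary directly leaves s dangling once that statement ends.
const auto utf8 = ToUtf8(strCreate.c_str());
const char *s = utf8.c_str();
int i;
// Dump every byte of the query as two hex digits
for (i = 0; s[i]; i++)
    printf("%02x", (unsigned char) s[i]);
printf("\n");
pResult = PQexec(pConn, s);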

The output should contain the UTF-8 byte sequence for äß, "c3a4c39f".

- Heikki
