Database object names and libpq in UTF-8 locale on Windows

From: Sebastien FLAESCH <sf(at)4js(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Database object names and libpq in UTF-8 locale on Windows
Date: 2012-10-11 08:56:52
Message-ID: 507689D4.3050804@4js.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Hello,

Using PostgreSQL 9.2.1 on Windows, I am facing a strange character set issue
with a UTF-8 database.

Maybe this is expected, but want to be sure that I am not missing something.

On Windows, I have created a database with:

ENCODING = 'UTF-8'
LC_COLLATE = 'English_United States.1252'
LC_CTYPE = 'English_United States.1252'

I am using libpq with a simple C program.
The program handles UTF-8 strings.

While it's not a traditional "UNICODE" Windows application using WCHAR, it
should be possible to send and get back UTF-8 data with PostgreSQL.

Interacting with the system in the Windows locale is another story, but from
a pure C / SQL / libpq point of view, as long as the PostgreSQL client encoding
is properly defined to UTF-8, it should work, and it does mainly:

- I can use UTF-8 string constants in my queries.
- I can pass UTF-8 data to the database with parameterized queries.
- I can fetch UTF-8 data from the database.
- I can create db object names with UTF-8 characters.

But the db object names must be specified with double quotes:
When I do not use quoted db object names, I get a strange problem.
The table is created, I can use it in my program, I can even use it in the
pgAdmin query tool, but in the pgAdmin db browser, there is no table name
displayed in the treeview...

Further, when selecting schema information from pg_class, I can see that the
table exists, but there is nothing displayed in the relname attribute...

It appears that the problem disappears when using a "C" collation are char type:

ENCODING = 'UTF-8'
LC_COLLATE = 'C'
LC_CTYPE = 'C'

I suspect this has something to do with the fact that non-quoted identifiers
are converted to lowercase, and because my LC_CTYPE is English_United States.1252,
the conversion to lowercase fails...

But:

- why does PostgreSQL accept to create the table with invalid UTF-8 characters?

- why can I make queries in the query tool with the UTF-8 table name?
(note that show client_encoding in the query tool gives me "UNICODE" - is this
the same as "UTF-8"?)

So is there a bug, or do I have to use a C collation and char type for UTF-8
databases on Windows?

I know that UTF-8 is not supported by the Windows C/POSIX library (setlocale,
and co).

Does it mean that it's not realistic to use UTF-8 encoding for PostgreSQL
databases on Windows...?

Thanks for reading.
Sebastien FLAESCH
Four Js Development Tools

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Tom Lane 2012-10-11 14:03:48 Re: Database object names and libpq in UTF-8 locale on Windows
Previous Message Kim Bisgaard 2012-10-08 15:38:46 Error 42704