Re: Unicode support

From: "Dave Page" <dpage(at)vale-housing(dot)co(dot)uk>
To: "Marko Ristola" <Marko(dot)Ristola(at)kolumbus(dot)fi>
Cc: "Hiroshi Saito" <saito(at)inetrt(dot)skcapi(dot)co(dot)jp>, "Anoop Kumar" <anoopk(at)pervasive-postgres(dot)com>, <pgsql-odbc(at)postgresql(dot)org>
Subject: Re: Unicode support
Date: 2005-09-02 07:50:12
Message-ID: E7F85A1B5FF8D44C8A1AF6885BC9A0E4AC9E13@ratbert.vale-housing.co.uk
Lists: pgsql-odbc

> -----Original Message-----
> From: pgsql-odbc-owner(at)postgresql(dot)org
> [mailto:pgsql-odbc-owner(at)postgresql(dot)org] On Behalf Of Marko Ristola
> Sent: 01 September 2005 18:21
> Cc: Hiroshi Saito; Anoop Kumar; pgsql-odbc(at)postgresql(dot)org
> Subject: Re: [ODBC] Unicode support
>
>
> Hi all.

Hi Marko,

> How about creating a charset conversion interface
> and taking UTF-8 as an internal format for ODBC?:
>

<snip>

>
> So, there would be a single internal UTF-8 format inside PsqlODBC.
> The backend could always deliver UTF-8, so the need for internal
> format <-> backend format layer is not needed.
>
> This implementation would be easy to implement.

This is what already happens (if you ignore my recent experimental
patch).

If the connection is made using one of the *W connect functions, then
the ConnectionClass->unicode flag is set to true, and SET
client_encoding = 'UTF-8' is sent to the backend. From then on, data
going out to the client is fed through utf8_to_ucs2_lf() *if* the data
type is specified as SQL_C_WCHAR, and data coming in to *W functions is
fed through ucs2_to_utf8().
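
To illustrate the incoming half of that path, here is a minimal sketch of a
UCS-2 to UTF-8 conversion. It is not the driver's ucs2_to_utf8(); the name
sketch_ucs2_to_utf8 and its signature are made up for this example, and it
assumes plain UCS-2 input (no surrogate-pair handling).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT psqlODBC's ucs2_to_utf8().
 * Encodes n UCS-2 code units from `in` as UTF-8 into `out` (capacity `cap`
 * bytes), NUL-terminates the result, and returns the number of bytes
 * written excluding the terminator, or (size_t)-1 if `out` is too small.
 * Assumption: input is plain UCS-2, so surrogate pairs are not combined. */
size_t sketch_ucs2_to_utf8(const uint16_t *in, size_t n, char *out, size_t cap)
{
    size_t o = 0;

    assert(in != NULL && out != NULL);

    for (size_t i = 0; i < n; i++) {
        uint16_t c = in[i];
        /* UTF-8 length for a 16-bit code unit: 1, 2 or 3 bytes. */
        size_t need = c < 0x80 ? 1 : (c < 0x800 ? 2 : 3);

        /* Always keep one byte in reserve for the terminator. */
        if (o + need + 1 > cap)
            return (size_t)-1;

        if (need == 1) {
            out[o++] = (char)c;
        } else if (need == 2) {
            out[o++] = (char)(0xC0 | (c >> 6));
            out[o++] = (char)(0x80 | (c & 0x3F));
        } else {
            out[o++] = (char)(0xE0 | (c >> 12));
            out[o++] = (char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (c & 0x3F));
        }
    }

    /* Covers the n == 0 case, where the loop never checked capacity. */
    if (o + 1 > cap)
        return (size_t)-1;
    out[o] = '\0';
    return o;
}
```

A caller would pass the wide-character buffer received by a *W function and
send the resulting UTF-8 bytes on to the backend.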

Afaict, Unicode mode works exactly as it should.

If the connection is made using a non-wide function, the
ConnectionClass->unicode flag is not set. In this case, the client is
expected to continue using non-wide functions, and the client encoding
is left at its default, so the driver will never report data types as
SQL_C_WCHAR.

This is where I believe the major problem occurs: if the ODBC Driver
Manager sees that SQLConnectW (iirc) exists, it will automatically map
ANSI calls (e.g. SQLConnect()) to Unicode (e.g. SQLConnectW()). This then
causes the driver to report text/char columns as SQL_C_WCHAR. Less well
written apps then fall over because they aren't clever enough to request
data as SQL_C_CHAR instead of SQL_C_WCHAR.

My recent experimental patch aims to address this, by forcing the driver
to report SQL_C_CHAR instead of SQL_C_WCHAR for non-unicode databases.
This should (and seems to, with minor side effects yet to be fully
investigated) fix the BDE problem.
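
As a rough model of that change (not the patch itself), the decision can be
sketched like this. Everything prefixed sketch_ is hypothetical; only the
`unicode` field corresponds to something named above
(ConnectionClass->unicode), and the `server_is_unicode` field is an
assumption about how the driver might know the database encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two ODBC C type codes discussed above. */
typedef enum { SKETCH_C_CHAR, SKETCH_C_WCHAR } sketch_ctype;

/* Hypothetical model of the relevant connection state. */
typedef struct {
    bool unicode;           /* set when a *W connect function was used */
    bool server_is_unicode; /* assumption: database encoding is Unicode */
} sketch_conn;

/* Behaviour as described before the patch: the unicode flag alone decides
 * how text/char columns are reported. */
sketch_ctype sketch_ctype_before(const sketch_conn *c)
{
    assert(c != NULL);
    return c->unicode ? SKETCH_C_WCHAR : SKETCH_C_CHAR;
}

/* Behaviour the patch aims for: a non-Unicode database is always reported
 * as SQL_C_CHAR, even if the Driver Manager mapped the connect call to the
 * wide variant. */
sketch_ctype sketch_ctype_after(const sketch_conn *c)
{
    assert(c != NULL);
    return (c->unicode && c->server_is_unicode) ? SKETCH_C_WCHAR
                                                : SKETCH_C_CHAR;
}
```

In this model, an ANSI app whose SQLConnect() was mapped to SQLConnectW()
against a non-Unicode database gets SKETCH_C_WCHAR from the first function
and SKETCH_C_CHAR from the second, which is the case the patch targets.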

As for multibyte (non-unicode) data such as Hiroshi's, my understanding
is that in the presence of a Unicode driver, apps are expected to use
Unicode (and in fact, are forced to by the driver manager's mapping of
ANSI function calls to Unicode calls).

Anoop, do you or any of your guys (or anyone else) know
unicode/multibyte/encoding well? I'm learning as I go at the moment, so
some more experienced help would be *really* appreciated.

Regards, Dave.
