
Re: Continuing encoding fun....

From:	"Dave Page" <dpage(at)vale-housing(dot)co(dot)uk>
To:	"Marc Herbert" <Marc(dot)Herbert(at)continuent(dot)com>, <pgsql-odbc(at)postgresql(dot)org>
Subject:	Re: Continuing encoding fun....
Date:	2005-11-23 08:59:59
Message-ID:	E7F85A1B5FF8D44C8A1AF6885BC9A0E4E7E293@ratbert.vale-housing.co.uk
Lists:	pgsql-odbc

> -----Original Message-----
> From: pgsql-odbc-owner(at)postgresql(dot)org
> [mailto:pgsql-odbc-owner(at)postgresql(dot)org] On Behalf Of Marc Herbert
> Sent: 22 November 2005 09:33
> To: pgsql-odbc(at)postgresql(dot)org
> Subject: Re: [ODBC] Continuing encoding fun....
>
> "Dave Page" <dpage(at)vale-housing(dot)co(dot)uk> writes:
>
> >> I agree that 4) can never work, because ODBC does not seem compatible
> >> with multibyte apps by design. ODBC caters for "ANSI" and "Unicode"
> >> strings, that's all.
> >> <http://blogs.msdn.com/oldnewthing/archive/2004/05/31/144893.aspx>
> >
>
> > Actually our ANSI driver works quite nicely in various non-Unicode
> > multibyte encodings such as Shift-JIS, EUC_CN, JOHAB and more. It'll
> > even work with pure UTF-8 in multibyte mode using the ANSI API.
>
> Great.
>
> Out of curiosity, is this because all the ODBC code has a "don't
> touch" attitude in this full-ANSI case, leaving all string data as is?
> Or is there something more clever? Who performs the conversion if the
> database is in UTF-8 for instance? Multibyte cases seem to fall outside
> the scope of the ODBC spec, which refers only to "ANSI" and "Unicode".

No, multibyte support was intentionally added by Eiji Tokuya in 2001. Don't ask me how it works, though, as I really don't know. Much of the code for it is in multibyte.c if you want to take a peek.

> Very interesting. Maybe the driver manager does so only because it
> cannot/fails to get the active codepage, falling back on CP-1252?
> (CP1252 ~= latin1,
> <http://czyborra.com/charsets/codepages.html#CP1252>)

The docs are somewhat fuzzy on this point, simply stating that

"If the driver is a Unicode driver, the Driver Manager makes function calls as follows:" ... "Converts an ANSI function (with the A suffix) to a Unicode function (with the W suffix) by converting the string arguments into Unicode characters and passes the Unicode function to the driver."

(http://msdn.microsoft.com/library/default.asp?url=/library/en-us/odbc/htm/odbcunicode_applications.asp)
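As a rough illustration of why the code page matters in that A-to-W conversion, here is a minimal Python sketch. It is not the Driver Manager's actual code: `ansi_to_wide` is a hypothetical helper, and the choice of Shift-JIS input and a CP-1252 fallback is only an assumption taken from the discussion above.

```python
# Sketch of an ANSI (A suffix) to Unicode (W suffix) string conversion.
# ansi_to_wide is a hypothetical helper for illustration, not an ODBC API.

def ansi_to_wide(ansi_bytes: bytes, codepage: str) -> bytes:
    """Decode ANSI bytes with the given code page and re-encode as UTF-16LE."""
    # errors="replace" keeps this sketch from raising on bytes the code page
    # leaves undefined; a real converter may behave differently.
    return ansi_bytes.decode(codepage, errors="replace").encode("utf-16-le")

# Two kanji (U+65E5 U+672C) as a multibyte application would pass them
# to the ANSI API when its encoding is Shift-JIS.
sjis_bytes = "\u65e5\u672c".encode("shift_jis")

# Converted with the matching code page, the text survives the trip.
wide_right = ansi_to_wide(sjis_bytes, "shift_jis")

# Converted as if the bytes were CP-1252, each byte becomes an unrelated
# single-byte character and the original text is lost.
wide_wrong = ansi_to_wide(sjis_bytes, "cp1252")

print(ascii(wide_right.decode("utf-16-le")))
print(ascii(wide_wrong.decode("utf-16-le")))
```

The point of the sketch is only that the conversion result depends entirely on which code page the converter assumes for the incoming bytes.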

My assertion that the driver does the conversion comes from the SQL Server driver, which allows you to turn conversion on or off:

"Perform translation for character data check box

When selected, the SQL Server ODBC driver converts ANSI strings sent between the client computer and SQL Server by using Unicode. The SQL Server ODBC driver sometimes converts between the SQL Server code page and Unicode on the client computer. This requires that the code page used by SQL Server be one of the code pages available on the client computer.

When cleared, no translation of extended characters in ANSI character strings is done when they are sent between the client application and the server. If the client computer is using an ANSI code page (ACP) different from the SQL Server code page, extended characters in ANSI character strings may be misinterpreted. If the client computer is using the same code page for its ACP that SQL Server is using, the extended characters are interpreted correctly."

If Microsoft intended the DM to do the conversion when they wrote the spec, why would they then add the same functionality to their driver?
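The behaviour described in the quoted help text can be sketched roughly as follows. This is only an illustration under stated assumptions, not the SQL Server driver's implementation: `deliver` is a hypothetical function, and CP437 for the server and CP-1252 for the client ACP are arbitrary example code pages.

```python
# Sketch of the "Perform translation for character data" option.
# deliver is a hypothetical function; the code pages are example choices.

def deliver(data: bytes, server_cp: str, client_acp: str, translate: bool) -> bytes:
    """Return the bytes the client application receives from the server."""
    if not translate:
        # Check box cleared: bytes are passed through untouched.
        return data
    # Check box selected: go through Unicode, from server code page to client ACP.
    return data.decode(server_cp).encode(client_acp)

# U+00E9 (e with acute accent) as stored by a server using CP437: byte 0x82.
server_bytes = "\u00e9".encode("cp437")

translated = deliver(server_bytes, "cp437", "cp1252", translate=True)
untranslated = deliver(server_bytes, "cp437", "cp1252", translate=False)

# What the client sees when it reads each result in its own ACP.
seen_translated = translated.decode("cp1252")
seen_untranslated = untranslated.decode("cp1252")

print(ascii(seen_translated))
print(ascii(seen_untranslated))
```

With translation on, the client gets the same character back in its own code page. With translation off and differing code pages, the raw byte is read as a different extended character, which is the misinterpretation the help text warns about.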

> >> Is this "bug" true for every driver manager out there?
>
> > It's not really a bug, but I believe so, yes.
>
> including unixodbc and iodbc for instance?

If they follow the parts of the spec I quoted above, and interpret them in the same way, then yes. However, I'm not overly familiar with either DM, so I can't say for sure.

> > It gets corrected by
> > the more advanced drivers though - for example, the SQL server
> > driver might see a 'Š' character (8A). It knows the local charset is
> > LATIN4, so it can then rewrite that character to 0160, the Unicode
> > equivalent.
>
> Are you saying that the SQL server driver is fixing the flawed
> conversion job of the driver manager, finally taking the codepage into
> account? Surprising to say the least!
>
> By the way 0x8A is not in the range of latin4
> <http://czyborra.com/charsets/iso8859.html#ISO-8859-4>

http://www.gar.no/home/mats/8859-4.htm says differently. However, I can't claim to know enough about encoding issues to refute either. I've been forced to learn what I can about the subject to help maintain this driver, and I certainly may have got the wrong end of the stick on one or more points!
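One way to check the 0x8A question is to ask an independent set of mapping tables. The sketch below uses the codecs bundled with Python (`cp1252` and `iso8859_4`), which are only one reference among several and are not what either driver or Driver Manager uses internally.

```python
# Compare how byte 0x8A is interpreted under CP-1252 and ISO-8859-4 (Latin-4),
# using Python's bundled codec tables as the reference.

byte = b"\x8a"

as_cp1252 = byte.decode("cp1252")      # Windows CP-1252 interpretation
as_latin4 = byte.decode("iso8859_4")   # ISO-8859-4 interpretation

# Where ISO-8859-4 actually places U+0160 (S with caron).
latin4_bytes = "\u0160".encode("iso8859_4")

print("0x8A in CP-1252    -> U+%04X" % ord(as_cp1252))
print("0x8A in ISO-8859-4 -> U+%04X" % ord(as_latin4))
print("U+0160 in ISO-8859-4 -> 0x%02X" % latin4_bytes[0])
```

According to these tables, 0x8A maps to U+0160 in CP-1252, while in ISO-8859-4 the same byte falls in the C1 control range and U+0160 sits at 0xA9 instead.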

Regards, Dave.
