Re: Unicode support

From: Marko Ristola <Marko(dot)Ristola(at)kolumbus(dot)fi>
To: Hiroshi Saito <saito(at)inetrt(dot)skcapi(dot)co(dot)jp>
Cc: Dave Page <dpage(at)vale-housing(dot)co(dot)uk>, Anoop Kumar <anoopk(at)pervasive-postgres(dot)com>, pgsql-odbc(at)postgresql(dot)org
Subject: Re: Unicode support
Date: 2005-09-03 07:23:04
Message-ID: 43194F58.6050909@kolumbus.fi
Lists: pgsql-odbc


So, I don't have much experience with Windows ODBC. That's true.
Is it possible to compile psqlodbc with MinGW tools for Windows?

Using Google, I found out that the GLib libraries are able to convert
UTF-8 into multibyte encodings under Windows. Windows itself should also be
able to convert UTF-8 into multibyte and vice versa with its own character
set conversion functions.

Google also turned up a Windows XP problem with Korean multibyte
data:

"Windows XP Device Driver Does Not Convert Multibyte Data to Korean"

Article ID: 817522.

That was fixed in Service Pack 2.

So I would like to ask what you think about the following:

If I have understood Windows correctly, it uses UCS-2 as its internal
Unicode character set, while Linux prefers UTF-8. So treating UCS-2 and
UTF-8 as equivalent inside psqlodbc makes sense, and that is what has
already been implemented in psqlodbc for Windows.
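Treating the two as equivalent works because, for every character in the Basic Multilingual Plane, UCS-2 and UTF-8 are just two serializations of the same code point, so conversion is lossless in both directions. A minimal sketch, assuming BMP-only, well-formed input; the names sketch_utf8_to_ucs2() and sketch_ucs2_to_utf8() are hypothetical and are not the real utf8_to_ucs2_lf()/ucs2_to_utf8() functions that appear in the quoted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers (not psqlodbc's own). They only handle the Basic
 * Multilingual Plane (U+0000..U+FFFF), which is all UCS-2 can represent.
 * Truncated sequences trip an assert; other malformed input is not checked.
 */

/* Decode UTF-8 into UCS-2; returns the number of UCS-2 units written. */
size_t sketch_utf8_to_ucs2(const unsigned char *src, size_t srclen,
                           uint16_t *dst, size_t dstcap)
{
    size_t i = 0, n = 0;
    while (i < srclen && n < dstcap) {
        unsigned char c = src[i];
        if (c < 0x80) {                      /* 1 byte: U+0000..U+007F */
            dst[n++] = c;
            i += 1;
        } else if ((c & 0xE0) == 0xC0) {     /* 2 bytes: U+0080..U+07FF */
            assert(i + 1 < srclen);
            dst[n++] = (uint16_t)(((c & 0x1F) << 6) | (src[i + 1] & 0x3F));
            i += 2;
        } else {                             /* 3 bytes: U+0800..U+FFFF */
            assert((c & 0xF0) == 0xE0 && i + 2 < srclen);
            dst[n++] = (uint16_t)(((c & 0x0F) << 12) |
                                  ((src[i + 1] & 0x3F) << 6) |
                                  (src[i + 2] & 0x3F));
            i += 3;
        }
    }
    return n;
}

/* Encode UCS-2 back into UTF-8; returns the number of bytes written. */
size_t sketch_ucs2_to_utf8(const uint16_t *src, size_t srclen,
                           unsigned char *dst, size_t dstcap)
{
    size_t n = 0;
    for (size_t i = 0; i < srclen; i++) {
        uint16_t u = src[i];
        if (u < 0x80) {
            if (n + 1 > dstcap) break;
            dst[n++] = (unsigned char)u;
        } else if (u < 0x800) {
            if (n + 2 > dstcap) break;
            dst[n++] = (unsigned char)(0xC0 | (u >> 6));
            dst[n++] = (unsigned char)(0x80 | (u & 0x3F));
        } else {
            if (n + 3 > dstcap) break;
            dst[n++] = (unsigned char)(0xE0 | (u >> 12));
            dst[n++] = (unsigned char)(0x80 | ((u >> 6) & 0x3F));
            dst[n++] = (unsigned char)(0x80 | (u & 0x3F));
        }
    }
    return n;
}
```

Decoding "A", "é" and "あ" gives the UCS-2 units 0x0041, 0x00E9 and 0x3042, and encoding them again reproduces the original six bytes exactly.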

Then there is the world before Unicode existed: there were DOS code pages,
character sets for groups of countries, and multibyte character sets.

JIS X 0208 is a character set, and Shift_JIS is an encoding that can
represent JIS X 0208 characters as multibyte sequences (see man 7 charsets).
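The character set versus encoding distinction can be made concrete: JIS X 0208 gives each character a row and cell, stored as two bytes in the range 0x21..0x7E, and Shift_JIS is a fixed arithmetic rearrangement of those two bytes. A minimal sketch of that standard transform; the function name jis0208_to_sjis() is hypothetical, and vendor extensions such as Microsoft's CP932 additions are not covered.

```c
#include <assert.h>

/*
 * Hypothetical helper: map one JIS X 0208 character, given as its two
 * 7-bit bytes (each 0x21..0x7E), to its two-byte Shift_JIS form.
 * JIS X 0208 decides which characters exist; Shift_JIS only decides
 * how those two bytes are laid out.
 */
void jis0208_to_sjis(unsigned char j1, unsigned char j2,
                     unsigned char *s1, unsigned char *s2)
{
    assert(j1 >= 0x21 && j1 <= 0x7E && j2 >= 0x21 && j2 <= 0x7E);

    /* Two JIS rows share one Shift_JIS lead byte. Lead bytes jump from
       0x9F to 0xE0, skipping 0xA0..0xDF, which Shift_JIS reserves for
       single-byte half-width katakana. */
    *s1 = (unsigned char)(((j1 + 1) >> 1) + (j1 <= 0x5E ? 0x70 : 0xB0));

    if (j1 & 1)
        /* Odd JIS row: trail byte 0x40..0x9E, skipping 0x7F. */
        *s2 = (unsigned char)(j2 + (j2 >= 0x60 ? 0x20 : 0x1F));
    else
        /* Even JIS row: trail byte 0x9F..0xFC. */
        *s2 = (unsigned char)(j2 + 0x7E);
}
```

For example, HIRAGANA LETTER A is 0x2422 in JIS X 0208 and comes out as 0x82 0xA0 in Shift_JIS: the same character, encoded differently.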

So it seems that one working implementation could be built on a UTF-8
PostgreSQL server together with UTF-8 to multibyte conversions.

However, according to the Samba team's descriptions of Unicode problems,
there are complications: UTF-8 to EUC_JP conversion may differ between
Linux and Windows, and between different conversion library implementations.

Some multibyte character sets contradict each other.

If we drop the *W() functions, we might get a working implementation,
but would we then no longer support the full ODBC API?

So it works if and only if a single conversion library does all the
conversions.
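Why a single conversion library matters: if one library decodes and a different one encodes, characters whose mappings differ do not survive the round trip. EUC_JP is mentioned above; the best-known instance of the same problem involves Shift_JIS, where JIS-based tables (such as the Unicode consortium's SHIFTJIS.TXT) map 0x8160 to U+301C WAVE DASH and 0x817C to U+2212 MINUS SIGN, while Microsoft's CP932 maps them to U+FF5E FULLWIDTH TILDE and U+FF0D FULLWIDTH HYPHEN-MINUS. A minimal sketch with two-entry excerpts of each table; the table and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One Shift_JIS code and the Unicode code point a converter maps it to. */
struct map_entry { uint16_t sjis; uint16_t ucs; };

#define N_ENTRIES 2

/* Excerpt of a JIS-based mapping table. */
static const struct map_entry jis_style[N_ENTRIES] = {
    { 0x8160, 0x301C },  /* WAVE DASH */
    { 0x817C, 0x2212 },  /* MINUS SIGN */
};

/* Excerpt of Microsoft's CP932 mapping table. */
static const struct map_entry cp932_style[N_ENTRIES] = {
    { 0x8160, 0xFF5E },  /* FULLWIDTH TILDE */
    { 0x817C, 0xFF0D },  /* FULLWIDTH HYPHEN-MINUS */
};

/* Shift_JIS -> Unicode; returns 0 if the code is not in this excerpt. */
static uint16_t to_ucs(const struct map_entry *t, uint16_t sjis)
{
    for (size_t i = 0; i < N_ENTRIES; i++)
        if (t[i].sjis == sjis)
            return t[i].ucs;
    return 0;
}

/* Unicode -> Shift_JIS; returns 0 if the code point is unmappable. */
static uint16_t to_sjis(const struct map_entry *t, uint16_t ucs)
{
    for (size_t i = 0; i < N_ENTRIES; i++)
        if (t[i].ucs == ucs)
            return t[i].sjis;
    return 0;
}

/* Decode with one library's table, then re-encode with another's. */
uint16_t round_trip(const struct map_entry *decoder,
                    const struct map_entry *encoder, uint16_t sjis)
{
    assert(decoder != NULL && encoder != NULL);
    return to_sjis(encoder, to_ucs(decoder, sjis));
}
```

With the same table on both sides every code comes back unchanged; mixing the tables turns 0x8160 and 0x817C into unmappable characters.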

In other words, psqlodbc should work with multibyte encodings and with
UTF-8 if and only if either the PostgreSQL backend alone or the psqlodbc
side alone does the needed conversions. If the PostgreSQL server runs in
the same kind of Windows environment as the clients, it should work fully
with UTF-8 and the multibyte character sets. This should be the option
that works best.

Windows does have a working UCS-2 to multibyte conversion implementation
on the psqlodbc client (since Service Pack 2).

Unfortunately, a pg_dump and restore from SJIS into UTF-8 might not work,
because iconv on Linux might not do the conversion correctly.

The conversion into UTF-8 must then be done with the fully working Windows
conversion functions. So one option might be to run pg_dump under Windows
in a way that performs the multibyte to UTF-8 conversion on the Windows side.

How about the following implementation, for ODBC against the backend:
- The backend has multibyte characters.
- Windows uses multibyte characters.
- psqlodbc uses UTF-8 as its internal format.

=> A fully working implementation:
- The backend delivers multibyte characters.
- psqlodbc converts them into UTF-8.
- psqlodbc delivers multibyte characters to the client, using
  utf8_to_locale Windows functions when necessary.
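The proposed client-side flow (backend encoding to internal UTF-8 to client encoding, with both conversions done by the same code) might look roughly like this. To keep the sketch short and self-contained, ISO 8859-1 stands in for the multibyte encodings because its byte values equal their Unicode code points; a real driver would call the Windows conversion functions instead. The function names are hypothetical and are not part of psqlodbc.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical codec pair living entirely on the client side.
 * ISO 8859-1 stands in for both the backend and the client encoding.
 */

/* Backend encoding -> internal UTF-8. dst needs room for 2 * len bytes. */
size_t backend_to_utf8(const unsigned char *src, size_t len,
                       unsigned char *dst)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] < 0x80) {
            dst[n++] = src[i];
        } else {
            dst[n++] = (unsigned char)(0xC0 | (src[i] >> 6));
            dst[n++] = (unsigned char)(0x80 | (src[i] & 0x3F));
        }
    }
    return n;
}

/* Internal UTF-8 -> client encoding. Characters the client encoding
   cannot represent become '?', similar in spirit to the default
   character of WideCharToMultiByte(). Input must be well-formed UTF-8. */
size_t utf8_to_client(const unsigned char *src, size_t len,
                      unsigned char *dst)
{
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char c = src[i];
        size_t seq = c < 0x80 ? 1
                   : (c & 0xE0) == 0xC0 ? 2
                   : (c & 0xF0) == 0xE0 ? 3 : 4;
        assert(i + seq <= len);
        if (seq == 1)
            dst[n++] = c;
        else if (seq == 2 && c <= 0xC3)   /* U+0080..U+00FF fit */
            dst[n++] = (unsigned char)(((c & 0x1F) << 6) | (src[i + 1] & 0x3F));
        else
            dst[n++] = '?';
        i += seq;
    }
    return n;
}
```

Because the same code performs both steps, "café" from the backend reaches the client byte-for-byte intact, while a character outside the client encoding (such as "あ") is replaced rather than silently remapped.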

So the solution here might be to do all conversions on the client side!
The reasoning is that two separate conversion libraries might contradict
each other, at least for the Asian character sets.
(On Macs, the UTF-8 implementation differs from the standard.)
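The Mac remark most likely refers to normalization: Mac OS X's HFS+ file system stores names in a decomposed form (a variant of Unicode NFD), while text produced on Windows or Linux is usually precomposed (NFC). Both are valid UTF-8 but different byte sequences, so code that compares or converts bytes without normalizing treats them as different strings. A small illustration; the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "é" as U+00E9 (precomposed, NFC) and as U+0065 U+0301 (decomposed, NFD). */
static const char nfc_e_acute[] = "\xC3\xA9";
static const char nfd_e_acute[] = "e\xCC\x81";

/* Byte-wise equality, which is effectively what a converter or comparison
   that does not normalize performs. */
int same_bytes(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}

/* Count code points by counting bytes that are not UTF-8 continuation
   bytes (continuation bytes have the form 10xxxxxx). */
size_t count_code_points(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

The precomposed form is one code point in two bytes; the decomposed form is two code points in three bytes, so the two spellings of the same visible character never compare equal byte-wise.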

Alternatively, Asian users could move to UTF-8 as their PostgreSQL
server's backend encoding. That is the other solution to the same
problem: then the PostgreSQL server does not have to do the conversion.

It does not seem possible to do all the conversions inside the PostgreSQL
server under Windows, because of the xx() -> xxW() mapping inside the
Windows ODBC Driver Manager, which we can't control.

What do you think about these thoughts?

Marko Ristola

Hiroshi Saito wrote:

>Hi Dave.
>
>I tried your patch with Japanese SJIS. It seems that it needs some additional
>corrections. Moreover, the driver needs to behave differently when it is not
>in UNICODE (wide character) mode. It seems that I have to catch up further.
>
>BTW, I remember the original discussion for pgAdmin III, where I said that
>MultiByte should be supported. How is it now? It is wonderful.
>I feel that having many choices of character encoding makes the problem
>more complicated, although the external environments differ.
>
>Regards,
>Hiroshi Saito
>
>------------------------------------------------------------------------
>
>--- convert.c.orig Thu Aug 4 21:26:57 2005
>+++ convert.c Thu Sep 1 04:38:45 2005
>@@ -762,7 +762,7 @@
> {
> BOOL lf_conv = conn->connInfo.lf_conversion;
>
>- if (fCType == SQL_C_WCHAR)
>+ if ((conn->unicode && conn->report_wide_types) && (fCType == SQL_C_WCHAR))
> {
> len = utf8_to_ucs2_lf(neut_str, -1, lf_conv, NULL, 0);
> len *= WCLEN;
>@@ -778,7 +778,7 @@
> }
> else
> #ifdef WIN32
>- if (fCType == SQL_C_CHAR)
>+ if ((conn->unicode && conn->report_wide_types) && (fCType == SQL_C_CHAR))
> {
> wstrlen = utf8_to_ucs2_lf(neut_str, -1, lf_conv, NULL, 0);
> allocbuf = (SQLWCHAR *) malloc(WCLEN * (wstrlen + 1));
>@@ -810,7 +810,7 @@
> pgdc->ttlbuflen = len + 1;
> }
>
>- if (fCType == SQL_C_WCHAR)
>+ if ((conn->unicode && conn->report_wide_types) && (fCType == SQL_C_WCHAR))
> {
> utf8_to_ucs2_lf(neut_str, -1, lf_conv, (SQLWCHAR *) pgdc->ttlbuf, len / WCLEN);
> }
>@@ -824,7 +824,7 @@
> }
> else
> #ifdef WIN32
>- if (fCType == SQL_C_CHAR)
>+ if ((conn->unicode && conn->report_wide_types) && (fCType == SQL_C_CHAR))
> {
> len = WideCharToMultiByte(CP_ACP, 0, allocbuf, wstrlen, pgdc->ttlbuf, pgdc->ttlbuflen, NULL, NULL);
> free(allocbuf);
>@@ -871,7 +871,7 @@
>
> copy_len = (len >= cbValueMax) ? cbValueMax - 1 : len;
>
>- if (fCType == SQL_C_WCHAR)
>+ if ((conn->unicode && conn->report_wide_types) && (fCType == SQL_C_WCHAR))
> {
> copy_len /= WCLEN;
> copy_len *= WCLEN;
>@@ -911,7 +911,7 @@
> memcpy(rgbValueBindRow, ptr, copy_len);
> /* Add null terminator */
>
>- if (fCType == SQL_C_WCHAR)
>+ if ((conn->unicode && conn->report_wide_types) && (fCType == SQL_C_WCHAR))
> memset(rgbValueBindRow + copy_len, 0, WCLEN);
> else
>
>@@ -942,7 +942,7 @@
> break;
> }
>
>- if (SQL_C_WCHAR == fCType && ! wchanged)
>+ if ((conn->unicode && conn->report_wide_types) && (SQL_C_WCHAR == fCType && ! wchanged))
> {
> if (cbValueMax > (SDWORD) (WCLEN * (len + 1)))
> {
>@@ -2629,6 +2629,8 @@
> case SQL_WCHAR:
> case SQL_WVARCHAR:
> case SQL_WLONGVARCHAR:
>+ if (conn->unicode && conn->report_wide_types)
>+ {
> if (SQL_NTS == used)
> used = strlen(buffer);
> allocbuf = malloc(WCLEN * (used + 1));
>@@ -2637,6 +2639,11 @@
> buf = ucs2_to_utf8((SQLWCHAR *) allocbuf, used, (UInt4 *) &used, FALSE);
> free(allocbuf);
> allocbuf = buf;
>+ }
>+ else
>+ {
>+ buf = buffer;
>+ }
> break;
> default:
> buf = buffer;
>@@ -2647,10 +2654,17 @@
> break;
>
> case SQL_C_WCHAR:
>+ if (conn->unicode && conn->report_wide_types)
>+ {
> if (SQL_NTS == used)
> used = WCLEN * wcslen((SQLWCHAR *) buffer);
> buf = allocbuf = ucs2_to_utf8((SQLWCHAR *) buffer, used / WCLEN, (UInt4 *) &used, FALSE);
> used *= WCLEN;
>+ }
>+ else
>+ {
>+ buf = buffer;
>+ }
> break;
>
> case SQL_C_DOUBLE:
>--- psqlodbc_win32.def.orig Thu Sep 1 04:41:37 2005
>+++ psqlodbc_win32.def Thu Sep 1 04:42:08 2005
>@@ -78,31 +78,3 @@
> DllMain @201
> ConfigDSN @202
>
>-SQLColAttributeW @101
>-SQLColumnPrivilegesW @102
>-SQLColumnsW @103
>-SQLConnectW @104
>-SQLDescribeColW @106
>-SQLExecDirectW @107
>-SQLForeignKeysW @108
>-SQLGetConnectAttrW @109
>-SQLGetCursorNameW @110
>-SQLGetInfoW @111
>-SQLNativeSqlW @112
>-SQLPrepareW @113
>-SQLPrimaryKeysW @114
>-SQLProcedureColumnsW @115
>-SQLProceduresW @116
>-SQLSetConnectAttrW @117
>-SQLSetCursorNameW @118
>-SQLSpecialColumnsW @119
>-SQLStatisticsW @120
>-SQLTablesW @121
>-SQLTablePrivilegesW @122
>-SQLDriverConnectW @123
>-SQLGetDiagRecW @124
>-SQLGetStmtAttrW @125
>-SQLSetStmtAttrW @126
>-SQLSetDescFieldW @127
>-SQLGetTypeInfoW @128
>-SQLGetDiagFieldW @129
>
>
>------------------------------------------------------------------------
