Re: Unicode support

From: Marko Ristola <Marko(dot)Ristola(at)kolumbus(dot)fi>
To:
Cc: Hiroshi Saito <saito(at)inetrt(dot)skcapi(dot)co(dot)jp>, Anoop Kumar <anoopk(at)pervasive-postgres(dot)com>, pgsql-odbc(at)postgresql(dot)org
Subject: Re: Unicode support
Date: 2005-09-01 17:20:32
Message-ID: 43173860.3060407@kolumbus.fi
Lists: pgsql-odbc


Hi all.

How about creating a charset conversion interface
and using UTF-8 as the internal format inside psqlodbc?

At least the following functions might be needed:

Internal2WChar()
WChar2Internal()

Internal2Char()
Char2Internal()

The backend would then communicate only in UTF-8.

Here is a minimal set of interface functions
(in the object-oriented design sense):

cvt_FromUTF8()
cvt_ToUTF8()
cvt_Free()

Interface implementation:

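struct CvtInterface {
    /* Each conversion returns a newly allocated, NUL-terminated string. */
    char *(*cvt_FromUTF8)(void *internalData, const char *source, size_t bytes);
    char *(*cvt_ToUTF8)(void *internalData, const char *source, size_t bytes);
    void (*cvt_Free)(void *internalData);  /* releases internalData */

    void *internalData;  /* converter-specific state */
};
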
Object creation:

struct Env {
struct CvtInterface char_cvt; // C program char conversions
struct CvtInterface wchar_cvt; // C program wchar_t conversions
};

struct CvtInterface utf8_to_utf8_New();
env->char_cvt = utf8_to_utf8_New();
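A minimal sketch of the identity converter that `utf8_to_utf8_New()` could return, assuming each conversion function returns a newly `malloc`'ed, NUL-terminated string that the caller releases with `free()`, and that `cvt_Free()` only releases `internalData`. The helper names `utf8_copy` and `utf8_Free` are hypothetical; only the interface shape comes from the proposal, repeated here so the block compiles on its own.

```c
#include <stdlib.h>
#include <string.h>

/* Interface shape from the proposal (char * results, const sources). */
struct CvtInterface {
    char *(*cvt_FromUTF8)(void *internalData, const char *source, size_t bytes);
    char *(*cvt_ToUTF8)(void *internalData, const char *source, size_t bytes);
    void (*cvt_Free)(void *internalData);
    void *internalData;
};

/* Hypothetical helper: copy `bytes` bytes into a new NUL-terminated buffer. */
static char *utf8_copy(void *internalData, const char *source, size_t bytes)
{
    (void) internalData;            /* the identity converter keeps no state */
    char *out = malloc(bytes + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, source, bytes);
    out[bytes] = '\0';
    return out;
}

/* Hypothetical helper: nothing was allocated, so nothing to release. */
static void utf8_Free(void *internalData)
{
    (void) internalData;
}

/* UTF-8 client <-> UTF-8 internal format: both directions are a copy. */
struct CvtInterface utf8_to_utf8_New(void)
{
    struct CvtInterface cvt;
    cvt.cvt_FromUTF8 = utf8_copy;
    cvt.cvt_ToUTF8 = utf8_copy;
    cvt.cvt_Free = utf8_Free;
    cvt.internalData = NULL;
    return cvt;
}
```

Returning the struct by value matches `env->char_cvt = utf8_to_utf8_New();` in the proposal; a converter with real state (say, a charset table) would allocate it in its `_New()` function and release it in `cvt_Free()`.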

These are some of the interface implementation functions
(I don't know how many are needed, but at least char, wchar
and multibyte support is required):

sjis_new()
sjis_FromUTF8()
sjis_ToUTF8()
sjis_Free()

wchar_FromUTF8()
wchar_ToUTF8()
wchar_Free()

char_FromUTF8()
char_ToUTF8()
char_Free()

utf8_FromUTF8()
utf8_ToUTF8()
utf8_Free()

ascii8_FromUTF8()
ascii8_ToUTF8()
ascii8_Free()
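To illustrate one entry in this list, here is a sketch of the `ascii8` pair, assuming that "ascii8" means an 8-bit, ASCII-compatible charset such as ISO-8859-1 (Latin-1). That interpretation, the `?` substitution policy and the malloc-per-call convention are assumptions, not part of the proposal.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical ascii8 converters, assuming ascii8 = ISO-8859-1 (Latin-1).
 * Both return a malloc'ed, NUL-terminated string, or NULL on error. */

/* Latin-1 -> UTF-8: bytes below 0x80 are copied, the rest become 2 bytes. */
char *ascii8_ToUTF8(void *internalData, const char *source, size_t bytes)
{
    (void) internalData;                 /* no per-converter state needed */
    char *out = malloc(2 * bytes + 1);   /* worst case: every byte doubles */
    if (out == NULL)
        return NULL;
    size_t o = 0;
    for (size_t i = 0; i < bytes; i++) {
        unsigned char c = (unsigned char) source[i];
        if (c < 0x80) {
            out[o++] = (char) c;
        } else {
            out[o++] = (char) (0xC0 | (c >> 6));
            out[o++] = (char) (0x80 | (c & 0x3F));
        }
    }
    out[o] = '\0';
    return out;
}

/* UTF-8 -> Latin-1: characters above U+00FF become '?'. Invalid lead bytes,
 * bad continuation bytes and truncated sequences return NULL; overlong
 * forms are not checked in this sketch. */
char *ascii8_FromUTF8(void *internalData, const char *source, size_t bytes)
{
    (void) internalData;
    char *out = malloc(bytes + 1);       /* output never exceeds input length */
    if (out == NULL)
        return NULL;
    const unsigned char *s = (const unsigned char *) source;
    size_t o = 0, i = 0;
    while (i < bytes) {
        unsigned char c = s[i];
        size_t len = c < 0x80 ? 1
                   : (c & 0xE0) == 0xC0 ? 2
                   : (c & 0xF0) == 0xE0 ? 3
                   : (c & 0xF8) == 0xF0 ? 4 : 0;
        int ok = len != 0 && i + len <= bytes;
        for (size_t k = 1; ok && k < len; k++)
            ok = (s[i + k] & 0xC0) == 0x80;
        if (!ok) {
            free(out);
            return NULL;
        }
        if (len == 1) {
            out[o++] = (char) c;
        } else if (len == 2) {
            unsigned cp = ((c & 0x1Fu) << 6) | (s[i + 1] & 0x3Fu);
            out[o++] = cp <= 0xFF ? (char) cp : '?';
        } else {
            out[o++] = '?';              /* U+0800 and above: not in Latin-1 */
        }
        i += len;
    }
    out[o] = '\0';
    return out;
}
```

The UTF-8 to Latin-1 direction is lossy: anything above U+00FF has no Latin-1 byte, so each per-client converter has to decide whether to substitute or to report an error.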

So there would be a single internal UTF-8 format inside psqlodbc.
The backend could always deliver UTF-8, so no conversion layer
between the internal format and the backend format would be needed.

This design would be easy to implement.

Examples:

A C program calls SQLExecDirectW.
AllocEnv has found out that the wchar format is UCS-2,
so it has created an object:
env->wchar_cvt = cvt_ucs2_UTF8_New();

The PGAPI function needs to convert from WCHAR into the internal format:
sqlquery = env->wchar_cvt.cvt_ToUTF8(env->wchar_cvt.internalData, (const char *) wcharquery, bytes);
Then sqlquery is in UTF-8, and the query is in
an easily manageable format!

A C program calls SQLGetData with the target type SQL_C_WCHAR to get a string.
So when the data is converted in convert.c, psqlodbc calls:
result = env->wchar_cvt.cvt_FromUTF8(env->wchar_cvt.internalData, internalformat, bytes);
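The two conversions in these examples could look roughly like this for a UCS-2 client. This is a sketch under assumptions: `source` holds native-endian 16-bit code units, results are `malloc`'ed, and the hypothetical functions `ucs2_ToUTF8`/`ucs2_FromUTF8` are what a `cvt_ucs2_UTF8_New()` would store in the interface. Strict UCS-2 has no surrogates, so the sketch rejects them and anything outside the Basic Multilingual Plane; a UTF-16 converter (as Windows `WCHAR` text actually is) would have to combine surrogate pairs instead.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* UCS-2 -> UTF-8: `bytes` bytes of native-endian 16-bit units in, malloc'ed
 * NUL-terminated UTF-8 out. NULL on odd length, surrogates, or OOM. */
char *ucs2_ToUTF8(void *internalData, const char *source, size_t bytes)
{
    (void) internalData;
    if (bytes % 2 != 0)
        return NULL;
    size_t units = bytes / 2;
    char *out = malloc(3 * units + 1);   /* a BMP character needs at most 3 bytes */
    if (out == NULL)
        return NULL;
    size_t o = 0;
    for (size_t i = 0; i < units; i++) {
        uint16_t u;
        memcpy(&u, source + 2 * i, sizeof u);   /* avoids unaligned access */
        if (u >= 0xD800 && u <= 0xDFFF) {       /* surrogates are not UCS-2 */
            free(out);
            return NULL;
        }
        if (u < 0x80) {
            out[o++] = (char) u;
        } else if (u < 0x800) {
            out[o++] = (char) (0xC0 | (u >> 6));
            out[o++] = (char) (0x80 | (u & 0x3F));
        } else {
            out[o++] = (char) (0xE0 | (u >> 12));
            out[o++] = (char) (0x80 | ((u >> 6) & 0x3F));
            out[o++] = (char) (0x80 | (u & 0x3F));
        }
    }
    out[o] = '\0';
    return out;
}

/* UTF-8 -> UCS-2: malloc'ed native-endian units followed by a 0 unit.
 * NULL on malformed or overlong UTF-8, surrogates, non-BMP input, or OOM. */
char *ucs2_FromUTF8(void *internalData, const char *source, size_t bytes)
{
    (void) internalData;
    char *out = malloc(2 * (bytes + 1)); /* at most one unit per input byte */
    if (out == NULL)
        return NULL;
    const unsigned char *s = (const unsigned char *) source;
    size_t o = 0, i = 0;
    while (i < bytes) {
        const unsigned char *p = s + i;
        size_t len = 0;
        uint16_t u = 0;
        if (p[0] < 0x80) {
            u = p[0];
            len = 1;
        } else if ((p[0] & 0xE0) == 0xC0 && i + 1 < bytes && (p[1] & 0xC0) == 0x80) {
            u = (uint16_t) (((p[0] & 0x1F) << 6) | (p[1] & 0x3F));
            if (u >= 0x80)
                len = 2;                     /* else overlong: rejected */
        } else if ((p[0] & 0xF0) == 0xE0 && i + 2 < bytes
                   && (p[1] & 0xC0) == 0x80 && (p[2] & 0xC0) == 0x80) {
            u = (uint16_t) (((p[0] & 0x0F) << 12) | ((p[1] & 0x3F) << 6) | (p[2] & 0x3F));
            if (u >= 0x800 && (u < 0xD800 || u > 0xDFFF))
                len = 3;                     /* else overlong or surrogate */
        }
        if (len == 0) {                      /* also covers 4-byte (non-BMP) input */
            free(out);
            return NULL;
        }
        memcpy(out + 2 * o, &u, sizeof u);
        o++;
        i += len;
    }
    uint16_t zero = 0;
    memcpy(out + 2 * o, &zero, sizeof zero);
    return out;
}
```

Converting BMP text without surrogates with `ucs2_ToUTF8()` and back with `ucs2_FromUTF8()` reproduces the original code units, which is a convenient property for the "easy to test" point below.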

I don't know whether the ENV handle is the best place to put the converter
objects.

What I like about this design:
- Simplifies support for clients using different charsets.
- Simplifies psqlodbc internally, because it can assume UTF-8 everywhere.
- Easy to implement and to test.
- Easy to add more converters once the initial implementation works.
- Enables the use of advanced lexers and parsers when they are needed to improve
performance.
- psqlodbc would support well every charset that can be converted to and from UTF-8.

I have not suggested this before, for the following reasons:
- The current psqlodbc charset conversion implementation seems to work most of the time.
- Avoiding unnecessary charset conversions is good for performance.
- It takes time to implement and test this.
- Unnecessary malloc + free calls are bad for performance.
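One possible mitigation for the last two concerns, sketched here as an assumption rather than as part of the proposal: on the char path, text that is pure 7-bit ASCII is byte-for-byte identical in UTF-8 and in ASCII-compatible client charsets such as ISO-8859-1 or EUC-JP, so a caller could skip the converter (and its malloc + free) entirely. `cvt_is_ascii` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fast-path check: true when every byte is 7-bit ASCII.
 * Such text needs no cvt_ToUTF8()/cvt_FromUTF8() call on the char path
 * for ASCII-compatible client charsets, so no buffer has to be allocated. */
bool cvt_is_ascii(const char *source, size_t bytes)
{
    for (size_t i = 0; i < bytes; i++) {
        if ((unsigned char) source[i] >= 0x80)
            return false;
    }
    return true;
}
```

This shortcut does not apply to the wchar path, where code units are 16 bits wide, nor to Shift_JIS, whose single-byte range does not fully match ASCII (0x5C and 0x7E differ in JIS X 0201).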

What do you think about this?
Would this solve the problems?
Is this implementable?
Would the performance be good enough?
Would this simplify things (that's the goal)?

Regards,
Marko Ristola

Dave Page wrote:

>>-----Original Message-----
>>From: Hiroshi Saito [mailto:saito(at)inetrt(dot)skcapi(dot)co(dot)jp]
>>Sent: 31 August 2005 21:00
>>To: Hiroshi Saito; Dave Page; Anoop Kumar
>>Cc: pgsql-odbc(at)postgresql(dot)org
>>Subject: Re: [ODBC] Unicode support
>>
>>Hi Dave.
>>
>>I tried your patch with SJIS (Japanese). It seems that it needs
>>some additional correction. Moreover, it is necessary to make the driver
>>different from UNICODE (WideCharacter). It seems that I have to catch up
>>further.
>>
>>
>
>Hmmm, well I can't remove the Unicode functions. Do your apps request
>SQL_C_WCHAR etc even if the driver doesn't offer it?
>
>
>

>>BTW, I remembered the original discussion about pgAdminIII. I said then
>>that I should support MultiByte. However, how is it now? It is very wonderful.
>>I feel that having many choices of character code complicates the problem
>>more, but the external environments do differ.
>>
>>
>
>Hmm, I hate multibyte :-(!!
>
>Regards, Dave
>
