Re: Bug or not about ASCII and Multi-Byte character set

From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: Marc Herbert <Marc(dot)Herbert(at)emicnetworks(dot)com>
Cc: pgsql-odbc(at)postgresql(dot)org
Subject: Re: Bug or not about ASCII and Multi-Byte character set
Date: 2005-08-19 14:11:48
Message-ID: 4305E8A4.6000306@pse-consulting.de
Lists: pgsql-odbc

Marc Herbert wrote:

>If SQL_ASCII is/was equivalent to "ignoring encoding", then it
>looks/looked pretty misnamed!
>
It's not. It should be used for ASCII only, but the database system will
not barf if you offer it a byte with the upper bit set. You're simply on
your own.

>Encoding ignorance should rather be called SQL_BINARY. A BINARY setting
>for strings makes sense, just like when transferring text files using
>FTP: you just don't trust FTP for encodings and use it like a
>filesystem. BINARY just means that: "don't mess with encodings and
>let something else deal with the issue".
>
>
No, binary would include 0x00, which is definitely *not* a character but
the string terminator. If SQL_ASCII were implemented today, it would
probably check that the upper bit of every byte is clear and reject the
input otherwise. But since this part is really, really old, it can't
be changed without breaking zillions of old apps that have always ignored
proper storage encoding.
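
To make the hypothetical check concrete, here is a minimal sketch in Python. It is not what the backend does (SQL_ASCII accepts high-bit bytes as they are), and the names is_strict_ascii and store_sql_ascii are made up for illustration:

```python
def is_strict_ascii(data: bytes) -> bool:
    """True if every byte is in 0x01..0x7F: upper bit clear and not 0x00."""
    return all(0x00 < b < 0x80 for b in data)


def store_sql_ascii(data: bytes) -> bytes:
    # Hypothetical strict variant: reject what SQL_ASCII accepts today.
    if not is_strict_ascii(data):
        raise ValueError("byte outside 0x01..0x7F in SQL_ASCII input")
    return data
```

Plain 7-bit text passes, while a Latin-1 encoded "é" (byte 0xE9) or an embedded 0x00 would be rejected.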

>I guess some people knew what they were doing and simply did not mix
>drivers/apps, or did so in a way they mastered and that worked.
>
>
The latter, with the obvious chance of breaking as soon as the next app
accesses the data. This is certainly not the design goal of an RDBMS.
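
A small sketch of how this breaks, in Python and without a database. The two "apps" are just encode/decode calls, and the byte string stands in for what a SQL_ASCII server would store unchanged; the sample text and the helper read_as_utf8 are assumptions for illustration:

```python
text = "Müller"

# App A writes Latin-1 bytes; an SQL_ASCII database stores them as-is.
stored = text.encode("latin-1")

# App A reads them back with the same private assumption: fine.
round_trip = stored.decode("latin-1")


def read_as_utf8(raw: bytes) -> str:
    # App B assumes UTF-8; invalid sequences become U+FFFD.
    return raw.decode("utf-8", errors="replace")


garbled = read_as_utf8(stored)
```

App A gets its text back, but App B sees a replacement character where the "ü" was, because nothing in the database records which encoding the bytes were written in.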

>Well, reading the complaints, it seems this BINARY mode was
>there before (by "accident"?),
>
No.

>Looks like people fixed issues by themselves before,
>
They didn't fix anything; they worked around the wrongly chosen server
encoding. I perfectly understand this, because initially I made the same
mistake.

>and the recent Postgres fixes
>do not interact nicely with theirs?
>
>
Automatically choosing the right client encoding and properly converting
in the driver had (and maybe still has) bugs, but fixing these will
certainly enforce the rules that proper design requires, not accommodate
ill-designed apps.

>PS: BTW "unicode" is not one encoding but many different ones.
>
>
Doesn't matter. It always means the native Unicode encoding of the system:
UTF-8 in the backend, UTF-16 (UCS-2) on Win32, and UTF-32 (UCS-4) or UTF-8
on Linux, depending on the interface definition. The *driver* has to take
care of the proper conversion, *if* it is instructed correctly (i.e. given
the correct server encoding).
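
As a rough illustration in Python, not taken from the actual ODBC driver code: the same character has different byte representations in each Unicode encoding, and a conversion step only works if the declared server encoding is the real one. The function server_to_client is a hypothetical stand-in for the driver's conversion:

```python
text = "é"  # U+00E9

utf8 = text.encode("utf-8")        # backend representation
utf16 = text.encode("utf-16-le")   # typical Win32 wide-character form
utf32 = text.encode("utf-32-le")   # 4 bytes per character


def server_to_client(raw: bytes, server_encoding: str,
                     client_encoding: str) -> bytes:
    """Decode with the declared server encoding, re-encode for the client."""
    return raw.decode(server_encoding).encode(client_encoding)


# Correctly declared server encoding: the client gets the right character.
good = server_to_client(utf8, "utf-8", "utf-16-le")

# Wrongly declared server encoding: the conversion "succeeds" but yields
# two wrong characters instead of one right one.
bad = server_to_client(utf8, "latin-1", "utf-16-le")
```

Here good equals utf16, while bad is twice as long, which is the kind of silent damage that a wrong server encoding produces.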

Regards,
Andreas
