
[ psqlodbc-Bugs-1010313 ] ucs2_to_utf8 endianness problem

From: <noreply(at)pgfoundry(dot)org>
To: noreply(at)pgfoundry(dot)org
Subject: [ psqlodbc-Bugs-1010313 ] ucs2_to_utf8 endianness problem
Date: 2011-11-01 23:59:57
Message-ID: 20111101235957.B6A04532E3D2@pgfoundry.org
Lists: pgsql-odbc
Bugs item #1010313, was opened at 2008-03-12 16:43
You can respond by visiting: 
http://pgfoundry.org/tracker/?func=detail&atid=538&aid=1010313&group_id=1000125

Category: None
Group: None
>Status: Closed
Resolution: None
Priority: 3
Submitted By: Ken Robbins (kpr)
Assigned to: Nobody (None)
Summary: ucs2_to_utf8 endianness problem

Initial Comment:
psqlodbc 08.02.0200
PostgreSQL 8.2.3

I'm using SQLBindParameter() (using SQL_WVARCHAR and SQL_C_WCHAR as the types) along with SQLExecute() to execute a prepared statement.

In the byte buffer, I'm using UCS-2 encoding.  E.g., I'm doing something like this:

wchar_t c = 0x03C0; // lower case greek letter pi
char buf[255];
memcpy(buf, (char *) &c, sizeof(wchar_t)); /* copy from the address of c, not from its value */

This eventually gets passed into ucs2_to_utf8().

char *ucs2_to_utf8(const SQLWCHAR *ucs2str, SQLLEN ilen, SQLLEN *olen, BOOL lower_identifier)

Whenever a UTF-8 sequence is longer than one byte, the function builds it in a UInt2 or Int4 variable and then copies that variable into the output with memcpy.  E.g.,

memcpy(utf8str + len, (char *) &byte2code, sizeof(byte2code));

However, the ordering of the bytes of byte2code is different depending on whether the platform is little endian or big endian.  When I run my code on an Intel environment (Linux), the code runs fine.  However, when I run my code on a PowerPC environment (also Linux), the UTF-8 byte sequence is wrong.
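To make the failure concrete, here is a minimal sketch of the pattern described above. It is not the driver's code: pack_two_byte_utf8(), store_with_memcpy() and host_is_little_endian() are hypothetical names, and the packing layout (lead byte in the low-order 8 bits) is an assumption chosen so that a little-endian host produces the correct sequence.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packing, assumed for illustration: the UTF-8 lead byte sits
 * in the low-order 8 bits and the continuation byte in the high-order 8. */
static uint16_t pack_two_byte_utf8(uint16_t ucs2)
{
    uint16_t lead, cont;

    assert(ucs2 >= 0x80 && ucs2 < 0x800);   /* 2-byte UTF-8 range only */
    lead = (uint16_t) (0xC0 | (ucs2 >> 6));
    cont = (uint16_t) (0x80 | (ucs2 & 0x3F));
    return (uint16_t) (lead | (cont << 8));
}

/* Returns 1 when the least significant byte is stored first. */
static int host_is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Copies the packed integer as raw memory, so the byte order in out[]
 * follows the host: this is the platform-dependent step. */
static void store_with_memcpy(uint16_t ucs2, unsigned char out[2])
{
    uint16_t packed = pack_two_byte_utf8(ucs2);

    memcpy(out, &packed, sizeof(packed));
}
```

With U+03C0 the correct UTF-8 sequence is CF 80. Under the assumed packing, a little-endian host stores CF 80, while a big-endian host stores 80 CF.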

I added mylog() calls to ucs2_to_utf8() to log the bytes at the memcpy step and the final byte sequence.  For 2-byte sequences, the byte values are correct but their order is reversed (understandably).  For 3-byte sequences, the order is reversed and, in addition, the wrong 3 bytes end up in the output.

When I instead use SQL_VARCHAR and SQL_C_CHAR and put the UTF-8 byte sequence in the byte buffer myself, it works fine on both platforms.

I believe that ucs2_to_utf8() needs to account for the endianness of the platform, so the right bytes are put in the final returned UTF-8 sequence.  However, if I am not doing something right, please advise me on that also.
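One way to make the conversion independent of the platform is to write each output byte explicitly with shifts and masks instead of copying an integer's memory. The following is only a sketch of that idea, not a patch against the driver: ucs2_unit_to_utf8() is a hypothetical helper that encodes a single 16-bit code unit and does no surrogate handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one UCS-2 code unit as UTF-8.
 * Writes 1 to 3 bytes to out and returns the number written.
 * Each byte is computed arithmetically, so the result does not
 * depend on how the host lays out integers in memory. */
static size_t ucs2_unit_to_utf8(uint16_t c, unsigned char *out)
{
    assert(out != NULL);

    if (c < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char) c;
        return 1;
    }
    if (c < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char) (0xC0 | (c >> 6));
        out[1] = (unsigned char) (0x80 | (c & 0x3F));
        return 2;
    }
    /* 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char) (0xE0 | (c >> 12));
    out[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
    out[2] = (unsigned char) (0x80 | (c & 0x3F));
    return 3;
}
```

Called with 0x03C0 this writes CF 80, and with 0x20AC it writes E2 82 AC, on either byte order. Writing bytes individually also sidesteps a likely cause of the 3-byte symptom: copying 3 bytes out of a 4-byte integer on a big-endian host starts at the most significant byte, which would pick up a byte that is not part of the sequence.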
