| From: | Steve Midgley <public(at)misuse(dot)org> | 
|---|---|
| To: | "Medi Montaseri" <montaseri(at)gmail(dot)com> | 
| Cc: | pgsql-sql(at)postgresql(dot)org | 
| Subject: | Re: UTF8 encoding and non-text data types | 
| Date: | 2008-01-14 20:52:38 | 
| Message-ID: | 20080114205255.60EC62E0068@postgresql.org | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-sql | 
At 12:43 PM 1/14/2008, Medi Montaseri wrote:
>Here is my traces from perl CGI code, I'll include two samples one in 
>ASCII and one UTF so we know what to expect
>
>Here is actual SQL statement being executed in Perl and DBI. I do not 
>quote the numerical value, just provided to DBI raw.
>
>insert into t1 (c1, cost) values ('tewt', 1234)
>this works find....
>insert into t1 (c1, cost) values ('شد', 
>۱۲۳۴)
>  DBD::Pg::db do failed: ERROR:  syntax error at or near ";" at 
> character 59,
>
>And the PG log itself is very similar and says
>ERROR:  syntax error at or near ";" at character 59
>
>Char 59 by the way is the first accurance of semi-colon as in 䕱 
>which is being caught by PG parser.
>
>Medi
>
>
>On Jan 14, 2008 12:18 PM, Steve Midgley 
><<mailto:public(at)misuse(dot)org>public(at)misuse(dot)org> wrote:
>
>>On Jan 13, 2008 8:51 PM, Steve Midgley 
>><<mailto:public(at)misuse(dot)org>public(at)misuse(dot)org> wrote:
>>At 02:22 PM 1/13/2008, 
>><mailto:pgsql-sql-owner(at)postgresql(dot)org>pgsql-sql-owner(at)postgresql(dot)org 
>>wrote:
>> >Date: Sat, 12 Jan 2008 14:21:00 -0800
>> >From: "Medi Montaseri" <<mailto:montaseri(at)gmail(dot)com> 
>> montaseri(at)gmail(dot)com>
>> >To: <mailto:pgsql-sql(at)postgresql(dot)org>pgsql-sql(at)postgresql(dot)org
>> >Subject: UTF8 encoding and non-text data types
>> >Message-ID:
>> ><<mailto:8078a1730801121421l1c9b90c1lc65d36cf6e752a6f(at)mail(dot)gmail(dot)com 
>>  > 8078a1730801121421l1c9b90c1lc65d36cf6e752a6f(at)mail(dot)gmail(dot)com>
>> >
>> >I understand PG supports UTF-8 encoding and I have sucessfully
>> >inserted
>> >Unicode text into columns. I was wondering about other data types 
>> such
>> >as
>> >numbers, decimal, dates
>> >
>> >That is, say I have a table t1 with
>> >create table t1 { name text, cost decimal }
>> >I can insert UTF8 text datatype into this table with no problem
>> >But if my application attempts to insert numbers encloded in UTF8,
>> >then I
>> >get wrong datatype error
>> >
>> >Is the solution for the application layer (not database) to convert 
>>
>> >the
>> >non-text UTF8 numbers to ASCII and then insert it into database ?
>> >
>> >Thanks
>> >Medi
>>Hi Medi,
>>I have only limited experience in this area, but it sounds like you
>>sending your numbers as strings? In your example:
>> >create table t1 { name text, cost decimal };
>>insert into t1 (name, cost) values ('name1', '1');
>>I can't think of how else you're sending numeric values as UTF8? I 
>>know
>>that Pg will accept numbers as strings and convert internally (that 
>>has
>>worked for me in some object relational environments where I don't
>>choose to cope with data types), but I think it would be better if 
>>you
>>simply didn't send your numeric data in quotations, whether as UTF8 
>>or
>>ASCII. If you don't have control over this layer (that quotes your
>>values), then I'd say converting to ASCII would solve the problem. 
>>But
>>better to convert to numeric and not ship quoted strings at all.
>>I may be totally off-base and missing something fundamental and I'm
>>very open to correction (by anyone), but that's what I can see here.
>>Best regards,
>>Steve
>>At 11:01 AM 1/14/2008, Medi Montaseri wrote:
>>Thanks Steve,
>>
>>Actually I do not insert text data into my numeric field.
>>As I mentioned given
>>create table t1 { name text, cost decimal }
>>then I would like to insert numeric data into column "cost" because 
>>then I can later benefit from numerical operators like SUM, AVG, etc
>>
>>More specifically, I am using HTML, Perl and PG. So from the HTML 
>>point of view a textfield is just some strings. So my user would 
>>enter 12345 but expressed in UTF8. Perl would get this and use DBI to 
>>insert it into PG
>>
>>What I am experiencing now is that DB errors that I am trying to 
>>insert an incorrect data into column "cost" which is numeric and the 
>>data is coming in from HTML in UTF8
>>
>>Mybe I have to convert it to ASCII numbers in Perl before 
>>inserting  them into PG
>>
>>Thanks
>>Medi
>
>Hi Medi,
>
>I agree that you should convert your values in Perl before handing to 
>DBI. I'm not familiar with DBI but presumably if you're sending it 
>UTF8 values it's attempting to quote them or do something with them, 
>that a numeric field in Pg can't handle. Can you trap/monitor the 
>exact sql statement that is generated by DBI and sent to Pg? That 
>would help a lot in knowing what it is doing, but I suspect if you 
>just convert your numbers from the HTML/UTF8 source values into actual 
>Perl numeric values and then ship to DBI you'll be better off. And 
>you'll get some input validation for free.
>
>I hope this helps,
>
>Steve
>
Hi Medi,
That structure for numeric values is never going to work, as best as I 
understand Postgres (and other sql pipes). You have to convert those 
UTF chars to straight numeric format. Hopefully that solves your 
problem? I hope it's not too hard for you to get at the code which is 
sending the numbers as UTF?
Steve
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tom Lane | 2008-01-14 22:41:23 | Re: UTF8 encoding and non-text data types | 
| Previous Message | Medi Montaseri | 2008-01-14 20:43:02 | Re: UTF8 encoding and non-text data types |