Re: UTF8 encoding and non-text data types

From: Steve Midgley <public(at)misuse(dot)org>
To: "Medi Montaseri" <montaseri(at)gmail(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: UTF8 encoding and non-text data types
Date: 2008-01-14 20:52:38
Message-ID: 20080114205255.60EC62E0068@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

At 12:43 PM 1/14/2008, Medi Montaseri wrote:
>Here is my traces from perl CGI code, I'll include two samples one in
>ASCII and one UTF so we know what to expect
>
>Here is actual SQL statement being executed in Perl and DBI. I do not
>quote the numerical value, just provided to DBI raw.
>
>insert into t1 (c1, cost) values ('tewt', 1234)
>this works find....
>insert into t1 (c1, cost) values ('&#1588;&#1583;',
>&#1777;&#1778;&#1779;&#1780;)
> DBD::Pg::db do failed: ERROR: syntax error at or near ";" at
> character 59,
>
>And the PG log itself is very similar and says
>ERROR: syntax error at or near ";" at character 59
>
>Char 59 by the way is the first accurance of semi-colon as in &#17777;
>which is being caught by PG parser.
>
>Medi
>
>
>On Jan 14, 2008 12:18 PM, Steve Midgley
><<mailto:public(at)misuse(dot)org>public(at)misuse(dot)org> wrote:
>
>>On Jan 13, 2008 8:51 PM, Steve Midgley
>><<mailto:public(at)misuse(dot)org>public(at)misuse(dot)org> wrote:
>>At 02:22 PM 1/13/2008,
>><mailto:pgsql-sql-owner(at)postgresql(dot)org>pgsql-sql-owner(at)postgresql(dot)org
>>wrote:
>> >Date: Sat, 12 Jan 2008 14:21:00 -0800
>> >From: "Medi Montaseri" <<mailto:montaseri(at)gmail(dot)com>
>> montaseri(at)gmail(dot)com>
>> >To: <mailto:pgsql-sql(at)postgresql(dot)org>pgsql-sql(at)postgresql(dot)org
>> >Subject: UTF8 encoding and non-text data types
>> >Message-ID:
>> ><<mailto:8078a1730801121421l1c9b90c1lc65d36cf6e752a6f(at)mail(dot)gmail(dot)com
>> > 8078a1730801121421l1c9b90c1lc65d36cf6e752a6f(at)mail(dot)gmail(dot)com>
>> >
>> >I understand PG supports UTF-8 encoding and I have sucessfully
>> >inserted
>> >Unicode text into columns. I was wondering about other data types
>> such
>> >as
>> >numbers, decimal, dates
>> >
>> >That is, say I have a table t1 with
>> >create table t1 { name text, cost decimal }
>> >I can insert UTF8 text datatype into this table with no problem
>> >But if my application attempts to insert numbers encloded in UTF8,
>> >then I
>> >get wrong datatype error
>> >
>> >Is the solution for the application layer (not database) to convert
>>
>> >the
>> >non-text UTF8 numbers to ASCII and then insert it into database ?
>> >
>> >Thanks
>> >Medi
>>Hi Medi,
>>I have only limited experience in this area, but it sounds like you
>>sending your numbers as strings? In your example:
>> >create table t1 { name text, cost decimal };
>>insert into t1 (name, cost) values ('name1', '1');
>>I can't think of how else you're sending numeric values as UTF8? I
>>know
>>that Pg will accept numbers as strings and convert internally (that
>>has
>>worked for me in some object relational environments where I don't
>>choose to cope with data types), but I think it would be better if
>>you
>>simply didn't send your numeric data in quotations, whether as UTF8
>>or
>>ASCII. If you don't have control over this layer (that quotes your
>>values), then I'd say converting to ASCII would solve the problem.
>>But
>>better to convert to numeric and not ship quoted strings at all.
>>I may be totally off-base and missing something fundamental and I'm
>>very open to correction (by anyone), but that's what I can see here.
>>Best regards,
>>Steve
>>At 11:01 AM 1/14/2008, Medi Montaseri wrote:
>>Thanks Steve,
>>
>>Actually I do not insert text data into my numeric field.
>>As I mentioned given
>>create table t1 { name text, cost decimal }
>>then I would like to insert numeric data into column "cost" because
>>then I can later benefit from numerical operators like SUM, AVG, etc
>>
>>More specifically, I am using HTML, Perl and PG. So from the HTML
>>point of view a textfield is just some strings. So my user would
>>enter 12345 but expressed in UTF8. Perl would get this and use DBI to
>>insert it into PG
>>
>>What I am experiencing now is that DB errors that I am trying to
>>insert an incorrect data into column "cost" which is numeric and the
>>data is coming in from HTML in UTF8
>>
>>Mybe I have to convert it to ASCII numbers in Perl before
>>inserting them into PG
>>
>>Thanks
>>Medi
>
>Hi Medi,
>
>I agree that you should convert your values in Perl before handing to
>DBI. I'm not familiar with DBI but presumably if you're sending it
>UTF8 values it's attempting to quote them or do something with them,
>that a numeric field in Pg can't handle. Can you trap/monitor the
>exact sql statement that is generated by DBI and sent to Pg? That
>would help a lot in knowing what it is doing, but I suspect if you
>just convert your numbers from the HTML/UTF8 source values into actual
>Perl numeric values and then ship to DBI you'll be better off. And
>you'll get some input validation for free.
>
>I hope this helps,
>
>Steve
>
Hi Medi,

That structure for numeric values is never going to work, as best as I
understand Postgres (and other sql pipes). You have to convert those
UTF chars to straight numeric format. Hopefully that solves your
problem? I hope it's not too hard for you to get at the code which is
sending the numbers as UTF?

Steve

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Tom Lane 2008-01-14 22:41:23 Re: UTF8 encoding and non-text data types
Previous Message Medi Montaseri 2008-01-14 20:43:02 Re: UTF8 encoding and non-text data types