Re: UTF8 encoding and non-text data types

From: "Medi Montaseri" <montaseri(at)gmail(dot)com>
To: "Steve Midgley" <public(at)misuse(dot)org>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: UTF8 encoding and non-text data types
Date: 2008-01-14 20:43:02
Message-ID: 8078a1730801141243o63676d02mdb1e3a539679f79f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Here is my traces from perl CGI code, I'll include two samples one in ASCII
and one UTF so we know what to expect

Here is actual SQL statement being executed in Perl and DBI. I do not quote
the numerical value, just provided to DBI raw.

insert into t1 (c1, cost) values ('tewt', 1234)
this works find....
insert into t1 (c1, cost) values ('&#1588;&#1583;',
&#1777;&#1778;&#1779;&#1780;)
DBD::Pg::db do failed: ERROR: syntax error at or near ";" at character 59,

And the PG log itself is very similar and says
ERROR: syntax error at or near ";" at character 59

Char 59 by the way is the first accurance of semi-colon as in &#17777; which
is being caught by PG parser.

Medi

On Jan 14, 2008 12:18 PM, Steve Midgley <public(at)misuse(dot)org> wrote:

>
> On Jan 13, 2008 8:51 PM, Steve Midgley <public(at)misuse(dot)org> wrote:
> At 02:22 PM 1/13/2008, pgsql-sql-owner(at)postgresql(dot)org wrote:
> >Date: Sat, 12 Jan 2008 14:21:00 -0800
> >From: "Medi Montaseri" < montaseri(at)gmail(dot)com>
> >To: pgsql-sql(at)postgresql(dot)org
> >Subject: UTF8 encoding and non-text data types
> >Message-ID:
> >< 8078a1730801121421l1c9b90c1lc65d36cf6e752a6f(at)mail(dot)gmail(dot)com>
> >
> >I understand PG supports UTF-8 encoding and I have sucessfully
> >inserted
> >Unicode text into columns. I was wondering about other data types such
> >as
> >numbers, decimal, dates
> >
> >That is, say I have a table t1 with
> >create table t1 { name text, cost decimal }
> >I can insert UTF8 text datatype into this table with no problem
> >But if my application attempts to insert numbers encloded in UTF8,
> >then I
> >get wrong datatype error
> >
> >Is the solution for the application layer (not database) to convert
> >the
> >non-text UTF8 numbers to ASCII and then insert it into database ?
> >
> >Thanks
> >Medi
>
> Hi Medi,
>
> I have only limited experience in this area, but it sounds like you
> sending your numbers as strings? In your example:
>
> >create table t1 { name text, cost decimal };
>
> insert into t1 (name, cost) values ('name1', '1');
>
> I can't think of how else you're sending numeric values as UTF8? I know
> that Pg will accept numbers as strings and convert internally (that has
> worked for me in some object relational environments where I don't
> choose to cope with data types), but I think it would be better if you
> simply didn't send your numeric data in quotations, whether as UTF8 or
> ASCII. If you don't have control over this layer (that quotes your
> values), then I'd say converting to ASCII would solve the problem. But
> better to convert to numeric and not ship quoted strings at all.
>
> I may be totally off-base and missing something fundamental and I'm
> very open to correction (by anyone), but that's what I can see here.
>
> Best regards,
>
> Steve
> At 11:01 AM 1/14/2008, Medi Montaseri wrote:
> Thanks Steve,
>
> Actually I do not insert text data into my numeric field.
> As I mentioned given
> create table t1 { name text, cost decimal }
> then I would like to insert numeric data into column "cost" because then I
> can later benefit from numerical operators like SUM, AVG, etc
>
> More specifically, I am using HTML, Perl and PG. So from the HTML point of
> view a textfield is just some strings. So my user would enter 12345 but
> expressed in UTF8. Perl would get this and use DBI to insert it into PG
>
> What I am experiencing now is that DB errors that I am trying to insert an
> incorrect data into column "cost" which is numeric and the data is coming in
> from HTML in UTF8
>
> Mybe I have to convert it to ASCII numbers in Perl before inserting them
> into PG
>
> Thanks
> Medi
>
>
> Hi Medi,
>
> I agree that you should convert your values in Perl before handing to DBI.
> I'm not familiar with DBI but presumably if you're sending it UTF8 values
> it's attempting to quote them or do something with them, that a numeric
> field in Pg can't handle. Can you trap/monitor the exact sql statement that
> is generated by DBI and sent to Pg? That would help a lot in knowing what it
> is doing, but I suspect if you just convert your numbers from the HTML/UTF8
> source values into actual Perl numeric values and then ship to DBI you'll be
> better off. And you'll get some input validation for free.
>
> I hope this helps,
>
> Steve
>

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Steve Midgley 2008-01-14 20:52:38 Re: UTF8 encoding and non-text data types
Previous Message Steve Midgley 2008-01-14 20:18:11 Re: UTF8 encoding and non-text data types