Re: bytea encode performance issues

From: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
To: "Sim Zacks" <sim(at)compulab(dot)co(dot)il>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: bytea encode performance issues
Date: 2008-08-07 13:41:27
Message-ID: b42b73150808070641p4a45a67bicca80e7227f13687@mail.gmail.com
Lists: pgsql-general

On Thu, Aug 7, 2008 at 9:38 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> On Thu, Aug 7, 2008 at 1:16 AM, Sim Zacks <sim(at)compulab(dot)co(dot)il> wrote:
>>
>>> I don't quite follow that...the whole point of utf8 encoded database
>>> is so that you can use text functions and operators without the bytea
>>> treatment. As long as your client encoding is set up properly (so
>>> that data coming in and out is converted to utf8), then you should be
>>> ok. Dropping to ascii is usually not the solution. Your data
>>> inputting application should set the client encoding properly and
>>> coerce data into the unicode text type...it's really the only
>>> solution.
>>>
>> Email does not always follow a specific character set. I have tried
>> converting the data that comes in to utf-8 and it does not always work.
>> We receive Hebrew emails which come in mostly 2 flavors, UTF-8 and
>> windows-1255. Unfortunately, they are not compatible with one another.
>> SQL-ASCII and ASCII are different as someone on the list pointed out to
>> me. According to the documentation, SQL-ASCII makes no assumption about
>> encoding, so you can throw in any encoding you want.
>
> no, you can't! SQL-ASCII means that the database treats everything
> like ascii. This means that any operation that deals with text could
> (and in the case of Hebrew, almost certainly will) be broken. Simple
> things like getting the length of a string will be wrong. If you are
> accepting unicode input, you absolutely must be using a unicode
> encoded backend.

er, I see the problem (single piece of text with multiple encodings
inside) :-). ok, it's more complicated than I thought. still, you
need to convert the email to utf8. There simply must be a way,
otherwise your emails are not well defined. This is a client side
problem...if you push it to the server in ascii, you can't use any
server side text operations reliably.
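
A rough sketch of what that client-side conversion might look like, using
Python's standard email package. This is only an illustration under
assumptions: the function names and the fallback list (utf-8 first, then
windows-1255) are made up for this example, and it only helps when each
MIME part is internally consistent. A part that really mixes two encodings
in one body would need to be split up before decoding.

```python
from email import message_from_bytes
from typing import Optional

# Assumed fallback order for this example. UTF-8 goes first because it is
# strict: windows-1255 bytes almost never happen to be valid UTF-8.
FALLBACK_CHARSETS = ("utf-8", "windows-1255")


def decode_part(payload: bytes, declared: Optional[str]) -> str:
    """Decode one MIME part, trying the declared charset before the fallbacks."""
    candidates = ([declared] if declared else []) + list(FALLBACK_CHARSETS)
    for charset in candidates:
        try:
            return payload.decode(charset)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown charset label: try the next candidate.
            continue
    # Last resort: keep what decodes and mark the rest with U+FFFD.
    return payload.decode("utf-8", errors="replace")


def email_text_as_utf8(raw: bytes) -> bytes:
    """Return the text parts of a raw email as a single UTF-8 byte string."""
    msg = message_from_bytes(raw)
    chunks = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        # decode=True undoes base64 / quoted-printable and returns bytes.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        chunks.append(decode_part(payload, part.get_content_charset()))
    return "\n".join(chunks).encode("utf-8")
```

The result can then be sent to a UTF8 database with client_encoding set to
UTF8, so length(), substring matching and friends work on characters
rather than raw bytes.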

merlin