Re: ERROR: translation failed from server encoding to wchar_t

From: ilanco(at)gmail(dot)com
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ERROR: translation failed from server encoding to wchar_t
Date: 2008-01-08 08:43:31
Message-ID: f6d49d30-75b8-48e8-a2ba-dda51a9bb4f3@e4g2000hsg.googlegroups.com
Lists: pgsql-hackers

On Jan 8, 4:14 am, t(dot)(dot)(dot)(at)sss(dot)pgh(dot)pa(dot)us (Tom Lane) wrote:
> ila(dot)(dot)(dot)(at)gmail(dot)com writes:
> > I am using tsearch2 with pgsql 8.2.5 and get the following error when
> > calling to_tsvector :
> > "translation failed from server encoding to wchar_t"
> > My database is UTF8 encoded and the data sent to to_tsvector comes
> > from a bytea column converted to text with
> > encode(COLUMN, 'escape').
>
> Two likely theories:
>
> 1. Your database encoding is UTF-8, but your locale (LC_CTYPE) assumes
> some other encoding.
>
> 2. The encode() is yielding something that isn't valid UTF-8.
>
> PG 8.3 contains checks that should complain about both of these
> scenarios, but IIRC 8.2 does not.
>
> regards, tom lane

Dear Tom,

Thanks for your reply.
This is the output of `locale` on my system:
# locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

As for your second scenario, I guess you are right: it's possible that
encode() does not always return valid UTF-8.
But to_tsvector() succeeds on some rows and fails on others, seemingly
at random, with this kind of data.
So how can I sanitize the output of encode() before I pass it to
to_tsvector()?
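
To make the second theory concrete, here is a minimal sketch of what
"valid UTF-8" means for a byte string and of one possible way to clean
it up. It assumes the check is done on the client side in Python rather
than inside the database; the helper names is_valid_utf8 and
sanitize_utf8 are made up for illustration and are not part of
PostgreSQL or tsearch2.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def sanitize_utf8(data: bytes) -> str:
    """Decode as UTF-8, replacing each invalid sequence with U+FFFD.

    Assumption: losing the offending bytes is acceptable, since the
    text is only being indexed for full-text search.
    """
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    good = "héllo".encode("utf-8")   # well-formed UTF-8
    bad = b"\xe9llo"                 # Latin-1 byte, not valid UTF-8
    for sample in (good, bad):
        print(is_valid_utf8(sample), repr(sanitize_utf8(sample)))
```

Replacing bad bytes instead of rejecting the whole row keeps the rest
of the text searchable; whether that trade-off is acceptable depends on
the data.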

Regards,

Ilan
