Re: ERROR: translation failed from server encoding to wchar_t

From: ilanco(at)gmail(dot)com
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ERROR: translation failed from server encoding to wchar_t
Date: 2008-01-08 08:43:31
Message-ID: f6d49d30-75b8-48e8-a2ba-dda51a9bb4f3@e4g2000hsg.googlegroups.com
Lists: pgsql-hackers
On Jan 8, 4:14 am, t(dot)(dot)(dot)(at)sss(dot)pgh(dot)pa(dot)us (Tom Lane) wrote:
> ila(dot)(dot)(dot)(at)gmail(dot)com writes:
> > I am using tsearch2 with pgsql 8.2.5 and get the following error when
> > calling to_tsvector:
> > "translation failed from server encoding to wchar_t"
> > My database is UTF8 encoded and the data sent to to_tsvector comes
> > from a bytea column converted to text with
> > encode(COLUMN, 'escape').
>
> Two likely theories:
>
> 1. Your database encoding is UTF-8, but your locale (LC_CTYPE) assumes
> some other encoding.
>
> 2. The encode() is yielding something that isn't valid UTF-8.
>
> PG 8.3 contains checks that should complain about both of these
> scenarios, but IIRC 8.2 does not.
>
>                         regards, tom lane
>
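Tom's second theory can be illustrated with a small model. The sketch below assumes, as that theory implies, that in 8.2 `encode(COLUMN, 'escape')` escapes only NUL bytes and backslashes and copies every other byte through unchanged. `escape_encode` and `is_valid_utf8` are hypothetical helpers, not PostgreSQL functions. Under that assumption, a bytea value that already holds UTF-8 text comes out as valid UTF-8, while one holding, say, Latin-1 text does not.

```python
def escape_encode(data: bytes) -> bytes:
    """Hypothetical model of encode(bytea, 'escape') as assumed for 8.2:
    NUL becomes \\000, a backslash is doubled, every other byte is copied as-is."""
    out = bytearray()
    for b in data:
        if b == 0x00:
            out += b"\\000"
        elif b == 0x5C:  # backslash
            out += b"\\\\"
        else:
            out.append(b)  # high-bit bytes pass through unescaped (assumption)
    return bytes(out)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


utf8_value = "café".encode("utf-8")      # b'caf\xc3\xa9'
latin1_value = "café".encode("latin-1")  # b'caf\xe9'

print(is_valid_utf8(escape_encode(utf8_value)))    # True
print(is_valid_utf8(escape_encode(latin1_value)))  # False
```

If this model holds, it would also explain why the failures look random: only rows whose bytea content happens to contain non-UTF-8 byte sequences would trigger the error.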

Dear Tom,

Thanks for your reply.
This is the output of `locale` on my system:
# locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
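Theory 1 concerns the locale the server runs with, which is not necessarily that of the login shell shown above: in 8.2 the cluster's LC_CTYPE is fixed at initdb time and can be read from psql with `SHOW lc_ctype;`. Whichever value is being checked, the sketch below (hypothetical helpers `parse_locale_output` and `codeset_is_utf8`) parses `locale`-style output and tests whether the LC_CTYPE codeset names UTF-8, allowing for common spellings such as `UTF-8` and `utf8`.

```python
def parse_locale_output(text: str) -> dict:
    """Parse lines such as LC_CTYPE="en_US.UTF-8" into {"LC_CTYPE": "en_US.UTF-8"}."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip lines without '=', e.g. the "# locale" prompt
            settings[name.strip()] = value.strip().strip('"')
    return settings


def codeset_is_utf8(locale_name: str) -> bool:
    """True if a locale name like en_US.UTF-8 or de_DE.utf8@euro uses UTF-8."""
    if "." not in locale_name:
        return False  # e.g. "C" or "POSIX": no explicit codeset to compare
    codeset = locale_name.split(".", 1)[1].split("@", 1)[0]
    return codeset.replace("-", "").replace("_", "").lower() == "utf8"


sample = """# locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_ALL="""

settings = parse_locale_output(sample)
print(settings["LC_CTYPE"])                   # en_US.UTF-8
print(codeset_is_utf8(settings["LC_CTYPE"]))  # True
```

For the output above the check passes, so if the server was initialized with the same LC_CTYPE, theory 1 looks less likely than theory 2.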

As for your second scenario, I guess you are right: it's possible that
encode() does not always return valid UTF-8.
But to_tsvector() seems to succeed or fail at random on this kind of
data...
So how can I sanitize the output of encode() before I pass it to
to_tsvector()?

Regards,

Ilan
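One possible answer to the sanitizing question is to clean the data on the client before it ever reaches to_tsvector(), rather than in SQL. The sketch below assumes the raw bytea contents are fetched into a Python application; it decodes them as UTF-8, either replacing invalid sequences with U+FFFD or dropping them, and removes NUL characters, which PostgreSQL text values cannot store. `sanitize_for_tsvector` is a hypothetical helper name.

```python
def sanitize_for_tsvector(raw: bytes, errors: str = "replace") -> str:
    """Turn raw bytea contents into text that is valid UTF-8.

    errors="replace" substitutes U+FFFD for each invalid sequence;
    errors="ignore" silently drops invalid bytes instead.
    """
    text = raw.decode("utf-8", errors=errors)
    # PostgreSQL text cannot contain NUL characters, so strip any that remain.
    return text.replace("\x00", "")


clean = sanitize_for_tsvector(b"caf\xe9 au lait")
print(clean == "caf\ufffd au lait")                                 # True
print(sanitize_for_tsvector(b"caf\xe9 au lait", errors="ignore"))  # caf au lait
print(sanitize_for_tsvector(b"tab\x00le"))                         # table
```

The cleaned string could then be sent as an ordinary query parameter, for example with psycopg2's `cur.execute("SELECT to_tsvector(%s)", (clean,))`. Replacing rather than ignoring leaves a visible marker where bytes were lost; which behavior is preferable depends on the data.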


