Re: ERROR: translation failed from server encoding to wchar_t

From: ilanco(at)gmail(dot)com
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ERROR: translation failed from server encoding to wchar_t
Date: 2008-01-08 11:49:53
Message-ID: a8e7c7d6-1c1f-4a72-b93e-c2a02d11b173@l32g2000hse.googlegroups.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Jan 8, 10:43 am, ila(dot)(dot)(dot)(at)gmail(dot)com wrote:
> On Jan 8, 4:14 am, t(dot)(dot)(dot)(at)sss(dot)pgh(dot)pa(dot)us (Tom Lane) wrote:
>
>
>
> > ila(dot)(dot)(dot)(at)gmail(dot)com writes:
> > > I am using tsearch2 with pgsql 8.2.5 and get the following error when
> > > calling to_tsvector :
> > > "translation failed from server encoding to wchar_t"
> > > My database is UTF8 encoded and the data sent to to_tsvector comes
> > > from a bytea column converted to text with
> > > encode(COLUMN, 'escape').
>
> > Two likely theories:
>
> > 1. Your database encoding is UTF-8, but your locale (LC_CTYPE) assumes
> > some other encoding.
>
> > 2. The encode() is yielding something that isn't valid UTF-8.
>
> > PG 8.3 contains checks that should complain about both of these
> > scenarios, but IIRC 8.2 does not.
>
> > regards, tom lane
>
> > ---------------------------(end of broadcast)---------------------------
> > TIP 1: if posting/reading through Usenet, please send an appropriate
> > subscribe-nomail command to majord(dot)(dot)(dot)(at)postgresql(dot)org so that your
> > message can get through to the mailing list cleanly
>
> Dear Tom,
>
> Thanks for your reply.
> This is the output of `locale` on my system :
> # locale
> LANG=en_US.UTF-8
> LC_CTYPE="en_US.UTF-8"
> LC_NUMERIC="en_US.UTF-8"
> LC_TIME="en_US.UTF-8"
> LC_COLLATE="en_US.UTF-8"
> LC_MONETARY="en_US.UTF-8"
> LC_MESSAGES="en_US.UTF-8"
> LC_PAPER="en_US.UTF-8"
> LC_NAME="en_US.UTF-8"
> LC_ADDRESS="en_US.UTF-8"
> LC_TELEPHONE="en_US.UTF-8"
> LC_MEASUREMENT="en_US.UTF-8"
> LC_IDENTIFICATION="en_US.UTF-8"
> LC_ALL=
>
> As for your second scenario I guess you are right, it's possible
> encode does not return all UTF8 characters.
> But to_tsvector() succeeds and fails at random with this kind of
> characters...
> So how can I sanitize output from encode before I pipe it to
> to_tsvector() ?
>
> Regards,
>
> Ilan

Tom,

To get around the non-UTF8 chars I used following function :

CREATE OR REPLACE FUNCTION u_messageblk_idxfti() RETURNS "trigger"
AS $$
DECLARE
BEGIN
RAISE NOTICE '[DBMAIL] Trying ID %', NEW.messageblk_idnr;
BEGIN
NEW.idxFTI := to_tsvector('simple', encode($x$E$x$||
NEW.messageblk, 'escape'));
RAISE NOTICE '[DBMAIL] Ended ID %', NEW.messageblk_idnr;
RETURN NEW;
EXCEPTION
WHEN character_not_in_repertoire THEN
RAISE WARNING '[DBMAIL] character_not_in_repertoire ID %',
NEW.messageblk_idnr;
NEW.idxFTI := to_tsvector('simple',
'character_not_in_repertoire: This email contains illegal
characters.');
RETURN NEW;
END;
END;
$$
LANGUAGE plpgsql;

Hope this helps others with DBmail and tsearch2 on postgres 8.2

Thanks for your help Tom,

ilan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2008-01-08 12:53:00 Re: ERROR: translation failed from server encoding to wchar_t
Previous Message Magnus Hagander 2008-01-08 09:59:26 Re: 8.3.0 release schedule (Was:Re: [BUGS] BUG #3852: Could not create complex aggregate)