
Re: 0xc3 error Text Search Windows French

From: Andrew <archa(at)pacific(dot)net(dot)au>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: 0xc3 error Text Search Windows French
Date: 2008-06-25 18:44:28
Message-ID: 4862920C.4010801@pacific.net.au
Lists: pgsql-general
One additional aspect.  I just ran the CREATE TEXT SEARCH DICTIONARY 
command using the OO dictionaries but without the StopWords declaration, 
and it worked fine: select ts_lexize('public.fr_ispell', 'catalogue'); 
executes with no problems.  However, after creating an associated text 
search configuration based on a copy of the pg_catalog.french 
configuration, calls to ts_debug against my custom French config still 
result in the 0xc3 error.  So it looks like the problem is restricted 
to the parsing of the stop file. 
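
One way to rule out the stop file itself is to check, outside the 
database, that its bytes decode cleanly as UTF-8.  In UTF-8, 0xc3 is the 
lead byte of the two-byte sequences for the accented Latin letters 
(for example, é is 0xc3 0xa9), so a lone 0xc3 would mean a continuation 
byte was lost or altered.  The following is only a minimal sketch: the 
function names are hypothetical, and it says nothing about how 
PostgreSQL itself reads the file.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, byte value) of the first byte that breaks UTF-8
    decoding, or None if the whole buffer is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first offending byte
        return exc.start, data[exc.start]
    return None


def check_stop_file(path):
    """Read a file in binary mode (no newline or encoding translation)
    and report the first invalid UTF-8 byte, if any."""
    with open(path, "rb") as fh:
        return first_invalid_utf8(fh.read())
```

Calling check_stop_file() on the french.stop file in the installation's 
tsearch_data directory (the exact path depends on the install) returns 
None when the file is valid UTF-8.  If it does, the file on disk is 
probably not the culprit and the server-side reading of it is the more 
likely suspect.
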

I ran through the other stemmers supplied out of the box, which I have 
not touched in any way, and the error also occurs with the Portuguese 
configuration.

Cheers

Andy

Andrew wrote:
> I have a feeling that an issue I'm running into is related to this: 
> http://archives.postgresql.org/pgsql-bugs/2008-06/msg00113.php
>
> On Windows XP, running PgAdmin III 1.8.4 against either a PostgreSQL 
> 8.3.0 or 8.3.3 database, when I attempt to run:
>
> select * from ts_debug('french', 'catalogue');
>
> I get the following error:
>
> ERROR:  invalid byte sequence for encoding "UTF8": 0xc3
> HINT:  This error can also happen if the byte sequence does not match 
> the encoding expected by the server, which is controlled by 
> "client_encoding".
> CONTEXT:  SQL function "ts_debug" statement 1
>
> I have replaced the french.stop file with the one from the Snowball 
> web site (http://snowball.tartarus.org/algorithms/french/stemmer.html) 
> to see if that would make any difference, but the issue persists.  I 
> have also attempted to load the French Hunspell dictionary from the 
> OpenOffice web site 
> (http://wiki.services.openoffice.org/wiki/Dictionaries), using the 
> following command:
>
> CREATE TEXT SEARCH DICTIONARY public.fr_ispell (
>    TEMPLATE = pg_catalog.ispell,
>    DictFile = fr_FR,
>    AffFile = fr_FR,
>    StopWords = french
> );
>
> But I get the same error.  I have successfully loaded the English 
> and Arabic dictionaries and an Arabic stop file I sourced from 
> elsewhere, and they work fine with the various text search function 
> calls, so it appears to be specifically related to a French character 
> occurring in the stop file and the dictionaries.  To use the French OO 
> dictionaries, I had to convert them from an ISO-8859-15 character set 
> encoding to UTF-8.  As converting them on Windows gave the same result 
> as the packaged stop file, I downloaded them and converted the 
> encoding on a Linux machine before copying them across to Windows to 
> see if that would help, but it didn't.
>
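
The ISO-8859-15 to UTF-8 conversion described above can be sketched as 
follows.  This is an illustration under stated assumptions, not the 
tool actually used in the thread: the function names are hypothetical, 
and the files are handled in binary mode so that the result is 
byte-for-byte the same on Windows and Linux.

```python
def latin9_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-15 (Latin-9) bytes as UTF-8."""
    return data.decode("iso-8859-15").encode("utf-8")


def convert_file(src_path, dst_path):
    """Convert one dictionary, affix or stop file.  Binary mode avoids
    any platform-specific newline translation."""
    with open(src_path, "rb") as src:
        raw = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(latin9_to_utf8(raw))
```

Every byte value decodes under ISO-8859-15, so this conversion does not 
fail on unexpected input.  A file that was already UTF-8 would be 
silently double-encoded, which is worth ruling out before converting.
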
> However, if I run the ts_debug('french', 'catalogue'); against a Linux 
> version of PostgreSQL 8.3.1, it works fine.  I have not tried version 
> 8.3.1 on Windows.  While there are a lot more combinations to exhaust 
> before I can make a categorical statement, at this stage it appears to 
> be pointing towards an issue with the UTF-8 parser of PostgreSQL on 
> Windows.
>
> Is this an outstanding defect, or is there something that I'm doing 
> wrong in my environment?  I have attempted to find anything related on 
> the Internet, but other than the introductory reference, I have not 
> found anything, which surprises me given what I would imagine to be 
> the size of the French user base.  Hence, I'm thinking that perhaps 
> it may be something in my environment causing the issue.  If others 
> could also reproduce the error on their XP machines, that would 
> indicate that the issue was not something specific just to me.
>
> At this stage, it is not that important to me, as I'm just playing 
> around with text search for my own curiosity and French was just a 
> language I have randomly picked, along with Arabic (for which I'm 
> lacking a snowball stemmer).  I don't actually read, much less speak 
> those languages.  However, it would still be nice to have them working.
>
> An additional related topic.  For some languages, OO has thesaurus 
> files which are not in the format supported by Pg Full Text 
> Search.  Are there any plans to support the OO thesaurus file 
> formats?  They also have hyphenation files.  Are there any plans to 
> extend the current dictionary files to include hyphenation rules as 
> captured in the OO hyphenation files?  I'm not sure how, if at all, 
> hyphenation rules would improve indexing and searches, but as the 
> files exist, I thought I would pose the question.
>
> Thanks,
>
> Andy

