Re: full text search, utf8

From: alexander lunyov <lan(at)zato(dot)ru>
To: pgsql-ru-general(at)postgresql(dot)org
Subject: Re: full text search, utf8
Date: 2009-06-03 11:56:59
Message-ID: 4A26650B.400@zato.ru
Lists: pgsql-ru-general
I can answer in English if you like.

This error also happens when I try to CREATE TEXT SEARCH DICTIONARY:

ports=# CREATE TEXT SEARCH DICTIONARY ruispell (
ports(#     TEMPLATE = ispell,
ports(#     DictFile = russian,
ports(#     AffFile = russian,
ports(#     StopWords = russian
ports(# );
ERROR:  invalid byte sequence for encoding "UTF8": 0xd1
HINT:  This error can also happen if the byte sequence does not
match the encoding expected by the server, which is controlled by
"client_encoding".
ports=#

All data in the table was populated by a Perl script that reads a UTF8
text file and issues INSERTs, and I think that if there had been an
illegal character, the error would have appeared at INSERT time.
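
CREATE TEXT SEARCH DICTIONARY reads the dictionary, affix and stop-word
files rather than the table, so it may be worth checking whether those
files (and the text file fed to the Perl script) really are valid UTF8.
Below is a minimal sketch in Python of such a check. The helper names are
made up for illustration, and the KOI8-R case is only an example of how a
lone 0xd1 can arise, not a diagnosis of this particular error.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, byte) of the first byte that breaks UTF-8 decoding,
    or None if the whole input decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start]
    return None


def check_file(path: str):
    """Run the same check on a file; the path is supplied by the caller."""
    with open(path, "rb") as fh:
        return first_invalid_utf8(fh.read())


# U+044F (Cyrillic small letter ya) is the two bytes 0xd1 0x8f in UTF-8,
# but the single byte 0xd1 in KOI8-R. In UTF-8, 0xd1 is only legal as a
# lead byte followed by a continuation byte in the range 0x80-0xbf.
as_utf8 = "\u044f".encode("utf-8")
as_koi8 = "\u044f".encode("koi8-r")

print(first_invalid_utf8(as_utf8))
print(first_invalid_utf8(as_koi8))
```

Calling check_file() on each file named in the dictionary definition would
show whether any of them contains a byte sequence that is not UTF8, and at
which offset.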


Andrew Boag wrote:
> Sorry for the English response (I don't have a Russian keyboard here).
> 
> 0xd1 may be an illegal UTF8 character that was mistakenly allowed into
> the table. Not all libraries (or all versions of Postgres) prevent
> illegal UTF8 characters from getting into the DB.
> 
> We saw similar issues with a 7.4 -> 8.1 postgres data migration.
> 
> However, I don't fully understand your select query so there may be 
> another cause.
> 
> alexander lunyov wrote:
>> Hello.
>>
>> I have FreeBSD 6.2 and postgresql-8.3.1.
>>
>> In the environment:
>>
>> % env | grep UTF
>> LANG=ru_RU.UTF-8
>> MM_CHARSET=UTF-8
>>
>> % psql ports -U pgsql
>> Welcome to psql 8.3.1, the PostgreSQL interactive terminal.
>>
>> Type:  \copyright for distribution terms
>>        \h for help with SQL commands
>>        \? for help with psql commands
>>        \g or terminate with semicolon to execute query
>>        \q to quit
>>
>> ports=# \encoding
>> UTF8
>> ports=# \l
>>         List of databases
>>    Name    |  Owner   | Encoding
>> -----------+----------+-----------
>>  ports     | pgsql    | UTF8
>>  postgres  | pgsql    | UTF8
>>  template0 | pgsql    | UTF8
>>  template1 | pgsql    | UTF8
>> (4 rows)
>>
>> I try to search the table, and this is the result:
>>
>> ports=# select name from abonents where to_tsvector(name) @@ 
>> to_tsquery('s');
>> ERROR:  invalid byte sequence for encoding "UTF8": 0xd1
>> HINT:  This error can also happen if the byte sequence does not
>> match the encoding expected by the server, which is controlled by
>> "client_encoding".
>>
>> Yet with the english configuration it works fine.
>>
>> # select count(name) from abonents where to_tsvector('english',name) 
>> @@ to_tsquery('some');
>>  count
>> -------
>>      6
>> (1 row)
>>
>> Why?
>>
> 
> 


-- 
Best regards,
Alexander Lunyov
OAO RTK
