Re: full text search, utf8

From: alexander lunyov <lan(at)zato(dot)ru>
To: pgsql-ru-general(at)postgresql(dot)org
Subject: Re: full text search, utf8
Date: 2009-06-03 11:56:59
Message-ID: 4A26650B.400@zato.ru
Lists: pgsql-ru-general

I can answer in English if you like.

The same error also occurs when I try to CREATE TEXT SEARCH DICTIONARY:

ports=# CREATE TEXT SEARCH DICTIONARY ruispell (
ports(# TEMPLATE = ispell,
ports(# DictFile = russian,
ports(# AffFile = russian,
ports(# StopWords = russian
ports(# );
ERROR: invalid byte sequence for encoding "UTF8": 0xd1
HINT: This error can also happen if the byte sequence does not
match the encoding expected by the server, which is controlled by
"client_encoding".
ports=#
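
Since the error here is raised while the dictionary is being created, before any table data is involved, one thing worth ruling out is the encoding of the dictionary files themselves. The PostgreSQL text search documentation says dictionary configuration files must be stored in UTF-8. Below is a minimal sketch in Python that reports the first byte that breaks UTF-8 decoding. The helper names are hypothetical, and the assumption that the files are russian.dict, russian.affix and russian.stop under the server's tsearch_data directory should be checked against the actual installation.

```python
from pathlib import Path
from typing import Optional, Tuple


def first_invalid_utf8(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (offset, byte value) of the first byte that breaks UTF-8
    decoding, or None if the data is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start]
    return None


def check_file(path: str) -> Optional[Tuple[int, int]]:
    """Run the same check on a file, read as raw bytes."""
    return first_invalid_utf8(Path(path).read_bytes())


if __name__ == "__main__":
    # Illustration only: a one-word "dictionary line" saved as KOI8-R
    # instead of UTF-8 (an assumed scenario, not taken from the thread).
    koi8_line = "\u044f\n".encode("koi8_r")
    utf8_line = "\u044f\n".encode("utf-8")
    print(first_invalid_utf8(koi8_line))
    print(first_invalid_utf8(utf8_line))
```

To use it on real files, call check_file() with the full path of each dictionary file. A result of None means the file decodes cleanly as UTF-8, and anything else gives the offset to inspect.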

All data in the table was loaded by a Perl script that reads a UTF8 text
file and issues INSERTs, and I think that if there had been an illegal
character, the error would have appeared at INSERT time.
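
For reference, 0xd1 is not an illegal byte in UTF8 by itself: it is the first byte of the two-byte sequences used for many Cyrillic letters. It only becomes invalid when it is not followed by a continuation byte. A small Python sketch of that point follows; the function names are made up for illustration.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 encoding of text as space-separated hex bytes."""
    return " ".join(f"{b:02x}" for b in text.encode("utf-8"))


def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    # U+0441 is CYRILLIC SMALL LETTER ES; its UTF-8 form starts with 0xd1.
    print(utf8_hex("\u0441"))
    # The complete two-byte sequence is valid ...
    print(is_valid_utf8(b"\xd1\x81"))
    # ... but 0xd1 followed by an ASCII byte, or by nothing, is not.
    print(is_valid_utf8(b"\xd1s"))
    print(is_valid_utf8(b"\xd1"))
```

So the message points at a 0xd1 that is cut off or followed by a byte from some other encoding, rather than at 0xd1 as such.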

Andrew Boag wrote:
> Sorry for the English response (I don't have a Russian keyboard here).
>
> 0xd1 may be an illegal UTF8 character that was mistakenly allowed into
> the table. Not all libraries (or all versions of postgres) prevent
> illegal UTF8 characters from getting into the DB.
>
> We saw similar issues with a 7.4 -> 8.1 postgres data migration.
>
> However, I don't fully understand your select query so there may be
> another cause.
>
> alexander lunyov wrote:
>> Hello.
>>
>> I have FreeBSD 6.2 and postgresql-8.3.1.
>>
>> In the environment:
>>
>> % env | grep UTF
>> LANG=ru_RU.UTF-8
>> MM_CHARSET=UTF-8
>>
>> % psql ports -U pgsql
>> Welcome to psql 8.3.1, the PostgreSQL interactive terminal.
>>
>> Type: \copyright for distribution terms
>> \h for help with SQL commands
>> \? for help with psql commands
>> \g or terminate with semicolon to execute query
>> \q to quit
>>
>> ports=# \encoding
>> UTF8
>> ports=# \l
>> List of databases
>> Name | Owner | Encoding
>> -----------+----------+-----------
>> ports | pgsql | UTF8
>> postgres | pgsql | UTF8
>> template0 | pgsql | UTF8
>> template1 | pgsql | UTF8
>> (4 rows)
>>
>> I try to search the table, and this is the result:
>>
>> ports=# select name from abonents where to_tsvector(name) @@
>> to_tsquery('s');
>> ERROR: invalid byte sequence for encoding "UTF8": 0xd1
>> HINT: This error can also happen if the byte sequence does not
>> match the encoding expected by the server, which is controlled by
>> "client_encoding".
>>
>> Yet with the english configuration it works fine.
>>
>> # select count(name) from abonents where to_tsvector('english',name)
>> @@ to_tsquery('some');
>> count
>> -------
>> 6
>> (1 row)
>>
>> Why?
>>
>
>

--
Best regards,
Alexander Lunyov
OAO RTK
