Mac OS: invalid byte sequence for encoding "UTF8"

From: Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>
To: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Mac OS: invalid byte sequence for encoding "UTF8"
Date: 2016-01-27 09:59:41
Message-ID: 56A8950D.9080902@postgrespro.ru
Lists: pgsql-hackers

Hello.

When a user tries to create a text search dictionary for the Russian
language on Mac OS, the following error is raised:

CREATE EXTENSION hunspell_ru_ru;
+ ERROR: invalid byte sequence for encoding "UTF8": 0xd1
+ CONTEXT: line 341 of configuration file
"/Users/stas/code/postgrespro2/tmp_install/Users/stas/code/postgrespro2/install/share/tsearch_data/ru_ru.affix":
"SFX Y хаться шутся хаться

The Russian dictionary was downloaded from
http://extensions.openoffice.org/en/project/slovari-dlya-russkogo-yazyka-dictionaries-russian
The affix and dictionary files were extracted from the archive and converted
to UTF-8. A converted dictionary can also be downloaded from
https://github.com/select-artur/hunspell_dicts/tree/master/ru_ru
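
One way to rule out a conversion problem in the files themselves is to check
every line for valid UTF-8 outside of PostgreSQL. The following is only a
minimal sketch: the helper name first_invalid_utf8_line is hypothetical, and
the sample is just the visible part of the line quoted in the error message
above. It also shows that 0xd1 is the lead byte of a two-byte sequence, so the
byte is valid as long as its continuation byte follows it.

```python
def first_invalid_utf8_line(data: bytes):
    """Return (line_number, error) for the first line that is not
    valid UTF-8, or None if every line decodes cleanly.

    Hypothetical helper for illustration; not part of PostgreSQL.
    """
    # Splitting on b"\n" is safe: byte 0x0a never occurs inside a
    # multi-byte UTF-8 sequence.
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as exc:
            return lineno, exc
    return None


# Visible part of the affix line quoted in the error message.
sample = "SFX Y хаться шутся хаться".encode("utf-8")

# The complete line is valid UTF-8.
ok = first_invalid_utf8_line(sample)

# Cutting the line right after the first 0xd1 lead byte leaves an
# incomplete two-byte sequence, which no longer decodes.
truncated = sample[: sample.index(b"\xd1") + 1]
bad = first_invalid_utf8_line(truncated)
```

To check a real file, the bytes could be read with
open(path, "rb").read() and passed to the same function. If the whole file
passes, the input is well formed and the error is more likely to come from how
the line is read or split than from the conversion.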

This behavior occurs on:
- Mac OS X 10.10 Yosemite and Mac OS X 10.11 El Capitan.
- latest PostgreSQL version from git and PostgreSQL 9.5 (probably also
on 9.4.5).

A test that reproduces this bug is attached.

Have you encountered this bug? Do you have a solution or a workaround?

Thanks in advance.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachment: test.c (Content-Type: text/x-csrc, Size: 367 bytes)
