Re: Mac OS: invalid byte sequence for encoding "UTF8"

From: Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>
To: Stas Kelvich <stas(dot)kelvich(at)gmail(dot)com>
Cc: "Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Mac OS: invalid byte sequence for encoding "UTF8"
Date: 2016-01-28 14:42:05
Message-ID: 56AA28BD.7080108@postgrespro.ru
Lists: pgsql-hackers

On 27.01.2016 15:28, Artur Zakirov wrote:
> On 27.01.2016 14:14, Stas Kelvich wrote:
>> Hi.
>>
>> I tried that and can confirm the strange behaviour. The problem seems
>> to be with the small Cyrillic letter ‘х’. (simplest obscene language filter? =)
>>
>> It can be reproduced with a simpler test
>>
>> Stas
>>
>>
>
> The test program was corrected. It now uses the wchar_t type, works
> correctly, and gives the right output.
>
> I think the NIImportOOAffixes() in spell.c should be corrected to avoid
> this bug.
>

I have attached a patch. It adds the new functions parse_ooaffentry() and
get_nextentry() and fixes a couple of comments.

With the patch, Russian and other supported dictionaries can be used for
text search on Mac OS.

parse_ooaffentry() parses an affix file entry, replacing the sscanf()
call. Its algorithm is similar to that of the parse_affentry() function.
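
To illustrate the idea only (this is not the code from the attached patch),
here is a minimal sketch of splitting an affix line into fields without
sscanf(). It assumes the trouble comes from locale-dependent whitespace
detection being applied to the bytes of a multibyte character; for example,
the letter 'х' (U+0445) is encoded in UTF-8 as the bytes 0xD1 0x85. The names
split_affix_line(), is_ascii_space(), MAX_FIELDS and FIELD_LEN are
hypothetical and do not exist in spell.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FIELDS 5    /* hypothetical limit, chosen for the example */
#define FIELD_LEN  64   /* hypothetical limit, chosen for the example */

/*
 * ASCII-only whitespace test. Unlike isspace(), the result does not depend
 * on the current locale, so bytes >= 0x80 are never treated as separators.
 */
static int
is_ascii_space(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/*
 * Split "line" into at most MAX_FIELDS whitespace-separated fields and
 * return the number of fields found. Fields longer than FIELD_LEN - 1 bytes
 * are truncated; a real parser would have to avoid cutting a multibyte
 * character in half.
 */
static int
split_affix_line(const char *line, char fields[MAX_FIELDS][FIELD_LEN])
{
    const unsigned char *p = (const unsigned char *) line;
    int         n = 0;

    assert(line != NULL);

    while (*p != '\0' && n < MAX_FIELDS)
    {
        size_t      len = 0;

        /* skip separators before the field */
        while (is_ascii_space(*p))
            p++;
        if (*p == '\0')
            break;

        /* copy bytes up to the next ASCII separator */
        while (*p != '\0' && !is_ascii_space(*p))
        {
            if (len < FIELD_LEN - 1)
                fields[n][len++] = (char) *p;
            p++;
        }
        fields[n][len] = '\0';
        n++;
    }
    return n;
}
```

Called on a line such as "SFX A 0 х ." this yields five fields, and the
fourth one keeps both bytes of 'х' intact because only ASCII separators end
a field.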

Should this bug fix stay a separate patch (as I have done), or should it
be merged into the patch at
http://www.postgresql.org/message-id/56AA02EE.6090004@postgrespro.ru ?

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachment Content-Type Size
tsearch_aff_parse_v1.patch text/x-patch 6.1 KB
