Re: Mac OS: invalid byte sequence for encoding "UTF8"

From: Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stas Kelvich <stas(dot)kelvich(at)gmail(dot)com>, "Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Mac OS: invalid byte sequence for encoding "UTF8"
Date: 2016-02-10 13:39:33
Message-ID: 56BB3D95.7030502@postgrespro.ru
Lists: pgsql-hackers

On 09.02.2016 20:13, Tom Lane wrote:
> I do not like this patch much. It is basically "let's stop using sscanf()
> because it seems to have a bug on one platform". There are at least two
> things wrong with that approach:
>
> 1. By my count there are about 80 uses of *scanf() in our code. Are we
> going to replace every one of them with hand-rolled code? If not, why
> is only this instance vulnerable? How can we know whether future uses
> will have a problem?

It seems that *scanf() with the %s format is used only in the following places:
- check.c - get_bin_version()
- server.c - get_major_server_version()
- filemap.c - isRelDataFile()
- pg_backup_directory.c - _LoadBlobs()
- xlog.c - do_pg_stop_backup()
- mac.c - macaddr_in()
I think sscanf() in these places is not applied to UTF-8 input, so this is
probably an issue only in spell.c.

I agree that the previous patch is wrong. Instead of adding the new
parse_ooaffentry() function, it may be better to use sscanf() with the %ls
format, which reads a wide-character string.
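A minimal sketch of the %ls approach, assuming a UTF-8 LC_CTYPE locale such as "C.UTF-8" or "en_US.UTF-8" is installed. read_wide_token() and set_utf8_ctype() are hypothetical helpers for illustration, not PostgreSQL code. With the "l" modifier, sscanf() converts the multibyte input to wchar_t (as if by mbrtowc()), so the token is stored as characters rather than raw bytes; the caller must pass a buffer of at least 64 wchar_t.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: switch LC_CTYPE to a UTF-8 locale if one exists.
 * Returns 1 on success, 0 if none of the candidate names is installed. */
static int set_utf8_ctype(void)
{
    static const char *const candidates[] = {"C.UTF-8", "en_US.UTF-8"};

    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++)
        if (setlocale(LC_CTYPE, candidates[i]) != NULL)
            return 1;
    return 0;
}

/* Hypothetical helper: read the first whitespace-delimited token of a
 * multibyte line into `out` (at least 64 wchar_t) using %ls.
 * Returns the number of wide characters stored, or -1 on failure. */
static int read_wide_token(const char *line, wchar_t *out)
{
    assert(line != NULL && out != NULL);
    /* Width 63 leaves room for the terminating null wide character. */
    if (sscanf(line, "%63ls", out) != 1)
        return -1;
    return (int) wcslen(out);
}
```

For example, after set_utf8_ctype() succeeds, read_wide_token("\xd1\x85 abc", buf) stores the single wide character U+0445 ('х') in buf rather than two separate bytes.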

>
> 2. We're not being very good citizens of the software universe if we
> just install a hack in Postgres rather than nagging Apple to fix the
> bug at its true source.
>
> I think the appropriate next step to take is to dig into the OS X
> sources (see http://www.opensource.apple.com, I think probably the
> relevant code is in the Libc package) and identify exactly what is
> causing the misbehavior. That would both allow an informed answer
> to point #1 and greatly increase the odds of getting action on a
> bug report to Apple. Even if we end up applying this patch verbatim,
> I think we need that information first.
>
> regards, tom lane
>

I think this is not a bug; it is normal behavior. On Mac OS, sscanf() with
the %s format processes the string one byte at a time, as if each byte were
a character. The letter 'х' takes 2 bytes in UTF-8, so sscanf() splits it
into two invalid characters.
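To make the byte-versus-character distinction concrete, here is a small sketch, again assuming a UTF-8 LC_CTYPE locale ("C.UTF-8" or "en_US.UTF-8") is available. It does not reproduce the Mac OS sscanf() code path; it only shows that 'х' (U+0445) is one character made of two bytes, and that each byte on its own is not a valid UTF-8 sequence. count_mb_chars() and set_utf8_ctype() are hypothetical helpers.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Cyrillic small letter ha (U+0445) encoded in UTF-8: two bytes. */
static const char HA_UTF8[] = "\xd1\x85";

/* Hypothetical helper: switch LC_CTYPE to a UTF-8 locale if one exists. */
static int set_utf8_ctype(void)
{
    static const char *const candidates[] = {"C.UTF-8", "en_US.UTF-8"};

    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++)
        if (setlocale(LC_CTYPE, candidates[i]) != NULL)
            return 1;
    return 0;
}

/* Hypothetical helper: count the multibyte characters in s under the
 * current LC_CTYPE. Returns (size_t) -1 if s contains an invalid or
 * truncated sequence. */
static size_t count_mb_chars(const char *s)
{
    mbstate_t state;
    size_t len, n = 0;

    assert(s != NULL);
    len = strlen(s);
    memset(&state, 0, sizeof state);
    while (len > 0)
    {
        size_t r = mbrlen(s, len, &state);

        if (r == (size_t) -1 || r == (size_t) -2)
            return (size_t) -1;     /* invalid or incomplete sequence */
        s += r;
        len -= r;
        n++;
    }
    return n;
}
```

Under a UTF-8 locale, count_mb_chars(HA_UTF8) reports one character even though strlen(HA_UTF8) is 2, while "\xd1" and "\x85" taken separately are both rejected, which is what happens when the letter is cut between its two bytes.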

In conclusion, I think spell.c should use sscanf() with the %ls format.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
