Re: Mac OS: invalid byte sequence for encoding "UTF8"

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stas Kelvich <stas(dot)kelvich(at)gmail(dot)com>, "Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Mac OS: invalid byte sequence for encoding "UTF8"
Date: 2016-02-10 15:51:32
Message-ID: 56BB5C84.8060106@sigaev.ru
Lists: pgsql-hackers

> It seems that *scanf() with the %s format occurs only here:
> - check.c - get_bin_version()
> - server.c - get_major_server_version()
> - filemap.c - isRelDataFile()
> - pg_backup_directory.c - _LoadBlobs()
> - xlog.c - do_pg_stop_backup()
> - mac.c - macaddr_in()
> I think sscanf() does not deal with UTF-8 characters in these places, so this
> is probably only a spell.c issue.

Hmm. In src/backend/access/transam/xlog.c, read_tablespace_map() uses %s in
scanf, which looks suspicious. I don't fully understand it, but it looks like
it tries to read an OID as a string, so it should be safe in the usual case.
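
A minimal sketch of why reading an OID through %s is low-risk: the token is
plain ASCII digits, so locale-dependent handling of multibyte input never comes
into play. parse_map_line() and the "<oid> <path>" line layout are assumptions
made for illustration; this is not the code in xlog.c.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not PostgreSQL code): split a line of the form
 * "<oid> <path>" into its two parts. Returns 1 on success, 0 otherwise.
 */
static int
parse_map_line(const char *line, char *oid, size_t oidlen, const char **path)
{
    char buf[32];
    int  n = 0;

    /* %31s is bounded; %n records how many bytes were consumed so far. */
    if (sscanf(line, "%31s %n", buf, &n) != 1 || n <= 0)
        return 0;

    /* Accept the token only if it consists purely of ASCII digits. */
    if (strspn(buf, "0123456789") != strlen(buf) || strlen(buf) >= oidlen)
        return 0;

    /* Nothing after the OID means there is no path. */
    if (line[n] == '\0')
        return 0;

    strcpy(oid, buf);
    *path = line + n;   /* remainder is taken as raw bytes, not via %s */
    return 1;
}
```

The path part is deliberately not read with %s, so any UTF-8 bytes in it are
never interpreted by scanf.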

Next, _LoadBlobs() reads the file name (fname) with the help of sscanf. Could
the file name be in UTF-8 encoding here?

>
> I agree that the previous patch is wrong. Instead of using the new
> parse_ooaffentry() function, maybe it is better to use sscanf() with the %ls
> format. The %ls format is used to read a wide character string.
Does the %ls modifier exist everywhere?
The Apple docs say
(https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man3/sscanf.3.html):
s ...
If an l qualifier is present, the next pointer must be a pointer to wchar_t,
into which the input will be placed after conversion by mbrtowc
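
For reference, a minimal sketch of what the proposed %ls usage could look like.
read_wide_word() and WORD_MAX are hypothetical names. The conversion depends on
the current LC_CTYPE locale, so a caller would need something like
setlocale(LC_CTYPE, "") before passing non-ASCII input; whether %ls is
available and behaves the same on every platform is exactly the open question.

```c
#include <stdio.h>
#include <wchar.h>

#define WORD_MAX 64

/*
 * Hypothetical helper: read the first whitespace-delimited token of `input`
 * as a wide string. Returns 1 on success, 0 if nothing could be converted.
 */
static int
read_wide_word(const char *input, wchar_t out[WORD_MAX])
{
    /* The field width 63 leaves room for the terminating L'\0'. */
    return sscanf(input, "%63ls", out) == 1;
}
```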

Actually, it means that a wchar2char() call would then be needed, but it uses
wcstombs[_l], which may not be present on some platforms. Does it mean that the
l modifier for %s is present there too, or not? What do we need to do if the
l modifier exists but wcstombs[_l] does not?
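
To make the dependency concrete, here is a sketch of the conversion back from
wchar_t to a multibyte string using standard wcstombs(). wide_to_mb() is a
hypothetical name and only illustrates the kind of call wchar2char() relies on;
it is not the PostgreSQL implementation.

```c
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert `in` to a multibyte string in `out` using the
 * current LC_CTYPE locale. Returns the number of bytes written (excluding the
 * terminator), or -1 if a character cannot be converted. If `out` is too
 * small, the result is truncated at a character boundary.
 */
static long
wide_to_mb(const wchar_t *in, char *out, size_t outlen)
{
    size_t n;

    if (outlen == 0)
        return -1;

    /* Reserve one byte so the result can always be terminated. */
    n = wcstombs(out, in, outlen - 1);
    if (n == (size_t) -1)
        return -1;

    out[n] = '\0';
    return (long) n;
}
```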

I'm going a bit crazy with locale problems, and it seems to me that Artur's
patch is a good idea. Actually, I don't remember exactly, but it seems commit
7ac8a4be8946c11d5a6bf91bb971b9750c1c60e5 introduced parse_affentry() in place of
the corresponding sscanf to avoid problems with encoding and scanf.
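
As an illustration of the general idea (not the actual parse_affentry() code),
a hand-written tokenizer can compare bytes against ASCII whitespace only. In
UTF-8, every byte of a multibyte character is 0x80 or above, so such bytes are
never mistaken for a separator, whatever the locale. next_field() is a
hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the next whitespace-delimited field of `src`
 * into `dst`, treating only ASCII space, tab, CR and LF as separators.
 * Returns a pointer just past the field, or NULL if no field remains or
 * the field does not fit into `dst`.
 */
static const char *
next_field(const char *src, char *dst, size_t dstlen)
{
    size_t len;

    src += strspn(src, " \t\r\n");     /* skip leading ASCII whitespace */
    len = strcspn(src, " \t\r\n");     /* field runs to the next separator */
    if (len == 0 || len >= dstlen)
        return NULL;

    memcpy(dst, src, len);
    dst[len] = '\0';
    return src + len;
}
```

Calling it repeatedly on an affix-style line yields each field in turn, with
multibyte fields passed through untouched.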

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/
