RE: BUG #16926: initdb fails on Windows when binary path contains certain non-ASCII characters

From: Eirik Bakke <ebakke(at)ultorg(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: RE: BUG #16926: initdb fails on Windows when binary path contains certain non-ASCII characters
Date: 2021-03-15 17:14:06
Message-ID: BL0PR20MB209839081CA5FB938005D44CA16C9@BL0PR20MB2098.namprd20.prod.outlook.com
Lists: pgsql-bugs

> The issues are (1) how should initdb know that the path name ought to be taken as WIN1252, rather than some other encoding?

Having researched this some more: on Windows, the old code-page-dependent ("ANSI") APIs are essentially deprecated in favor of functions that take UTF-16 wide strings (wchar_t/LPWSTR). So instead of getting the binary's location from argv[0], one would use GetCommandLineW and CommandLineToArgvW or, more directly, GetModuleFileNameW. That way one never has to guess or detect which encoding is in use.

https://docs.microsoft.com/en-us/windows/win32/api/processenv/nf-processenv-getcommandlinew
https://docs.microsoft.com/en-us/windows/win32/api/shellapi/nf-shellapi-commandlinetoargvw
https://docs.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-getmodulefilenamew
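As a rough illustration of the GetModuleFileNameW route, here is a minimal C sketch. The helper name exe_path_utf8, its argv0 fallback on non-Windows platforms, and the choice to convert the result to UTF-8 with WideCharToMultiByte(CP_UTF8, ...) are assumptions for illustration, not PostgreSQL code. The point is only that a UTF-16 path can be turned into UTF-8 without consulting the ANSI code page at all.

```c
#include <stddef.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#define WPATH_LEN 32768  /* generous buffer; GetModuleFileNameW truncates if too small */
#endif

/*
 * Hypothetical helper: store the running executable's full path, encoded
 * as UTF-8, in buf. Returns 0 on success, -1 on failure.
 * On Windows the path is fetched as UTF-16 via GetModuleFileNameW, so no
 * guess about the ANSI code page is needed. Elsewhere file names are
 * opaque byte strings, so this sketch just copies argv0.
 */
int exe_path_utf8(const char *argv0, char *buf, size_t buflen)
{
#ifdef _WIN32
    wchar_t wpath[WPATH_LEN];
    DWORD n = GetModuleFileNameW(NULL, wpath, WPATH_LEN);

    (void) argv0;
    if (n == 0 || n >= WPATH_LEN)
        return -1;  /* call failed, or the path was truncated */

    /* cchWideChar = -1: input is NUL-terminated and the output gets a NUL too */
    int m = WideCharToMultiByte(CP_UTF8, 0, wpath, -1,
                                buf, (int) buflen, NULL, NULL);
    return m > 0 ? 0 : -1;  /* 0 means error, e.g. buf too small */
#else
    size_t len = strlen(argv0);

    if (len + 1 > buflen)
        return -1;  /* not enough room for the path plus its NUL */
    memcpy(buf, argv0, len + 1);
    return 0;
#endif
}
```

The Windows branch is the "#ifdef WIN32" piece; the rest of the program could then treat the path as UTF-8 everywhere.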

> And then (2) how should the backend know that it must convert the path-name-represented-in-UTF8 to WIN1252 before passing it to the file system?

Similarly, on Windows one is expected to use the wchar_t version of the relevant file system calls, e.g. _wfopen instead of fopen.

https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/fopen-wfopen
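The other direction could look roughly like the following sketch: a hypothetical fopen_utf8 helper that keeps the file name as UTF-8 internally and converts it to UTF-16 only at the file-system boundary on Windows. The helper name and buffer sizes are assumptions; on non-Windows platforms it simply calls fopen, since file names there are passed through as bytes.

```c
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#define WPATH_LEN 32768
#define WMODE_LEN 16
#endif

/*
 * Hypothetical helper: open a file whose name is held as UTF-8 (as a
 * server that works in UTF-8 internally might store it).
 * On Windows the name and mode are converted to UTF-16 and handed to
 * _wfopen, bypassing the ANSI code page entirely.
 */
FILE *fopen_utf8(const char *path, const char *mode)
{
#ifdef _WIN32
    wchar_t wpath[WPATH_LEN];
    wchar_t wmode[WMODE_LEN];

    /* MB_ERR_INVALID_CHARS: fail instead of silently substituting bad bytes */
    if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            path, -1, wpath, WPATH_LEN) == 0)
        return NULL;  /* invalid UTF-8, or name too long */
    if (MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, WMODE_LEN) == 0)
        return NULL;
    return _wfopen(wpath, wmode);
#else
    return fopen(path, mode);
#endif
}
```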

> But it's a generic problem, and we're unlikely to be interested in building a single-platform fix.

Alas! I'd imagine any fix would need some "#ifdef WIN32" either way. But thanks for responding.

-- Eirik

-----Original Message-----
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Sent: Monday, March 15, 2021 12:06 PM
To: Eirik Bakke <ebakke(at)ultorg(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16926: initdb fails on Windows when binary path contains certain non-ASCII characters

Eirik Bakke <ebakke(at)ultorg(dot)com> writes:
> The installation path can certainly be represented in UTF-8, but a character set conversion is necessary. Wouldn't "SET CLIENT_ENCODING" accomplish this? For example, couldn't initdb in this case do a "SET CLIENT_ENCODING TO 'Windows-1252'" before issuing the COPY command?

The issues are (1) how should initdb know that the path name ought to be taken as WIN1252, rather than some other encoding? And then (2) how should the backend know that it must convert the path-name-represented-in-UTF8 to WIN1252 before passing it to the file system? The lack of any standardization about what encoding file names are in is the core of the difficulty.

I don't know much about Windows, so it's possible that there actually are platform-specific ways to answer these questions. But it's a generic problem, and we're unlikely to be interested in building a single-platform fix.

regards, tom lane
