RE: BUG #16926: initdb fails on Windows when binary path contains certain non-ASCII characters

From: Eirik Bakke <ebakke(at)ultorg(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: RE: BUG #16926: initdb fails on Windows when binary path contains certain non-ASCII characters
Date: 2021-03-15 00:09:45
Message-ID: BL0PR20MB2098E0C32AAB397E33840828A16C9@BL0PR20MB2098.namprd20.prod.outlook.com
Lists: pgsql-bugs

Thanks for investigating!

The problem may be unavoidable if the user's home directory name contains special characters. This is common on Windows, where the user name is often the user's real, human name (e.g. "Bjørn Dæhlie").

> I think you'd have continuing pain from not being able to represent the installation path in the database encoding

The installation path can certainly be represented in UTF-8, but a character set conversion is necessary. Wouldn't "SET CLIENT_ENCODING" accomplish this? For example, couldn't initdb in this case do a "SET CLIENT_ENCODING TO 'Windows-1252'" before issuing the COPY command?

On the server side, I'd expect the COPY command to similarly convert the path from the character set used in the client protocol to whichever character set is expected by the file system. But I don't know if this is done...

> the first idea that comes to mind is that maybe file paths ought to be treated as bytea rather than text

Doing the appropriate character set conversions would avoid this--file paths can still be treated as "text". And UTF-8 will happily encode every character of every other supported encoding.
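As a rough illustration of the kind of conversion suggested here (a Python sketch only, not anything initdb or the backend is known to do), the folder name from the bug report can be decoded from Windows-1252 and re-encoded as UTF-8 without loss. The assumption that the path arrives as Windows-1252 bytes is based on the US English Windows 10 setup in the report; the variable names are hypothetical.

```python
# Sketch: convert a path fragment from the Windows ANSI code page to UTF-8.
# Assumption: the bytes arrive encoded as Windows-1252 (Python codec "cp1252").
folder_name = "FolderÆØÅ"

# What the file system layer would hand over under the assumed code page.
ansi_bytes = folder_name.encode("cp1252")

# Decode using the source encoding, then re-encode in the database encoding.
utf8_bytes = ansi_bytes.decode("cp1252").encode("utf-8")

# The conversion is lossless: decoding the UTF-8 bytes gives the original name.
round_trip = utf8_bytes.decode("utf-8")
```

Each of the three non-ASCII characters occupies one byte in Windows-1252 and two bytes in UTF-8, so the byte strings differ even though the text is the same.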

-- Eirik

-----Original Message-----
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Sent: Sunday, March 14, 2021 6:39 PM
To: Eirik Bakke <ebakke(at)ultorg(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16926: initdb fails on Windows when binary path contains certain non-ASCII characters

PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> On PostgreSQL 13.1 on US English Windows 10, "initdb" will fail with
> the following error if the initdb.exe executable is located on a path
> that contains certain non-ASCII characters, and "--encoding=UTF8" is specified.
> In the following example, I am executing initdb.exe from a folder called
> "C:\Users\ebakke\ZRoot\PostgresTest\FolderÆØÅ\pgsql\bin":

I'm afraid this is largely a case of "Doctor, it hurts when I do this!" ... "So don't do that." Although we could possibly fix initdb to not fail under these circumstances, I think you'd have continuing pain from not being able to represent the installation path in the database encoding. References to, for example, the script files for standard extensions would be impossible to write in SQL.

I was curious enough to dig for exactly where initdb has a problem, and I found that it's where it generates a COPY command to populate information_schema.sql_features from .../share/sql_features.txt in the installation file tree. The backend's expecting the COPY to be entirely valid UTF8 text, but the pathname string won't be.
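The mismatch described above can be reproduced outside PostgreSQL. The following Python sketch assumes the path bytes are in Windows-1252 (an assumption, not something stated in the report) and checks whether they form valid UTF-8; `is_valid_utf8` is a hypothetical helper, not a PostgreSQL function.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


folder_name = "FolderÆØÅ"

# Assumed Windows-1252 representation: one byte per non-ASCII character.
as_cp1252 = folder_name.encode("cp1252")

# The same name correctly encoded as UTF-8.
as_utf8 = folder_name.encode("utf-8")

# Some Windows-1252 byte sequences happen to be well-formed UTF-8,
# e.g. the two characters "Ã©" encode to the bytes C3 A9.
accidentally_valid = "Ã©".encode("cp1252")
```

Here `is_valid_utf8(as_cp1252)` is False, because the byte for Æ (0xC6) starts a two-byte UTF-8 sequence but is not followed by a continuation byte. `is_valid_utf8(accidentally_valid)` is True, which is one possible reason only certain non-ASCII characters trigger the failure.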

I thought of using COPY FROM STDIN, but that doesn't work in a standalone backend, and I doubt we want to expend the effort to make it do so. Or we could have initdb convert the data to a large INSERT...VALUES command. But on the whole, given the likely follow-on issues people would have with this sort of situation, it doesn't seem worth putting effort into fixing just this particular place in initdb.
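The INSERT...VALUES idea could look roughly like the Python sketch below. It is not initdb's code: the helper names and the three-column sample rows are made up for illustration, and it assumes tab-separated input with no NULL markers or backslash escapes and a server with standard_conforming_strings enabled.

```python
from typing import List


def quote_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_insert(table: str, lines: List[str]) -> str:
    """Turn tab-separated lines into one multi-row INSERT ... VALUES statement."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        rows.append("(" + ", ".join(quote_literal(f) for f in fields) + ")")
    return "INSERT INTO " + table + " VALUES\n" + ",\n".join(rows) + ";"


# Hypothetical sample rows; the real file's columns and contents differ.
sample_lines = [
    "X001\tExample feature\tYES\n",
    "X002\tAnother's feature\tNO\n",
]

statement = build_insert("information_schema.sql_features", sample_lines)
```

Because the file is read by the client and its contents are embedded in the statement, no file path appears in the SQL text, so the encoding of the installation path no longer matters at this step.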

Thinking about bigger-picture solutions, the first idea that comes to mind is that maybe file paths ought to be treated as bytea rather than text, since we have no good reason to expect that they are in any particular encoding. But making that happen would be a research project, and it'd likely result in some unpleasant compatibility breakage.

In short, this doesn't seem likely to get improved any time soon.

regards, tom lane
