Re: BUG #16926: initdb fails on Windows when binary path contains certain non-ASCII characters

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: ebakke(at)ultorg(dot)com
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16926: initdb fails on Windows when binary path contains certain non-ASCII characters
Date: 2021-03-14 22:38:47
Message-ID: 2932627.1615761527@sss.pgh.pa.us
Lists: pgsql-bugs

PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> On PostgreSQL 13.1 on US English Windows 10, "initdb" will fail with the
> following error if the initdb.exe executable is located on a path that
> contains certain non-ASCII characters, and "--encoding=UTF8" is specified.
> In the following example, I am executing initdb.exe from a folder called
> "C:\Users\ebakke\ZRoot\PostgresTest\FolderÆØÅ\pgsql\bin":

I'm afraid this is largely a case of "Doctor, it hurts when I do this!"
... "So don't do that." Although we could possibly fix initdb to not
fail under these circumstances, I think you'd have continuing pain from
not being able to represent the installation path in the database
encoding. References to, for example, the script files for standard
extensions would be impossible to write in SQL.

I was curious enough to dig for exactly where initdb has a problem,
and I found that it's where it generates a COPY command to populate
information_schema.sql_features from .../share/sql_features.txt in
the installation file tree. The backend expects the COPY command to be
entirely valid UTF8 text, but the pathname string won't be.
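
As a rough illustration of the mismatch (a sketch, not initdb's actual code): the example assumes the folder name from the report reaches the command as bytes in Windows code page 1252, which is an assumption about a US English system rather than something stated in the thread. The helper is_valid_utf8 is a hypothetical name introduced here.

```python
# Illustration only: this does not reproduce initdb's logic.
# Assumption: the path arrives encoded in Windows code page 1252.
folder = "FolderÆØÅ"

def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

ansi_bytes = folder.encode("cp1252")   # what a code-page-1252 path would contain
utf8_bytes = folder.encode("utf-8")    # what a UTF8-encoded command would need

print(is_valid_utf8(ansi_bytes))
print(is_valid_utf8(utf8_bytes))
```

Under that assumption, the code-page bytes for Æ, Ø and Å do not form valid UTF-8 sequences, so any command text that embeds them verbatim fails validation as UTF8.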

I thought of using COPY FROM STDIN, but that doesn't work in a
standalone backend, and I doubt we want to expend the effort to
make it do so. Or we could have initdb convert the data to a large
INSERT...VALUES command. But on the whole, given the likely follow-on
issues people would have with this sort of situation, it doesn't seem
worth putting effort into fixing just this particular place in initdb.
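
For readers wondering what the INSERT...VALUES alternative might look like, here is a minimal sketch, assuming the data file is tab-separated text with one row per line. The function names quote_literal and rows_to_insert are hypothetical, the sketch does not decode COPY backslash escapes other than \N, and it assumes standard_conforming_strings is on. It is not how initdb works today.

```python
# Sketch: turn tab-separated lines into one INSERT ... VALUES command,
# so that no file path needs to appear in the SQL text at all.

def quote_literal(field: str) -> str:
    """Quote one field as an SQL string literal; \\N becomes NULL."""
    if field == r"\N":
        return "NULL"
    # Double embedded single quotes (assumes standard_conforming_strings = on).
    return "'" + field.replace("'", "''") + "'"

def rows_to_insert(table: str, lines) -> str:
    """Build a single INSERT command from an iterable of tab-separated lines."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        rows.append("(" + ", ".join(quote_literal(f) for f in fields) + ")")
    return f"INSERT INTO {table} VALUES\n" + ",\n".join(rows) + ";"

print(rows_to_insert("t", ["a\tb'c\n", "x\t\\N\n"]))
```

The file contents would still have to be valid in the database encoding, but the installation path itself would no longer be sent to the backend.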

Thinking about bigger-picture solutions, the first idea that comes
to mind is that maybe file paths ought to be treated as bytea rather
than text, since we have no good reason to expect that they are in
any particular encoding. But making that happen would be a research
project, and it'd likely result in some unpleasant compatibility
breakage.

In short, this doesn't seem likely to get improved any time soon.

regards, tom lane
