Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, adam(at)labkey(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails
Date: 2024-11-28 12:22:49
Message-ID: CA+hUKGKKNAc599Vp7kFAnLE1=V=ceYujz_YQoSNrvNFGaJ6i7w@mail.gmail.com
Lists: pgsql-bugs

On Thu, Nov 28, 2024 at 5:04 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> There is nothing
> about our handling of non-ASCII characters in shared system catalogs
> that isn't squishy as heck, and yet there have been darn few field
> complaints over the many years it's been like that. Maybe trying to
> make this truncation issue better in isolation wasn't such a great
> plan.

I guess most people in Unix-land just use UTF-8 in every layer of
their software stack these days, so they don't often see confused
encodings anymore? But I don't think that's true in the other place,
where they still routinely juggle multiple encodings and see garbled
junk when it goes wrong[1]. They might still generally prefer UTF-8
for the database encoding though, I don't know.

> (If we recorded the encoding of names in shared catalogs then this
> particular issue would be far easier to solve, but then we have
> other problems to address --- particularly, what to do if a name
> in the catalog fails to convert to the encoding we are using.)

Here is a much dumber, coarse-grained way I have wondered about for
making the encoding certain, without having to do any new conversions
at all: (1) single-encoding cluster mode, where shared catalogues use
the same encoding as all databases, (2) multi-encoding cluster mode
with ASCII-only shared catalogues, and (3) legacy squishy/raw mode
that you normally only reach by pg_upgrade. Maybe you could switch
between them with an operation that validates names.
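
To make the "operation that validates names" concrete, here is a rough
sketch in Python (not PostgreSQL code). Everything in it is a
hypothetical illustration: the ClusterMode enum, name_is_valid() and
can_switch_to() are made-up names, and names are modelled as the raw
bytes stored in a shared catalogue.

```python
from enum import Enum


class ClusterMode(Enum):
    # Hypothetical labels for the three modes described above.
    SINGLE_ENCODING = 1        # shared catalogues use the one cluster-wide encoding
    MULTI_ENCODING_ASCII = 2   # databases may differ, shared names are ASCII-only
    LEGACY_RAW = 3             # today's behaviour: bytes of unknown encoding


def name_is_valid(raw: bytes, mode: ClusterMode, cluster_encoding: str = "utf-8") -> bool:
    """Return True if a shared-catalogue name is acceptable in the given mode."""
    if mode is ClusterMode.LEGACY_RAW:
        return True                      # nothing is checked in raw mode
    if mode is ClusterMode.MULTI_ENCODING_ASCII:
        return raw.isascii()             # only 7-bit names are allowed
    try:
        raw.decode(cluster_encoding)     # must be valid in the cluster encoding
    except UnicodeDecodeError:
        return False
    return True


def can_switch_to(mode: ClusterMode, names, cluster_encoding: str = "utf-8") -> bool:
    """A mode switch is allowed only if every existing name passes validation."""
    return all(name_is_valid(n, mode, cluster_encoding) for n in names)
```

With a Latin-1 encoded name in the catalogue, a switch to a UTF-8
single-encoding mode or to the ASCII-only mode would be refused, while
staying in (or moving to) mode 3 would always be possible.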

Then I think you could always know the shared catalogue encoding even
with no database context, and when you are connected to a database you
could mostly just carry on assuming it's the database encoding (either
it is, or it's the ASCII subset). That can only be wrong in mode 3,
where all bets are off just like today, but that's your own fault for
using mode 3.
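
As a rough illustration of that reasoning, the sketch below (Python,
not PostgreSQL code) derives the shared catalogue encoding from the
mode alone. ClusterMode, shared_catalog_encoding() and display_name()
are hypothetical names, and escaping unknown bytes in mode 3 is just
one possible choice, not something proposed in this thread.

```python
from enum import Enum
from typing import Optional


class ClusterMode(Enum):
    # Hypothetical labels for the three modes described above.
    SINGLE_ENCODING = 1
    MULTI_ENCODING_ASCII = 2
    LEGACY_RAW = 3


def shared_catalog_encoding(mode: ClusterMode,
                            cluster_encoding: Optional[str] = None) -> Optional[str]:
    """Encoding of shared-catalogue names, knowable without a database context."""
    if mode is ClusterMode.SINGLE_ENCODING:
        if cluster_encoding is None:
            raise ValueError("single-encoding mode needs the cluster encoding")
        return cluster_encoding
    if mode is ClusterMode.MULTI_ENCODING_ASCII:
        return "ascii"       # a subset of every database encoding in practice
    return None              # mode 3: unknown, all bets are off


def display_name(raw: bytes, mode: ClusterMode,
                 cluster_encoding: Optional[str] = None) -> str:
    """Decode a shared-catalogue name, escaping bytes when the encoding is unknown."""
    enc = shared_catalog_encoding(mode, cluster_encoding)
    if enc is None:
        # Assumption for this sketch: show non-ASCII bytes as \xNN escapes.
        return raw.decode("ascii", errors="backslashreplace")
    return raw.decode(enc)
```

In modes 1 and 2 the caller never has to guess; only mode 3 falls back
to a lossy presentation.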

I guess serious users of multi-encoding clusters already learn to
stick to ASCII-only role names and database names anyway, unless they
like seeing garbage?

[1] https://www.postgresql.org/message-id/flat/00a601db3b20%24b00261e0%24100725a0%24%40gmx.net
