Re: Multi-byte character case-folding

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Alvaro Herrera" <alvherre(at)2ndquadrant(dot)com>,"Thom Brown" <thom(at)linux(dot)com>,"PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Multi-byte character case-folding
Date: 2020-07-07 11:33:16
Message-ID: c1c7e094-b07b-4fa8-84e0-2a1bff1ff456@manitou-mail.org
Lists: pgsql-hackers

Tom Lane wrote:

> CREATE TABLE public."myÉclass" (
> f1 text
> );
>
> If we start to case-fold É, then the only way to access this table will
> be by double-quoting its name, which the application probably is not
> expecting (else it would have double-quoted in the original CREATE TABLE).

This problem already exists when migrating from a mono-byte database
to a multi-byte database, since downcase_identifier() does apply
tolower() to non-ASCII characters in mono-byte databases.

db9=# show server_encoding ;
 server_encoding
-----------------
 LATIN9
(1 row)

db9=# create table MYÉCLASS (f1 text);
CREATE TABLE

db9=# \d
          List of relations
 Schema |   Name   | Type  |  Owner
--------+----------+-------+----------
 public | myéclass | table | postgres
(1 row)

db9=# select * from MYÉCLASS;
f1
----
(0 rows)

pg_dump will dump this as

CREATE TABLE public."myéclass" (
f1 text
);

So far so good. But after importing this into a UTF-8 database,
the same "select * from MYÉCLASS" that used to work now fails:

u8=# show server_encoding ;
 server_encoding
-----------------
 UTF8
(1 row)

u8=# select * from MYÉCLASS;
ERROR: relation "myÉclass" does not exist
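
To make the difference between the two sessions concrete, here is a rough
Python sketch of the folding rule as I understand it: ASCII A-Z is always
downcased, and bytes with the high bit set are downcased only when the
server encoding is mono-byte. This is a simulation, not the backend code:
fold_identifier(), lower_high_byte() and the ENCODINGS table are
hypothetical names, and Python's str.lower() stands in for the
locale-dependent tolower().

```python
# Hypothetical table: PostgreSQL encoding name -> (Python codec, mono-byte?)
ENCODINGS = {
    "LATIN9": ("iso8859-15", True),
    "UTF8": ("utf-8", False),
}


def lower_high_byte(b: int, codec: str) -> int:
    """Downcase one high-bit byte, standing in for the locale's tolower()."""
    ch = bytes([b]).decode(codec)
    try:
        lowered = ch.lower().encode(codec)
    except UnicodeEncodeError:
        return b  # lowercase form not representable in this encoding
    # Keep the original byte unless the result is exactly one byte.
    return lowered[0] if len(lowered) == 1 else b


def fold_identifier(name: str, server_encoding: str) -> str:
    """Approximate how an unquoted identifier is folded, byte by byte."""
    codec, mono_byte = ENCODINGS[server_encoding]
    out = bytearray()
    for b in name.encode(codec):
        if 0x41 <= b <= 0x5A:
            # ASCII A-Z: always folded, whatever the encoding
            b += 0x20
        elif mono_byte and b >= 0x80:
            # High-bit byte: folded only in mono-byte encodings
            b = lower_high_byte(b, codec)
        out.append(b)
    return out.decode(codec)


print(fold_identifier("MYÉCLASS", "LATIN9"))  # myéclass
print(fold_identifier("MYÉCLASS", "UTF8"))    # myÉclass
```

The second result is the name reported in the error above: the dump
created "myéclass", but the unquoted MYÉCLASS now resolves to "myÉclass".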

The compromise described in the comments of downcase_identifier() to
justify this inconsistency is not very convincing, because the
case-folding issues caused by linguistic differences exist in both
mono-byte and multi-byte encodings. For instance, if it's fine to trust
the locale to downcase 'İ' in a LATIN5 db, it should be okay in a UTF-8
db too.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite
