Re: Possible to create a hidden collation

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Jeff Davis" <pgsql(at)j-davis(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Possible to create a hidden collation
Date: 2023-05-13 12:28:59
Message-ID: 226e3a10-41c6-4150-a31e-e806d7c0efd4@manitou-mail.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Jeff Davis wrote:

> create collation test(provider=icu,
> deterministic=false, locale='@colStrength=primary');
> create collation "en_US"(provider=icu,
> deterministic=false, locale='@colStrength=primary');
>
> select 'a' = 'A' collate test; -- true
> select 'a' = 'A' collate "en_US"; -- false
> drop collation "en_US"; -- drops built-in collation
> select 'a' = 'A' collate "en_US"; -- true
>
> Explanation:
>
> The second collation named "en_US" is hidden behind the built-in
> collation "en_US" because the former is created with collencoding=-1
> (as all icu collations are), and the latter is a libc collation with
> collencoding equal to the current database encoding (which takes
> precedence).
>
> It's a minor bug, but could be surprising behavior.

ISTM that this behavior is not due to collencoding=-1, but
to the custom "en_US" collation being in the "public" schema
whereas the built-in "en_US" is in "pg_catalog".

Assuming a default search_path,
- create collation "en_US"(...) means
create collation "public"."en_US"(...)
- select 'a' = 'A' collate "en_US" means
select 'a' = 'A' collate "pg_catalog"."en_US"
- drop collation "en_US" means
drop collation "pg_catalog"."en_US"

So in practice the new collation en_US is not being seen
until the system collation is dropped, independently of
collencoding.

Also, the proposed patch doesn't appear to change the outcome
of the sequence of statements given as an example, which is both
expected considering the above, and surprising because you imply
that it should improve the user experience.

What I'm seeing after applying it:

Initial state:
postgres=# select * from pg_collation where collname='en_US' \gx
-[ RECORD 1 ]-------+-----------
oid | 12419
collname | en_US
collnamespace | 11
collowner | 10
collprovider | c
collisdeterministic | t
collencoding | 6
collcollate | en_US.utf8
collctype | en_US.utf8
colliculocale |
collicurules |
collversion | 2.35

postgres=# create collation "en_US"(provider=icu,
postgres(# deterministic=false, locale='@colStrength=primary');
NOTICE: using standard form "und-u-ks-level1" for locale
"@colStrength=primary"
CREATE COLLATION

postgres=# select * from pg_collation where collname='en_US' \gx
-[ RECORD 1 ]-------+----------------
oid | 12419
collname | en_US
collnamespace | 11
collowner | 10
collprovider | c
collisdeterministic | t
collencoding | 6
collcollate | en_US.utf8
collctype | en_US.utf8
colliculocale |
collicurules |
collversion | 2.35
-[ RECORD 2 ]-------+----------------
oid | 16388
collname | en_US
collnamespace | 2200
collowner | 10
collprovider | i
collisdeterministic | f
collencoding | 6
collcollate |
collctype |
colliculocale | und-u-ks-level1
collicurules |
collversion | 153.112

(I notice collencoding=6 for the new collation in the public namespace)

postgres=# select 'a' = 'A' collate "en_US";
?column?
----------
f
(1 row)

(the libc collation is still used, because pg_catalog comes first)

postgres=# drop collation "en_US";
DROP COLLATION

postgres=# select * from pg_collation where collname='en_US' \gx
-[ RECORD 1 ]-------+----------------
oid | 16388
collname | en_US
collnamespace | 2200
collowner | 10
collprovider | i
collisdeterministic | f
collencoding | 6
collcollate |
collctype |
colliculocale | und-u-ks-level1
collicurules |
collversion | 153.112

(the collation in pg_catalog has been dropped, so we're left with only
the custom collation).

So maybe it's better to set collencoding to the db encoding as done
by the patch, but it's not clear what concrete problem it solves.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Paquier 2023-05-15 00:41:14 Re: BUG #17884: gist_page_items() crashes for a non-leaf page of an index with non-key columns
Previous Message Nikolas Mauropoulos 2023-05-13 09:13:45 Re: BUG #17929: LLVMBuildGEP: symbol not found