Re: ICU for global collation

From: Marina Polyakova <m(dot)polyakova(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Daniel Verite <daniel(at)manitou-mail(dot)org>, pryzby(at)telsasoft(dot)com
Subject: Re: ICU for global collation
Date: 2022-08-15 12:06:32
Message-ID: f385ba25e7f8be427b8c582e5cca7d79@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello everyone in this thread!

While reading and testing the patch that adds ICU for global collations
[1] I noticed on master (1c5818b9c68e5c2ac8f19d372f24cce409de1a26) and
REL_15_STABLE (63b64d8270691894a9a8f2d4e929e7780020edb8) that:

1) pg_upgrade from REL_14_STABLE
(63b64d8270691894a9a8f2d4e929e7780020edb8) does not always work:

For REL_14_STABLE:

$ initdb -D data_old

For REL_15_STABLE or master:

$ initdb -D data_new --locale-provider icu --icu-locale ru-RU
$ pg_upgrade -d .../data_old -D data_new -b ... -B ...
...
Restoring database schemas in the new cluster
template1
*failure*

Consult the last few lines of
"data_new/pg_upgrade_output.d/20220815T142454.223/log/pg_upgrade_dump_1.log"
for
the probable cause of the failure.
Failure, exiting

In
data_new/pg_upgrade_output.d/20220815T142454.223/log/pg_upgrade_dump_1.log:

pg_restore: error: could not execute query: server closed the connection
unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Command was: CREATE DATABASE "template1" WITH TEMPLATE = template0 OID =
1 ENCODING = 'UTF8' LOCALE_PROVIDER = libc LOCALE = 'en_US.UTF-8';

In
data_new/pg_upgrade_output.d/20220815T142454.223/log/pg_upgrade_server.log:

TRAP: FailedAssertion("(dblocprovider == COLLPROVIDER_ICU &&
dbiculocale) || (dblocprovider != COLLPROVIDER_ICU && !dbiculocale)",
File: "dbcommands.c", Line: 1292, PID: 69247)
postgres: marina postgres [local] CREATE
DATABASE(ExceptionalCondition+0xb9)[0xb4d8ec]
postgres: marina postgres [local] CREATE
DATABASE(createdb+0x1abc)[0x68ca99]
postgres: marina postgres [local] CREATE
DATABASE(standard_ProcessUtility+0x651)[0x9b1d82]
postgres: marina postgres [local] CREATE
DATABASE(ProcessUtility+0x122)[0x9b172a]
postgres: marina postgres [local] CREATE DATABASE[0x9b01cf]
postgres: marina postgres [local] CREATE DATABASE[0x9b0433]
postgres: marina postgres [local] CREATE
DATABASE(PortalRun+0x2fe)[0x9af95d]
postgres: marina postgres [local] CREATE DATABASE[0x9a953b]
postgres: marina postgres [local] CREATE
DATABASE(PostgresMain+0x733)[0x9ada6b]
postgres: marina postgres [local] CREATE DATABASE[0x8ec632]
postgres: marina postgres [local] CREATE DATABASE[0x8ebfbb]
postgres: marina postgres [local] CREATE DATABASE[0x8e8653]
postgres: marina postgres [local] CREATE
DATABASE(PostmasterMain+0x1226)[0x8e7f26]
postgres: marina postgres [local] CREATE DATABASE[0x7bbccb]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7eff082f90b3]
postgres: marina postgres [local] CREATE DATABASE(_start+0x2e)[0x49c29e]
2022-08-15 14:24:56.124 MSK [69231] LOG: server process (PID 69247) was
terminated by signal 6: Aborted
2022-08-15 14:24:56.124 MSK [69231] DETAIL: Failed process was running:
CREATE DATABASE "template1" WITH TEMPLATE = template0 OID = 1 ENCODING =
'UTF8' LOCALE_PROVIDER = libc LOCALE = 'en_US.UTF-8';

1.1) It looks like there's a bug in the function get_db_infos
(src/bin/pg_upgrade/info.c), where the version of the old cluster is
always checked:

if (GET_MAJOR_VERSION(old_cluster.major_version) < 1500)
snprintf(query + strlen(query), sizeof(query) - strlen(query),
"'c' AS datlocprovider, NULL AS daticulocale, ");
else
snprintf(query + strlen(query), sizeof(query) - strlen(query),
"datlocprovider, daticulocale, ");

With the simple patch

diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index
df374ce4b362b4c6c87fc1fd0e476e5d6d353d9e..53ea348e211d3ac38334292bc16cb814bc13bb87
100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -319,7 +319,7 @@ get_db_infos(ClusterInfo *cluster)

snprintf(query, sizeof(query),
"SELECT d.oid, d.datname, d.encoding, d.datcollate, d.datctype, ");
- if (GET_MAJOR_VERSION(old_cluster.major_version) < 1500)
+ if (GET_MAJOR_VERSION(cluster->major_version) < 1500)
snprintf(query + strlen(query), sizeof(query) - strlen(query),
"'c' AS datlocprovider, NULL AS daticulocale, ");
else

I got the expected error during the upgrade:

locale providers for database "template1" do not match: old "libc", new
"icu"
Failure, exiting

1.2) It looks like the mentioned asserion in dbcommands.c conflicts with
the following lines earlier:

if (dbiculocale == NULL)
dbiculocale = src_iculocale;

The following patch works for me:

diff --git a/src/backend/commands/dbcommands.c
b/src/backend/commands/dbcommands.c
index
b31a30550b025d48ba3cc250dc4c15f41f9a80be..17a2942341e528c01182fb6d4878580f2706bec9
100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1048,6 +1048,8 @@ createdb(ParseState *pstate, const CreatedbStmt
*stmt)
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("ICU locale must be specified")));
}
+ else
+ dbiculocale = NULL;

if (dblocprovider == COLLPROVIDER_ICU)
check_icu_locale(dbiculocale);

2) CREATE DATABASE does not always require the icu locale unlike initdb
and createdb:

$ initdb -D data --locale en_US.UTF-8 --locale-provider icu
...
initdb: error: ICU locale must be specified

$ initdb -D data --locale en_US.UTF-8
$ pg_ctl -D data -l logfile start

$ createdb mydb --locale en_US.UTF-8 --template template0
--locale-provider icu
createdb: error: database creation failed: ERROR: ICU locale must be
specified

$ psql -c "CREATE DATABASE mydb LOCALE \"en_US.UTF-8\" TEMPLATE
template0 LOCALE_PROVIDER icu" postgres
CREATE DATABASE

$ psql -c "CREATE DATABASE mydb TEMPLATE template0 LOCALE_PROVIDER icu"
postgres
ERROR: ICU locale must be specified

I'm wondering if this is not a fully-supported feature (because createdb
creates an SQL command with LC_COLLATE and LC_CTYPE options instead of
LOCALE option) or is it a bug in CREATE DATABASE?.. From
src/backend/commands/dbcommands.c:

if (dblocprovider == COLLPROVIDER_ICU && !dbiculocale)
{
if (dlocale && dlocale->arg)
dbiculocale = defGetString(dlocale);
}

[1]
https://github.com/postgres/postgres/commit/f2553d43060edb210b36c63187d52a632448e1d2

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message torikoshia 2022-08-15 12:29:10 Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Previous Message Melih Mutlu 2022-08-15 11:26:26 Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication