Quick Links

Re: [HACKERS] Can ICU be used for a database's default sort order?

From:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [HACKERS] Can ICU be used for a database's default sort order?
Date:	2018-09-12 01:06:12
Message-ID:	CAEepm=11rd6jh=oXn+WgumHT_DExE7PkYH4wfViOdUr8n1LEQA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Sat, Jun 24, 2017 at 10:55 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Fri, Jun 23, 2017 at 11:32 AM, Peter Eisentraut
> <peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
> > It's something I hope to address soon.
>
> I hope you do. I think that we'd realize significant benefits by
> having ICU become the defacto standard collation provider, that most
> users get without even realizing it. As things stand, you have to make
> a point of specifying an ICU collation as your per-column collation
> within every CREATE TABLE. That's a significant barrier to adoption.
>
> > 1) Associate by name only. That is, you can create a database with any
> > COLLATION "foo" that you want, and it's only checked when you first
> > connect to or do anything in the database.
> >
> > 2) Create shared collations. Then we'd need a way to manage having a
> > mix of shared and non-shared collations around.
> >
> > There are significant pros and cons to all of these ideas. Some people
> > I talked to appeared to prefer the shared collations approach.
>
> I strongly prefer the second approach. The only downside that occurs
> to me is that that approach requires more code. Is there something
> that I've missed?

Sorry to join this thread late. I was redirected here from another one[1].

I like the shared catalog idea, but here's one objection I thought
about: it makes it a bit harder to track whether you've sorted out all
your indexes after a version change. Say collation fr_CA's version
changes according to the provider, so that it no longer matches the
stored collversion. Now you'll need to be careful to connect to every
database in the cluster and run REINDEX, before you run ALTER
COLLATION "fr_CA" REFRESH VERSION to update the single shared
pg_collation row's collversion. With the non-shared pg_collation
scheme we have currently, you'd need to refresh the collation row in
each database after reindexing the whole database, which is IMHO a bit
nicer (you track which databases you've dealt with as you go through
them).

In other words, using a shared catalog moves the "scope" of the
version tracking even further away from the ideal scope, and requires
humans to actually get the cleanup right, and it's extra confusing
because you can only be connected to one database at a time so there
is no "REINDEX MY CLUSTER" and no possibility of making a command that
reindexes dependent indexes and then refreshes the collation version.

The ideal scope would be to track all referenced collation versions on
every index, and only update them at CREATE INDEX or REINDEX time
(also, as discussed in some other thread, CHECK constraints and
partition keys might be invalidated and should in theory also carry
versions that can only be updated by running a hypothetical RECHECK or
REPARTITION command). Then a shared pg_collation catalog would make
perfect sense, and there would be no need for it to have a collversion
column at all, or an ALTER COLLATION ... REFRESH VERSION command, and
therefore there would be no way to screw it up by REFRESHing the
VERSION without having really fixed the problem.

[1] https://www.postgresql.org/message-id/242e081c-aec8-a20a-510c-f4d0f183cebd%402ndquadrant.com

--
Thomas Munro
http://www.enterprisedb.com

In response to

Re: Can ICU be used for a database's default sort order? at 2017-06-23 22:54:36 from Peter Geoghegan

Responses

Re: [HACKERS] Can ICU be used for a database's default sort order? at 2018-10-02 06:37:06 from Michael Paquier

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Michael Paquier	2018-09-12 02:25:10	Re: stat() on Windows might cause error if target file is larger than 4GB
Previous Message	Tom Lane	2018-09-11 22:59:55	Re: Retrieve memory size allocated by libpq