Re: Collation version tracking for macOS

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Jeremy Schneider <schneider(at)ardentperf(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Jim Nasby <nasbyj(at)amazon(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collation version tracking for macOS
Date: 2022-06-09 18:22:37
Message-ID: CAH2-Wz=Oa5P586u14zkE-PUhyO4XWfi60J4JTkex615pA2eg_w@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 9, 2022 at 10:54 AM Jeremy Schneider
<schneider(at)ardentperf(dot)com> wrote:
> I’m probably just going to end up rehashing the old threads I haven’t read yet…
>
> One challenge with this approach is you have things like sort-merge joins that require the same collation across multiple objects. So I think you’d need to keep all the old indexes around until you have new indexes available for all objects in a database, and somehow the planner would need to be smart enough to dynamically figure out old vs new versions on a query-by-query basis.

I don't think that it would be fundamentally difficult to have the
planner deal with collations at the level required to avoid incorrect
query plans.

I'm not suggesting that this is an easy project, or that the end
result would be totally free of caveats, such as the issue with merge
joins. I am only suggesting that something like this seems doable.
There aren't that many distinct high-level approaches that could
possibly decouple upgrading Postgres/the OS from reindexing. This is
one.

> And my opinion is that the problems caused by depending on OS libraries for collation need to be addressed on a shorter timeline than what’s realistic for inventing a new way for a relational database to offer transparent or online upgrades of linguistic collation versions.

But what does that really mean? You can use ICU collations as the
default for the entire cluster now. Where do we still fall short? Do
you mean that there is still a question of actively encouraging using
ICU collations?

I don't understand what you're arguing for. Literally everybody agrees
that the current status quo is not good. That much seems settled to
me.

> Also I still think folks are overcomplicating this by focusing on linguistic collation as the solution.

I don't think that's true; I think that everybody understands that
being on the latest linguistic collation is only very rarely a
compelling feature. The whole way that BCP47 tags are so forgiving is
entirely consistent with that view of things.

But what difference does it make? As long as you accept that any
collation *might* need to be updated, or the default ICU version might
change on OS upgrade, then you have to have some strategy for dealing
with the transition. Not being on a very old obsolete version of ICU
will eventually become a "compelling feature" in its own right.

I believe that EDB adopted ICU many years ago, and stuck with one
vendored version for quite a few years. And eventually being on a very
old version of ICU became a real problem.

--
Peter Geoghegan
