Re: Collation version tracking for macOS

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, "Nasby, Jim" <nasbyj(at)amazon(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collation version tracking for macOS
Date: 2022-06-10 04:30:57
Message-ID: CA+hUKGJRaPdb+AKNzKqdgxnaRpW_iVwj529Wrm9h8y__Qj9s+w@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jun 10, 2022 at 1:48 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Thu, Jun 9, 2022 at 6:23 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> > Well I can report that the system from ec483147 was hellishly
> > complicated, and not universally loved. Which isn't to say that there
> > isn't a simple and loveable way to do it, waiting to be discovered,
> > and I do think we could fix most of the problems with that work.
>
> I admit that I don't have much idea of how difficult it would be to
> make it all work. I'm definitely not claiming that it's easy.

Hrrm... perhaps my memory of ec483147 is confusing me. I think I'm
starting to come around to your idea a bit more now. Let me sketch
out some more details here and see where this goes.

I *was* thinking that you'd have to find all references to collations
through static analysis, as we did in that version tracking project.
But perhaps for this you only need to record one ICU library version
for the whole index at build time, without any analysis at all, and it
would be used for any and all ICU collations that are reached while
evaluating anything to do with that index (index navigation, but also
e.g. the WHERE clause of a partial index, etc). That would change to the
"current" value when you REINDEX.
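
To make that concrete, here is a toy model in Python. It is only a sketch of the bookkeeping, not PostgreSQL code: the Catalog and Index classes, the method names, the index names and the version strings are all made up for illustration. The point is just that the version is stamped once per index at build time, with no analysis of which collations the index actually reaches, and only moves when the index is rebuilt.

```python
from dataclasses import dataclass


@dataclass
class Index:
    name: str
    icu_version: str  # one version for the whole index, recorded at build time


class Catalog:
    """Hypothetical stand-in for the system catalogs."""

    def __init__(self, current_icu_version):
        self.current_icu_version = current_icu_version
        self.indexes = {}

    def create_index(self, name):
        # No static analysis: just stamp the index with the current version.
        self.indexes[name] = Index(name, self.current_icu_version)

    def reindex(self, name):
        # Rebuilding is the only thing that moves an index to a newer library.
        self.indexes[name].icu_version = self.current_icu_version

    def icu_version_for(self, name):
        # Every ICU collation reached while evaluating anything to do with
        # this index (navigation, partial index predicate, ...) uses this.
        return self.indexes[name].icu_version


catalog = Catalog("67.1")
catalog.create_index("t_name_idx")

catalog.current_icu_version = "71.1"   # a newer library becomes "current"
catalog.create_index("t_other_idx")

before = catalog.icu_version_for("t_name_idx")   # still the old library
catalog.reindex("t_name_idx")
after = catalog.icu_version_for("t_name_idx")    # now the current library
```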

Perhaps that could be modeled with a pg_depend row pointing to a
pg_icu_library row, which you'd probably need anyway, to prevent a
registered ICU library that is needed for a live index from being
dropped. (That's assuming that the pg_icu_library catalogue concept
has legs... well if we're going with dlopen(), we'll need *somewhere*
to store the shared object paths. Perhaps it's not a given that we
really want paths in a table... I guess it might prevent certain
cross-OS streaming rep scenarios, but mostly that'd be solvable with
symlinks...)

One problem is that to drop an old pg_icu_library row, you'd have to
go and REINDEX everything, even indexes that don't really use
collations! If you want to prove that an index doesn't use
collations, you're back in ec483147 territory. Perhaps we don't care
about that and we're happy to let useless dependencies on
pg_icu_library rows accumulate, or to require useless work to be able
to drop them.
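
A toy illustration of that problem, again in Python and again with invented names (Catalog, DependencyError, the index names and the library paths are all hypothetical). Every index depends on the library row it was built with, whether or not it really uses a collation, so the drop is blocked until all of them have been rebuilt.

```python
class DependencyError(Exception):
    """Raised when a library row still has dependent indexes."""


class Catalog:
    def __init__(self):
        self.icu_libraries = {}      # version -> shared object path
        self.index_depends_on = {}   # index name -> version (the pg_depend idea)

    def register_library(self, version, path):
        self.icu_libraries[version] = path

    def build_index(self, index_name, version):
        # Used for both the initial build and a REINDEX.
        if version not in self.icu_libraries:
            raise KeyError(f"ICU library {version} is not registered")
        self.index_depends_on[index_name] = version

    def drop_library(self, version):
        dependents = sorted(
            name for name, v in self.index_depends_on.items() if v == version
        )
        if dependents:
            raise DependencyError(
                f"cannot drop ICU {version}: needed by {', '.join(dependents)}"
            )
        del self.icu_libraries[version]


catalog = Catalog()
catalog.register_library("67", "/hypothetical/path/libicui18n.so.67")
catalog.register_library("71", "/hypothetical/path/libicui18n.so.71")

catalog.build_index("people_name_idx", "67")  # text column, really uses ICU
catalog.build_index("people_id_idx", "67")    # integer column, no collation

try:
    catalog.drop_library("67")
    blocked_by = None
except DependencyError as e:
    blocked_by = str(e)   # mentions both indexes, including the integer one

# The only way out in this model: rebuild everything, useful or not.
for name in ("people_name_idx", "people_id_idx"):
    catalog.build_index(name, "71")
catalog.drop_library("67")
```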

I'm not sure how we'd know what the "current" library version is. The
highest numbered one currently in that pg_icu_library catalogue I
sketched? So if I do whatever new DDL we invent to tell the system
about a new ICU library, and it's got a higher number than any others,
new indexes start using it but old ones keep using whatever they're
using. Maybe with some way for users to override it, so users who
really want to use an older one when creating a new index can say so.
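
Sketching that rule in Python, under the assumption that registered versions are dotted numeric strings (the function names and version values here are invented, and no such DDL exists): "current" is the numerically highest registered version, unless the user explicitly names another registered one.

```python
def parse_version(version):
    # Compare numerically, so that "71.1" sorts above "9.2".
    return tuple(int(part) for part in version.split("."))


def current_version(registered):
    # "Current" means the highest numbered library that has been registered.
    return max(registered, key=parse_version)


def version_for_new_index(registered, override=None):
    # A user who really wants an older library can ask for it by name,
    # but only if it is actually registered.
    if override is not None:
        if override not in registered:
            raise ValueError(f"ICU library {override} is not registered")
        return override
    return current_version(registered)


registered = ["67.1", "9.2", "71.1"]

default_choice = version_for_new_index(registered)
pinned_choice = version_for_new_index(registered, override="67.1")
```

Existing indexes are untouched by any of this; only newly built ones consult the rule.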

I suppose it would be the same for constraints. For those,
considering that they need to be rechecked, the only way to change ICU
version would be to drop the constraint and recreate it. Same goes
for range partitioned tables, right? It'd keep using the old ICU
library until you drop the partitioned table and create a new one, at which
point you're using the new current ICU library and it'll recheck all
your partitions against the constraints when you add them. (Those
constraints are much simpler constants, so for those we could prove no
use of ICU without the general ec483147 beast.)

I think these things would have to survive pg_upgrade, but would be
lost on dump/restore.

There's still the pathkey problem to solve, and maybe some more
problems like that hiding somewhere.

I'm not sold on any particular plan, but working through some examples
helped me see your idea better... I may try to code that up in a
minimal way so we can kick the tyres...
