Re: Bogus collation version recording in recordMultipleDependencies

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Bogus collation version recording in recordMultipleDependencies
Date: 2021-04-16 20:39:39
Message-ID: 40190.1618605579@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> I feel like this is telling us that there's a fundamental
> misunderstanding in find_expr_references_walker about which
> collation dependencies to report. It's reporting all the
> leaf-node collations, and ignoring the ones that actually
> count semantically, that is the inputcollid fields of
> function and operator nodes.
> Not sure what's the best thing to do here. Redesigning
> this post-feature-freeze doesn't seem terribly appetizing,
> but on the other hand, this index collation recording
> feature has put a premium on not overstating the collation
> dependencies of an expression. We don't want to tell users
> that an index is broken when it isn't really.

I felt less hesitant to modify find_expr_references_walker's
behavior w.r.t. collations after realizing that most of it
was not of long standing, but came in with 257836a75.
So here's a draft patch that redesigns it as suggested above.
Along the way I discovered that GetTypeCollations was quite
broken for ranges and arrays, so this fixes that too.

Per the changes in collate.icu.utf8.out, this gets rid of
a lot of imaginary collation dependencies, but it also gets
rid of some arguably-real ones. In particular, calls of
record_eq and its siblings will be considered not to have
any collation dependencies, although we know that internally
those will look up per-column collations of their input types.
We could imagine special-casing record_eq etc here, but that
sure seems like a hack.

I"m starting to have a bad feeling about 257836a75 overall.
As I think I've complained before, I do not like anything about
what it's done to pg_depend; it's forcing that relation to serve
two masters, neither one well. We now see that the same remark
applies to find_expr_references(), because the semantics of
"which collations does this expression's behavior depend on" aren't
identical to "which collations need to be recorded as direct
dependencies of this expression", especially not if you'd prefer
to minimize either list. (Which is important.) Moreover, for all
the complexity it's introducing, it's next door to useless for
glibc collations --- we might as well tell people "reindex
everything when your glibc version changes", which could be done
with a heck of a lot less infrastructure. The situation on Windows
looks pretty user-unfriendly as well, per the other thread.

So I wonder if, rather than continuing to pursue this right now,
we shouldn't revert 257836a75 and try again later with a new design
that doesn't try to commandeer the existing dependency infrastructure.
We might have a better idea about what to do on Windows by the time
that's done, too.

regards, tom lane

Attachment Content-Type Size
redesign-collation-dependency-collection-wip.patch text/x-diff 12.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2021-04-16 20:46:54 Re: Error when defining a set returning function
Previous Message Alvaro Herrera 2021-04-16 20:19:18 Re: default_tablespace doc and partitioned rels