Re: Why do indexes and sorts use the database collation?

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Why do indexes and sorts use the database collation?
Date: 2023-11-17 19:39:40
Message-ID: 13fb2032cbfda29bb4c6c55ccc41a3d01a8de717.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 2023-11-13 at 14:12 -0800, Andres Freund wrote:
> Why on earth are we solving this by having multiple pg_collation
> entries for
> exactly the same collation, instead of normalizing the collation-name
> during
> lookup by adding the relevant encoding name if not explicitly
> specified?  It
> makes a lot of sense to not force the user to specify the encoding
> when it
> can't differ.

I'm not aware of it being a common practical problem, so perhaps lack
of motivation. But you're right that it doesn't look very efficient.

We can even go deeper into ICU if we wanted to: lots of locales are
actually aliases to a much smaller number of actual collators. And a
lot are just aliases to the root locale. It's not trivial to reliably
tell if two collators are identical, but in principle it should be
possible: each collation is just a set of tailorings on top of the root
locale, so I suppose if those are equal it's the same collator, right?

> It's imo similarly absurd that an index with "default" collation
> cannot be
> used when specifying the equivalent collation explicitly in the query
> and vice
> versa.

The catalog representation is not ideal to treat the database collation
consistently with other collations. It would be nice to fix that.

> > > >
> Jeff was saying that textual primary keys typically don't need
> sorting and
> because of that we could default to "C", for performance. Part of my
> response
> was that I think the user's intent could be expressed by specifying
> the column
> collation as "C" - to which Jeff replied that that would change the
> semantics. Which, to me, seems to completely run counter to his
> argument that
> we could just use "C" for such indexes.

I am saying we shouldn't prematurely optimize for the case of ORDER BY
on a text PK case by making a an index with a non-"C" collation, given
the costs and risks of non-"C" indexes. Particularly because, even if
there is an ORDER BY, there are several common reasons such an index
would not help anyway.

> > > >
Regards,
Jeff Davis

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-11-17 19:42:00 Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500
Previous Message Dmitry Dolgov 2023-11-17 19:17:32 Re: Schema variables - new implementation for Postgres 15