Re: Why do indexes and sorts use the database collation?

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Why do indexes and sorts use the database collation?
Date: 2023-11-14 23:28:24
Message-ID: 6171961c8cc3ebd12d8708a27f5acb13434f9479.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 2023-11-14 at 14:47 -0500, Tom Lane wrote:
> Why should that ever be different from the column's own declared
> collation?

Because an index with the "C" collation is more efficient in terms of
building/maintaining/searching the index, and it also doesn't carry
risks of corrupting your PK index when you upgrade libc or other
dependency headaches.

A "C" collation index is also perfectly capable of performing the
duties of a PK index: equality means the exact same thing in every
deterministic collation, so it can enforce the same notion of
uniqueness. It can also be used for ordinary equality lookups in the
same way, though currently our planner doesn't do that (I'll take a
shot at fixing that).

Of course such an index won't offer range scans or pathkeys useful for
ORDER BY on that text column. But as I've argued elsewhere in this
thread, that's less useful than it may seem at first (text indexes are
often uncorrelated). It seems valid to offer this as a trade-off that
users can make.

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kanmani Thamizhanban 2023-11-14 23:42:21 Fwd: Issue with launching PGAdmin 4 on Mac OC
Previous Message Peter Smith 2023-11-14 23:10:14 Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)