Re: Collatability of type "name"

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Collatability of type "name"
Date: 2018-12-18 21:55:14
Message-ID: 21253.1545170114@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Dec 10, 2018 at 2:50 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Also, I think that either solution would lead to some subtle changes
>> in semantics. For example, right now if you compare a name column
>> to a text value, you get a text (collation-aware) comparison using
>> the database's default collation. It looks like if name columns
>> are marked with attcollation = 'C', that would win and the comparison
>> would now have 'C' collation unless you explicitly override it with
>> a COLLATE clause. I'm not sure this is a bad thing --- it'd be more
>> likely to match the sort order of the index on the column --- but it
>> could surprise people.

> It's not great to change the semantics of stuff like this, but it
> doesn't sound all that bad.
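The "'C' would win" behaviour in the quoted paragraph follows PostgreSQL's documented collation-combination rules. An explicit COLLATE clause wins. Otherwise, any non-default implicit collation, such as a column's attcollation, wins over the default collation. Conflicting non-default implicit collations leave the result indeterminate. Below is a minimal Python sketch of those rules only, not PostgreSQL's implementation. The names `Input`, `CollationConflict` and `combine_collations` are hypothetical, and "en_US" is just an example collation name.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

DEFAULT = "default"  # stands in for the database's default collation


@dataclass(frozen=True)
class Input:
    """One operator argument: its collation and how it was derived (hypothetical model)."""
    collation: str
    explicit: bool = False  # True if the collation came from a COLLATE clause


class CollationConflict(Exception):
    """Raised when two different explicit COLLATE clauses meet."""


def combine_collations(inputs: Iterable[Input]) -> Optional[str]:
    """Return the collation an operator would use, or None if indeterminate."""
    inputs = list(inputs)
    # Rule 1: explicit collations win, but they must all agree.
    explicit = {i.collation for i in inputs if i.explicit}
    if explicit:
        if len(explicit) > 1:
            raise CollationConflict(f"conflicting explicit collations: {sorted(explicit)}")
        return explicit.pop()
    # Rule 2: a non-default implicit collation beats the default collation.
    implicit = {i.collation for i in inputs if i.collation != DEFAULT}
    if not implicit:
        return DEFAULT
    # Rule 3: conflicting non-default implicit collations are indeterminate.
    if len(implicit) > 1:
        return None
    return implicit.pop()


# A "name" column marked attcollation = 'C' compared with a text literal:
print(combine_collations([Input("C"), Input(DEFAULT)]))  # C
# The same comparison with an explicit COLLATE override:
print(combine_collations([Input("C"), Input("en_US", explicit=True)]))  # en_US
```

In this model, giving the column an implicit "C" collation is enough to change which collation a name-vs-text comparison uses, which is the subtle semantic change under discussion.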

I had an epiphany after committing 6b0faf723: if we're forcing system
catalog columns to have "C" collation, there's no critical need for
type "name" to do that for itself. We could upgrade "name" to be
collatable with typcollation = DEFAULT_COLLATION_OID, and then its
comparison semantics would be *exactly the same as text*. Only the
physical representation is different.

This should mean that it's semantically trivial to unify the name_ops
opfamily with text_ops (not text_pattern_ops, as I'd previously supposed)
and add all the requisite cross-type operators. I haven't actually
done that yet, but I have made a patch to make "name" fully
collation-aware, as attached.
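To make "only the physical representation is different" concrete: a name value is a fixed-size, NUL-padded buffer of NAMEDATALEN bytes (64 by default, so at most 63 bytes of data), while text is variable-length. Once the padding is stripped, both hand the same string to the comparison routine, so under one collation they must order identically, and a cross-type name-vs-text operator only needs to unpack the name first. The following is a minimal Python sketch under those assumptions. The helpers are hypothetical, bytewise ordering stands in for the "C" collation, and the real type truncates over-long input where this sketch rejects it.

```python
NAMEDATALEN = 64  # PostgreSQL's default; a name holds at most 63 bytes plus a NUL


def to_name(s: str) -> bytes:
    """Pack a string into a fixed-size, NUL-padded buffer (hypothetical helper)."""
    raw = s.encode("utf-8")
    if len(raw) >= NAMEDATALEN:
        # The real type truncates over-long input; this sketch just rejects it.
        raise ValueError("name too long")
    return raw.ljust(NAMEDATALEN, b"\0")


def name_to_str(buf: bytes) -> str:
    """Recover the logical value by dropping everything from the first NUL."""
    return buf.split(b"\0", 1)[0].decode("utf-8")


def c_cmp(a: str, b: str) -> int:
    """Bytewise three-way comparison, i.e. the "C" collation."""
    x, y = a.encode("utf-8"), b.encode("utf-8")
    return (x > y) - (x < y)


def name_text_cmp(name_buf: bytes, text: str) -> int:
    """Cross-type name-vs-text comparison: unpack, then reuse the text comparator."""
    return c_cmp(name_to_str(name_buf), text)


words = ["pg_class", "Pg_Class", "pg_attribute", "zeta"]
for a in words:
    for b in words:
        # name/name, name/text and text/text comparisons all agree.
        assert c_cmp(name_to_str(to_name(a)), name_to_str(to_name(b))) == c_cmp(a, b)
        assert name_text_cmp(to_name(a), b) == c_cmp(a, b)
```

Swapping `c_cmp` for any other collation-aware comparator keeps the name and text results in agreement, which is why the two types could share one operator family.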

This approach does have some minuses, though:

* There are assorted user-defined "name" columns in the regression
tests, which may introduce locale dependencies that weren't there
before. I found a couple by running check-world under various locales,
and patched those in the attached patch, but it's definitely possible
that there are more issues in locales I didn't try.

* If any end users are using columns of type "name", they'd likewise
see behavioral changes, plus any indexes on those columns would be
broken, since they were built under the old, collation-unaware sort
order. We discourage people from using that type, so I don't think
this is a deal-breaker, but we'd at least have to add intelligence to
pg_upgrade to make it notice user-defined indexes on name columns and
arrange to reindex them.
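Why the indexes would break: a btree keeps its keys in the order defined when it was built, and lookups binary-search using the ordering currently in effect. If that ordering changes, a search can descend to the wrong place and miss a key that is present. The following is a minimal Python sketch, assuming bytewise order for the old behaviour of name. `str.casefold` is only a crude case-insensitive stand-in for a linguistic collation (real glibc or ICU collations are more elaborate), and `lookup` is a hypothetical helper.

```python
def c_key(s: str) -> bytes:
    """Bytewise sort key: the old, collation-unaware ordering."""
    return s.encode("utf-8")


def linguistic_key(s: str) -> str:
    """Crude case-insensitive stand-in for a locale-aware collation (assumption)."""
    return s.casefold()


def lookup(index: list, target: str, key) -> bool:
    """Binary search, as a btree descent would, under the given ordering."""
    lo, hi = 0, len(index)
    t = key(target)
    while lo < hi:
        mid = (lo + hi) // 2
        if key(index[mid]) < t:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(index) and index[lo] == target


# "Index" built under the old bytewise order: uppercase sorts before lowercase.
index = sorted(["apple", "Banana", "cherry"], key=c_key)
print(index)                                   # ['Banana', 'apple', 'cherry']
print(lookup(index, "apple", c_key))           # True
print(lookup(index, "apple", linguistic_key))  # False: the stale order misleads the search
```

Re-sorting `index` with `linguistic_key`, the analogue of a REINDEX, makes the lookup succeed again.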

We could eliminate those two problems if we made "name" have
typcollation "C" rather than "default", so that its semantics
wouldn't change without explicit collation specs. This feels
like pretty much of a wart to me, but maybe it's worth doing
in the name of avoiding compatibility issues. We could still
unify name_ops with text_ops, but now "name" would act more like
a domain with an explicit collation spec.

Thoughts?

regards, tom lane

Attachment: make-type-name-collatable-1.patch (text/x-diff, 25.5 KB)
