Re: Improving collation-dependent indexes in system catalogs

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: Improving collation-dependent indexes in system catalogs
Date: 2018-12-16 21:01:42
Message-ID: 32248.1544994102@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> I notice that some information_schema view columns end up with C
> collation after this patch, and others remain with default collation.
> Is that sensible? (I think the only two cases where this might matter
> at all are information_schema.parameters.parameter_name,
> information_schema.routines.external_name and
> information_schema.foreign_servers.foreign_server_type.)

Yeah. Looking closer at that, there are no collation-sensitive indexes
in information_schema (if there were, the existing opr_sanity test would
have caught 'em). But there are collation-sensitive table columns, which
do have pg_statistic entries, and those entries are at least nominally
broken by copying them into a database with a different default collation.

We could think of two ways to deal with that. One is to plaster
COLLATE "C" on each textual table column in the information_schema.
A more aggressive approach is to attach COLLATE "C" to each of the
domain types that information_schema defines, which fixes the table
columns a fortiori, and also causes all of the exposed information_schema
view columns to acquire database-independent collations.

I tried both ways, as in the attached patches below (each meant to be
applied on top of my patch upthread), and they both pass check-world.

A possible advantage of the second approach is that it could end up
allowing comparisons on information_schema view columns to be translated
to indexable comparisons on the underlying "name" columns, which would
be a pleasant outcome. On the other hand, people might be annoyed by
the semantics change, if they'd previously been doing that with the
expectation of getting database-collation-based comparisons.

I'm not sure whether the SQL standard says anything that either patch
would be violating. I see that it does say that these domains have
CHARACTER SET SQL_TEXT, and that the collation of that character set
is implementation-defined, so I think we could get away with changing
so far as spec compliance is concerned.

regards, tom lane

Attachment Content-Type Size
info-schema-collations-solution-1.patch text/x-diff 6.9 KB
info-schema-collations-solution-2.patch text/x-diff 3.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2018-12-16 21:25:38 Re: Valgrind failures in Apply Launcher's bgworker_quickdie() exit
Previous Message Andres Freund 2018-12-16 20:57:33 Re: Valgrind failures in Apply Launcher's bgworker_quickdie() exit