Upgrading pg_statistic to handle collation honestly

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Upgrading pg_statistic to handle collation honestly
Date: 2018-12-12 15:57:07
Message-ID: 14706.1544630227@sss.pgh.pa.us
Lists: pgsql-hackers

When we first put in collation support, we basically punted on teaching
ANALYZE, pg_statistic, and the planner selectivity functions about that.
They just use DEFAULT_COLLATION_OID regardless of the actual collation
of the data. I tripped over this while investigating making type "name"
collatable: it needs to default to C_COLLATION_OID, and the mismatch
resulted in broken statistics for name columns. So it's time to pay down
that technical debt.
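
To illustrate the kind of mismatch described above, here is a minimal
standalone sketch. It is not taken from the patch. The comparator
cmp_caseless is only a stand-in for a locale-aware collation, cmp_bytewise
stands in for "C" collation, and every name in it is hypothetical. It shows
that histogram bounds sorted under one ordering need not be sorted under
another, which is why statistics gathered with one collation can mislead
code that assumes a different one.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical comparator type: negative, zero, or positive like strcmp(). */
typedef int (*str_cmp_fn)(const char *, const char *);

/* Stand-in for "C" collation: plain byte-wise comparison. */
static int cmp_bytewise(const char *a, const char *b)
{
    return strcmp(a, b);
}

/* Stand-in for a locale-aware collation: compare ignoring ASCII case. */
static int cmp_caseless(const char *a, const char *b)
{
    while (*a && *b)
    {
        int ca = tolower((unsigned char) *a);
        int cb = tolower((unsigned char) *b);

        if (ca != cb)
            return ca - cb;
        a++;
        b++;
    }
    return (unsigned char) *a - (unsigned char) *b;
}

/* Return 1 if bounds[0..n-1] is in non-decreasing order under cmp, else 0. */
static int bounds_are_sorted(const char *bounds[], size_t n, str_cmp_fn cmp)
{
    assert(cmp != NULL);
    for (size_t i = 1; i < n; i++)
    {
        if (cmp(bounds[i - 1], bounds[i]) > 0)
            return 0;
    }
    return 1;
}
```

Given the bounds "apple", "Banana", "cherry", "Date", the array counts as
sorted under cmp_caseless but not under cmp_bytewise, because uppercase
ASCII letters sort before lowercase ones in byte order.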

Attached is a draft patch for same. It adds storage to pg_statistic
to record the collation of each statistics "slot". A plausible
alternative design would be to just say "look at the collation of the
underlying column", but that would require extra catcache lookups in
the selectivity functions that need the info. Doing it like this also
makes it theoretically feasible to track stats computed with respect
to different collations for the same column, though I'm not really
convinced that we'd ever do that.
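
The per-slot design can be pictured with the following simplified model.
It is a sketch under stated assumptions, not the patch's actual
definitions: the struct and function names, the field layout, and the
constants are all hypothetical. The point is only that once a slot carries
its own collation, code that has already found the slot can read the
collation from it without a second lookup on the underlying column.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;       /* simplified stand-in for an object ID */

#define INVALID_OID   0u        /* hypothetical "no collation" marker */
#define SLOTS_PER_ROW 5         /* illustrative number of slots per row */

/* One statistics "slot": what kind of statistic it is, plus metadata. */
typedef struct StatSlot
{
    int kind;                   /* 0 means the slot is unused */
    Oid op;                     /* operator the statistic was built with */
    Oid coll;                   /* collation the statistic was built with */
} StatSlot;

/* One row of per-column statistics, holding a fixed number of slots. */
typedef struct StatRow
{
    StatSlot slots[SLOTS_PER_ROW];
} StatRow;

/* Return the slot of the requested kind, or NULL if the row has none. */
static const StatSlot *find_slot(const StatRow *row, int kind)
{
    assert(row != NULL);
    for (size_t i = 0; i < SLOTS_PER_ROW; i++)
    {
        if (row->slots[i].kind == kind)
            return &row->slots[i];
    }
    return NULL;
}

/* Return the collation recorded for the given kind, or INVALID_OID. */
static Oid slot_collation(const StatRow *row, int kind)
{
    const StatSlot *slot = find_slot(row, kind);

    return slot ? slot->coll : INVALID_OID;
}
```

In this model a caller asks for slot_collation(row, kind) and gets back
whatever collation was stored when the slot was filled in. The alternative
design would instead consult the column's definition at that point.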

Loose ends:

* I'm not sure what, if anything, needs to be done in the extended
statistics stuff. It looks like the existing types of extended stats
aren't really collation sensitive, so maybe the answer is "nothing".

* There's a remaining use of DEFAULT_COLLATION_OID in array_selfuncs.c's
element_compare(). I'm not sure if it's important to get rid of that,
either; it doesn't seem to be used for anything that relates to
collected statistics, so it might be fine as-is.

* Probably this conflicts to some extent with Peter's "Reorganize
collation lookup" patch, but I haven't studied that yet.

* There's a kluge in get_attstatsslot() that I'd like to get rid of
later, but it's necessary for now because of the weird things that
happen when doing regex operators on "name" columns.

Comments, objections?

regards, tom lane

Attachment Content-Type Size
collatable-statistics-1.patch text/x-diff 35.2 KB
