Reorganize collation lookup time and place

From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Reorganize collation lookup time and place
Date: 2018-12-10 15:12:54
Message-ID: 559d6b57-bf1e-6a34-5e30-9f3f4bf5ba1c@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Attached is a patch that reorganizes how and where collations are looked up.

Until now, a fmgr C function that makes use of collation information
(for example varstr_cmp(), str_toupper()) gets passed the collation OID,
looks up the collation with pg_newlocale_from_collation(), and then does
something with it.

With this change, the lookup is moved earlier, typically during executor
initialization. The fmgr functions receive the locale pointer that is
the result of that lookup.

This makes the collation lookup more similar to the function lookup. In
many cases they now happen right next to each other.
pg_newlocale_from_collation() becomes analogous to fmgr_info() and
pg_locale_t to FmgrInfo *.

We also save repeated lookups in repeated function calls. I don't
expect massive performance gains, since it's all cached either way, but
it just makes more sense to look up the collation once per executor node
instead of once per function call.

Moreover, this makes the structure and logic of functions like
varstr_cmp() easier, because they don't have to deal with a mix of OIDs
and locale pointers that are sometimes not set depending on the value of
the other. A locale pointer of zero now has the more natural meaning of
"nothing was provided" instead of a mix of nothing or default.
Conversely, if a collation was assigned for the call, the passed-in
pointer is always non-zero.

I think this would also help other efforts like allowing ICU to be used
for the default collation. For that, we need to store the ICU collators
somewhere, so having NULL represent the default locale won't do anymore.

Some renaming might be in order. The original collation implementation
used pg_newlocale_... and pg_locale_t as thin wrappers around the
operating system's newlocale() and locale_t. Now these have grown into
larger structures with various catalog lookup information. It might be
worth renaming, for example pg_locale_t to CollInfo *. (It's also
confusing in this context that pg_locale_t is already a pointer but
FmgrInfo is a struct.)

Also, in order to avoid including pg_locale.h in a lot more places, I
have created an incomplete type fmLocalePtr in fmgr.h. This isn't
necessarily the final solution, given the discussion in the previous
paragraph. Maybe a better structure would be to have, say, CollInfo *
and whatever the lookup function is called in fmgr.h and fmgr.c, and
then only have fmgr.c include pg_locale.h and call into pg_locale.c. Or
something similar.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
v1-0001-Reorganize-collation-lookup-time-and-place.patch text/plain 211.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2018-12-10 15:21:23 Add timeline to partial WAL segments
Previous Message John Naylor 2018-12-10 14:55:11 Re: docs: outdated reference to recursive expression evaluation