Re: Windows buildfarm animals are still not happy with abbreviated keys patch

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Windows buildfarm animals are still not happy with abbreviated keys patch
Date: 2015-01-23 17:34:15
Message-ID: CA+TgmoZmQk3XDfPEiTOPQwiJLvECmMxOL9tPeSJ2WJZ7O-ujxw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 22, 2015 at 5:51 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> That having been said, it's clearer to continue to handle each case (C
> locale vs other locales) separately within the new
> bttext_abbrev_convert() function, just to be consistent, but also to
> avoid NUL-terminating the text strings to pass to strxfrm(), which as
> you point out is an avoidable cost. So, I'm not asking you to restore
> the terser uniform use of strxfrm() we had before, but, for what it's
> worth, that was not the real issue. The real issue was that strxfrm()
> spuriously used the wrong locale, as you said. Provided strxfrm() had
> the correct locale (the C locale), then AFAICT it ought to have been
> fine, regardless of whether or not it then behave exactly equivalently
> to memcpy().

I don't agree. On a system where HAVE_LOCALE_T is not defined, there
is *no way* to get strxfrm() to behave like memcpy(), because we're
not passing a locale to it. Clearly, strxfrm() is going to do a
locale-aware transformation in that case whether the user wrote
collate "C" or not. The comments in regc_pg_locale.c explain:

* 1. In C/POSIX collations, we use hard-wired code. We can't depend on
* the <ctype.h> functions since those will obey LC_CTYPE. Note that these
* collations don't give a fig about multibyte characters.
*
* 2. In the "default" collation (which is supposed to obey LC_CTYPE):
*
* 2a. When working in UTF8 encoding, we use the <wctype.h> functions if
* available. This assumes that every platform uses Unicode codepoints
* directly as the wchar_t representation of Unicode. On some platforms
* wchar_t is only 16 bits wide, so we have to punt for codepoints > 0xFFFF.
*
* 2b. In all other encodings, or on machines that lack <wctype.h>, we use
* the <ctype.h> functions for pg_wchar values up to 255, and punt for values
* above that. This is only 100% correct in single-byte encodings such as
* LATINn. However, non-Unicode multibyte encodings are mostly Far Eastern
* character sets for which the properties being tested here aren't very
* relevant for higher code values anyway. The difficulty with using the
* <wctype.h> functions with non-Unicode multibyte encodings is that we can
* have no certainty that the platform's wchar_t representation matches
* what we do in pg_wchar conversions.
*
* 3. Other collations are only supported on platforms that HAVE_LOCALE_T.
* Here, we use the locale_t-extended forms of the <wctype.h> and <ctype.h>
* functions, under exactly the same cases as #2.

In other words, even on systems that don't HAVE_LOCALE_T, we still
have to support the default collation and the C collation, and they
have to behave differently. There's no way to make that work using
only strxfrm(), because nothing gets passed to that function to tell
it which of those two things it is supposed to do.

Now even if the above were not an issue, for example because we
dropped support for systems where !HAVE_LOCALE_T, I think it would be
poor form to depend on strxfrm_l() to behave like memcpy() where we
could just as easily call memcpy() and be *sure* that it was going to
do what we wanted. Much of writing good code is figuring out what
could go wrong and then figuring out how to prevent it, and relying on
strxfrm_l() would be an unnecessary risk regardless. Given the
existence of !HAVE_LOCALE_T systems, it's just plain broken.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2015-01-23 18:07:44 Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Previous Message Robert Haas 2015-01-23 17:18:49 Re: pgaudit - an auditing extension for PostgreSQL