Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Marc-Olaf Jaschke <marc-olaf(dot)jaschke(at)s24(dot)com>, Postgres-Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
Date: 2016-03-23 15:43:56
Message-ID: 31913.1458747836@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Mar 22, 2016 at 10:44 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> I, too, found MAXXFRMLEN insufficient; I raised it fourfold. Cygwin
>> 2.2.1(0.289/5/3) caught fire; 10% of locales passed. (varstr_sortsupport()
>> already blacklists the UTF8/native Windows case.) The test passed on Solaris
>> 10, Solaris 11, HP-UX B.11.31, OpenBSD 5.0, NetBSD 5.1.2, and FreeBSD 9.0.
>> See attached tryalllocales.sh outputs. I did not test AIX, because the AIX
>> machines I use have no UTF8 locales installed.

> Wow, thanks for the extensive testing. This suggests that, apart from
> Cygwin which apparently doesn't matter right now, the only thing that
> is busted is glibc. I believe we have yet to see a single locale that
> fails anywhere else (apart from Cygwin). Good thing so few of our
> users run glibc!

I extended my test program to be able to check locales using ISO-8859-x
encodings. RHEL6 shows me failures in a set of locales that is remarkably
unlike the set it fails on for UTF8 (though good ol' de_DE manages to fail
in both encodings, as do a few others). I'm not sure what that implies
for the underlying bug(s).
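
For illustration only (this is not the attached strcolltest.c): a minimal
sketch of the kind of consistency check being discussed, assuming the
property of interest is that strcoll() on two strings agrees in sign with
strcmp() on their strxfrm() transformations. The names sign_of, xfrm_dup,
and collation_mismatch are hypothetical. Buffers are sized from strxfrm()'s
own return value rather than a fixed limit.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Reduce a comparison result to -1, 0, or +1. */
static int sign_of(int v)
{
    return (v > 0) - (v < 0);
}

/*
 * Return the strxfrm() transformation of s in a malloc'd buffer,
 * or NULL if allocation fails.  Calling strxfrm() with a zero length
 * reports the size needed, excluding the terminating NUL.
 */
static char *xfrm_dup(const char *s)
{
    size_t need = strxfrm(NULL, s, 0) + 1;
    char *buf = malloc(need);

    if (buf != NULL)
        strxfrm(buf, s, need);
    return buf;
}

/*
 * Compare a and b both ways under the current LC_COLLATE setting.
 * Returns 1 if strcoll() and strcmp()-over-strxfrm() disagree in sign,
 * 0 if they agree, and -1 if memory could not be allocated.
 */
int collation_mismatch(const char *a, const char *b)
{
    char *xa = xfrm_dup(a);
    char *xb = xfrm_dup(b);
    int result = -1;

    if (xa != NULL && xb != NULL)
        result = sign_of(strcoll(a, b)) != sign_of(strcmp(xa, xb));

    free(xa);
    free(xb);
    return result;
}
```

To try a locale, a caller would first select it with something like
setlocale(LC_COLLATE, "de_DE.UTF-8") and then run collation_mismatch() over
many string pairs; any pair returning 1 is one where the two functions
disagree in that locale.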

> So, options:

> 1. We could make it the user's problem to figure out whether they've
> got a buggy glibc and add a GUC to shut this off, as previously
> suggested.

> 2. We could add a blacklist (either hardcoded or a GUC) shutting this
> off for locales known to be buggy anywhere.

> 3. We could write some test code that runs at startup time which
> reliably detects all of the broken locales we've so far uncovered and
> disables this if so.

> 4. We could shut this off for all Linux users in all locales and tell
> everybody to REINDEX. That would be pretty sad, though.

TBH, I think #1 is right out, unless maybe the GUC defaults to off.
We aren't that cavalier with data consistency in other departments.

#2 and #3 presume a level of knowledge of the bug details that we
have not got, and probably can't get by Monday.

As far as #4 goes, we're going to have to tell people to REINDEX
no matter what the other aspects of the fix look like. On-disk
indexes are broken right now, if you're using one of the affected
locales.

regards, tom lane

Attachment Content-Type Size
strcolltest.c text/x-c 4.8 KB
rhel6-bad-in-utf8 text/plain 628 bytes
rhel6-bad-in-iso8859x text/plain 935 bytes
