Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marc-Olaf Jaschke <marc-olaf(dot)jaschke(at)s24(dot)com>, Postgres-Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
Date: 2016-03-22 22:06:37
Message-ID: CA+TgmoYkvRZ6BfjRj3b68p2iuwXF5auaR2DLkpFaJj+LpZZu0A@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Tue, Mar 22, 2016 at 5:09 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Mon, Mar 21, 2016 at 9:04 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>> Can you look at generating a textual representation of the strxfrm()
>> blobs in question, using Robert's tool?:
>>
>> http://www.postgresql.org/message-id/CA+TgmoaOCyQpo8HK9yr6VTuyknWWvqgo7JeXi2kb=gpNveKR+g@mail.gmail.com
>
> I played with this tool myself, on an affected CentOS 6.7 VM:
>
> [vagrant(at)localhost ~]$ ldd --version
> ldd (GNU libc) 2.12
> Copyright (C) 2010 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> Written by Roland McGrath and Ulrich Drepper.
>
> I now think that we have this backwards: This isn't a bug in glibc's
> strxfrm(); it's a bug in glibc's strcoll(). Minimal testcase with
> modified tool, simplified to use ascii-safe strings:
>
> [vagrant(at)localhost ~]$ ./a.out de_DE.UTF-8 'xxx' 'x xx'
> "xxx" -> 2323230108080801020202 (11 bytes)
> "x xx" -> 2323230108080801020202010235 (14 bytes)
> strcmp(arg1, arg2) result: -1
> strcoll(arg1, arg2) result: 6
>
> If we assume for the sake of argument that this is a strxfrm() bug and
> strcoll() is a reliable source of truth, then I find it very curious
> that Germany's Austrian neighbors differ on this point about how text
> should be collated:
>
> [vagrant(at)localhost ~]$ ./a.out de_AT.UTF-8 'xxx' 'x xx'
> "xxx" -> 2323230108080801020202 (11 bytes)
> "x xx" -> 2323230108080801020202010235 (14 bytes)
> strcmp(arg1, arg2) result: -1
> strcoll(arg1, arg2) result: -1
>
> This surely adds doubt to the idea that strxfrm() in particular is broken.
>
> I find something else inconsistent with the strxfrm() theory: even the
> de_DE collation gives strxfrm()/strcoll() self-consistent answers when
> we move the rhs argument's space to the far side of its center 'x'
> char:
>
> [vagrant(at)localhost ~]$ ./a.out de_DE.UTF-8 'xxx' 'xx x'
> "xxx" -> 2323230108080801020202 (11 bytes)
> "xx x" -> 2323230108080801020202010335 (14 bytes)
> strcmp(arg1, arg2) result: -1
> strcoll(arg1, arg2) result: -1
>
> It seems very unlikely that this is because of a legitimate
> consideration that strcoll() makes about how German should be collated
> (one that strxfrm() fails to make, say).
>
> This is probably a worse situation for affected Postgres systems,
> though, because now they have no scope to turn the faulty part of the
> system off. I have a hard time believing that it's a good idea to
> trust strcoll() to be wrong in a consistent way that has collatable
> type opclasses at least follow "Notes to Operator Class Implementors".
> I'd like to hear more opinions on that, though, because it's a tricky
> thing to reason about.

Well, if we implement a compatibility GUC that shuts off our
dependency on strxfrm(), people can go back to having 9.5 be no more
broken than 9.4 was. I vote we do that and go home.
Behavior-changing GUCs suck, but it seems clear that Tom is not going
to sit still for any solution that involves blaming the glibc vendor
no matter how well-justified that approach might be; and I don't have
a better idea. I was a little worried that it was too much to hope
for that all libc vendors on earth would ship a strxfrm()
implementation that was actually consistent with strcoll(), and here
we are. It's a good thing that operating systems manage to make
read() and getpid() several orders of magnitude more reliable than
strxfrm() and strcoll(), or we'd probably all be running Windows or
VMS or something now.
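
For anyone who wants to check their own platform, the consistency
property at issue (strcoll() on the raw strings should order them the
same way strcmp() orders their strxfrm() blobs) can be sketched roughly
as below. This is not the tool linked above, just a minimal
illustration; the names sign_of() and collation_consistent() are
hypothetical, and the locale name is whatever the caller passes in.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Collapse a comparison result to -1, 0, or 1 so that only the sign matters. */
static int sign_of(int v)
{
    return (v > 0) - (v < 0);
}

/*
 * Hypothetical helper: returns 1 if strcoll(a, b) and strcmp() on the
 * strxfrm() blobs of a and b agree on ordering under the named locale,
 * 0 if they disagree, and -1 if the locale is unavailable or memory
 * allocation fails.
 *
 * Note: this changes the process-wide LC_COLLATE setting and does not
 * restore it.
 */
int collation_consistent(const char *locale, const char *a, const char *b)
{
    if (setlocale(LC_COLLATE, locale) == NULL)
        return -1;

    /* With a zero-length buffer, strxfrm() only reports the size needed. */
    size_t len_a = strxfrm(NULL, a, 0) + 1;
    size_t len_b = strxfrm(NULL, b, 0) + 1;

    char *blob_a = malloc(len_a);
    char *blob_b = malloc(len_b);
    int result = -1;

    if (blob_a != NULL && blob_b != NULL)
    {
        strxfrm(blob_a, a, len_a);
        strxfrm(blob_b, b, len_b);

        /* Both comparisons should order the pair the same way. */
        result = (sign_of(strcoll(a, b)) == sign_of(strcmp(blob_a, blob_b)));
    }

    free(blob_a);
    free(blob_b);
    return result;
}
```

Calling collation_consistent("de_DE.UTF-8", "xxx", "x xx") on an
affected system would presumably return 0, given Peter's output above,
but I have not run this sketch there, so treat that as an expectation
rather than a result.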

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
