Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Marc-Olaf Jaschke <marc-olaf(dot)jaschke(at)s24(dot)com>, Postgres-Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
Date: 2016-03-22 21:09:52
Message-ID: CAM3SWZTums8PyGAHUpj=3gpt-xiGbbLqUun3mKK8-aTh-z9R2A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Mon, Mar 21, 2016 at 9:04 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> Can you look at generating a textual representation of the strxfrm()
> blobs in question, using Robert's tool?:
>
> http://www.postgresql.org/message-id/CA+TgmoaOCyQpo8HK9yr6VTuyknWWvqgo7JeXi2kb=gpNveKR+g@mail.gmail.com

I played with this tool myself, on an affected CentOS 6.7 VM:

[vagrant(at)localhost ~]$ ldd --version
ldd (GNU libc) 2.12
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

I now think that we have this backwards: This isn't a bug in glibc's
strxfrm(); it's a bug in glibc's strcoll(). Minimal testcase with
modified tool, simplified to use ascii-safe strings:

[vagrant(at)localhost ~]$ ./a.out de_DE.UTF-8 'xxx' 'x xx'
"xxx" -> 2323230108080801020202 (11 bytes)
"x xx" -> 2323230108080801020202010235 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: 6

If we assume for the sake of argument that this is a strxfrm() bug and
strcoll() is a reliable source of truth, then I find it very curious
that Germany's Austrian neighbors differ on this point about how text
should be collated:

[vagrant(at)localhost ~]$ ./a.out de_AT.UTF-8 'xxx' 'x xx'
"xxx" -> 2323230108080801020202 (11 bytes)
"x xx" -> 2323230108080801020202010235 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: -1

This surely adds doubt to the idea that strxfrm() in particular is broken.

I find something else inconsistent with the strxfrm() theory: even the
de_DE collation gives strxfrm()/strcoll() self-consistent answers when
we move the rhs argument's space to the far side of its center 'x'
char:

[vagrant(at)localhost ~]$ ./a.out de_DE.UTF-8 'xxx' 'xx x'
"xxx" -> 2323230108080801020202 (11 bytes)
"xx x" -> 2323230108080801020202010335 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: -1

It seems very unlikely that this is because of a legitimate
consideration that strcoll() makes about how German should be collated
(one that strxfrm() fails to make, say).

This is probably a worse situation for affected Postgres systems,
though, because now they have no scope to turn the faulty part of the
system off. I have a hard time believing that it's a good idea to
trust strcoll() to be wrong in a consistent way that has collatable
type opclasses at least follow "Notes to Operator Class Implementors".
I'd like to hear more opinions on that, though, because it's a tricky
thing to reason about.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2016-03-22 21:57:22 Re: BUG #14034: Select for update with inner select doesn't return value after committing by other transaction.
Previous Message Alvaro Herrera 2016-03-22 21:07:56 Re: problem in sql - sum()

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-03-22 21:10:11 Re: WIP: Access method extendability
Previous Message Yury Zhuravlev 2016-03-22 21:04:49 Re: NOT EXIST for PREPARE