Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Marc-Olaf Jaschke <marc-olaf(dot)jaschke(at)s24(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
Date: 2016-03-22 04:10:52
Message-ID: 19477.1458619852@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers

Peter Geoghegan <pg(at)heroku(dot)com> writes:
> At one point, Robert wrote a small self-contained tool to show OS
> strxfrm() blobs:
> http://www.postgresql.org/message-id/CA+TgmoaOCyQpo8HK9yr6VTuyknWWvqgo7JeXi2kb=gpNveKR+g@mail.gmail.com

> It would be great if you showed us the output for your test case
> strings, both on an affected and on an unaffected system.

On RHEL6, I get

./strxfrm-binary de_DE.UTF-8 'eai' 'e a'
"eai" -> 100c140108080801020202 (11 bytes)
"e a" -> 100c140108080901020202010235 (14 bytes)

This seems a bit problematic, because these strings sort in the other
order ("e a" before "eai") according to sort(1) as well as Postgres'
sorting code.
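
To make the mismatch concrete, here is a minimal sketch that compares the two
blobs quoted above byte by byte, the way memcmp() would. The hex strings are
copied from the RHEL6 output; the helper name memcmp_order is purely
illustrative and is not part of the strxfrm-binary tool or of Postgres.

```python
# Hex blobs copied from the RHEL6 strxfrm-binary output above (de_DE.UTF-8).
blob_eai = bytes.fromhex("100c140108080801020202")
blob_e_a = bytes.fromhex("100c140108080901020202010235")


def memcmp_order(a: bytes, b: bytes) -> int:
    """Return -1, 0, or 1 for a bytewise comparison.

    Python compares bytes objects lexicographically, which matches the
    sign of memcmp() over the common prefix, with the shorter blob
    sorting first when one is a prefix of the other.
    """
    return (a > b) - (a < b)


# Index of the first byte at which the two blobs differ.
first_diff = next(i for i, (x, y) in enumerate(zip(blob_eai, blob_e_a)) if x != y)

print(len(blob_eai), len(blob_e_a))       # 11 14
print(first_diff)                         # 6
print(memcmp_order(blob_eai, blob_e_a))   # -1: "eai" blob sorts first
```

The blobs differ first at byte 6 (0x08 versus 0x09), so a bytewise comparison
puts "eai" before "e a", the reverse of the sort(1) order described above.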

It's possible I've copied-and-pasted these multibyte characters wrong.
But if I haven't, this says that the strxfrm-based optimization is
unusably broken on a very large fraction of reasonably-modern
installations. Quite aside from casting aspersions on the glibc guys,
how did we fail to notice this in our own testing?
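
For anyone wanting to check their own platform, a rough sketch of a
consistency test follows. It is an assumption-laden approximation: it uses
Python's locale module, which in CPython goes through the wide-character
variants of strcoll() and strxfrm(), so it does not exercise exactly the
byte-oriented strxfrm() plus memcmp() path discussed in this thread. The
function name collation_mismatches is hypothetical.

```python
import itertools
import locale


def sign(n):
    """Collapse any integer to -1, 0, or 1."""
    return (n > 0) - (n < 0)


def collation_mismatches(strings, locale_name):
    """Return pairs whose strcoll() order disagrees with their strxfrm() key order.

    Raises locale.Error if locale_name is not installed on this system.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    locale.setlocale(locale.LC_COLLATE, locale_name)
    try:
        mismatches = []
        for a, b in itertools.combinations(strings, 2):
            by_strcoll = sign(locale.strcoll(a, b))
            key_a, key_b = locale.strxfrm(a), locale.strxfrm(b)
            by_key = (key_a > key_b) - (key_a < key_b)
            if by_strcoll != by_key:
                mismatches.append((a, b))
        return mismatches
    finally:
        # Restore whatever collation locale was active before the check.
        locale.setlocale(locale.LC_COLLATE, previous)
```

Calling collation_mismatches(["eai", "e a"], "de_DE.UTF-8") on a system with
that locale installed would list any pair for which the two orderings
disagree; an empty list means no disagreement was found for those inputs.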

regards, tom lane
