Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Marc-Olaf Jaschke <marc-olaf(dot)jaschke(at)s24(dot)com>, Postgres-Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
Date: 2016-03-22 23:19:44
Message-ID: 19132.1458688784@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I was a little worried that it was too much to hope for that all libc
> vendors on earth would ship a strxfrm() implementation that was actually
> consistent with strcoll(), and here we are.

Indeed. To try to put some scope on the problem, I made an idiot little
program that just generates some random UTF8 strings and sees whether
strcoll and strxfrm sort them alike. Attached are that program, an even
more idiot little shell script that runs it over all available UTF8
locales, and the results on my RHEL6 box. While de_DE seems to be the
worst-broken locale, it's far from the only one.
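
For readers without the attachments, here is a minimal sketch of the kind of check described above. It is not the attached strcolltest.c; the function names (sign_of, orders_agree, random_utf8, count_mismatches), the code point range, and the string lengths are all assumptions chosen for illustration. It draws random UTF-8 string pairs and counts how often strcoll() and strcmp() over strxfrm() output disagree on their order.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

enum { MAX_CHARS = 20 };        /* hypothetical cap on characters per string */

/* Reduce a comparison result to -1, 0, or +1 so two results can be compared. */
static int sign_of(int x)
{
    return (x > 0) - (x < 0);
}

/*
 * Return 1 if strcoll(a, b) and strcmp() over the strxfrm() blobs order the
 * two strings the same way in the current locale, 0 if they disagree, and
 * -1 if memory allocation fails.
 */
static int orders_agree(const char *a, const char *b)
{
    /* With n == 0 the destination may be NULL; the result is the needed length. */
    size_t alen = strxfrm(NULL, a, 0) + 1;
    size_t blen = strxfrm(NULL, b, 0) + 1;
    char *ax = malloc(alen);
    char *bx = malloc(blen);
    int result = -1;

    if (ax != NULL && bx != NULL)
    {
        strxfrm(ax, a, alen);
        strxfrm(bx, b, blen);
        result = (sign_of(strcoll(a, b)) == sign_of(strcmp(ax, bx)));
    }
    free(ax);
    free(bx);
    return result;
}

/*
 * Fill buf with nchars random code points from U+0021..U+017F (an assumed
 * range), encoded as UTF-8. buf must hold at least 2 * nchars + 1 bytes.
 */
static void random_utf8(char *buf, int nchars)
{
    char *p = buf;

    assert(buf != NULL && nchars >= 0);
    for (int i = 0; i < nchars; i++)
    {
        unsigned cp = 0x21 + (unsigned) (rand() % (0x180 - 0x21));

        if (cp < 0x80)
            *p++ = (char) cp;                       /* one-byte sequence */
        else
        {
            *p++ = (char) (0xC0 | (cp >> 6));       /* two-byte lead byte */
            *p++ = (char) (0x80 | (cp & 0x3F));     /* continuation byte */
        }
    }
    *p = '\0';
}

/*
 * Switch to the named locale, compare ntrials random string pairs, and
 * return the number of pairs on which the two orderings disagree.
 * Returns -1 if the locale is not available.
 */
static int count_mismatches(const char *locale, int ntrials, unsigned seed)
{
    char a[2 * MAX_CHARS + 1];
    char b[2 * MAX_CHARS + 1];
    int mismatches = 0;

    if (setlocale(LC_ALL, locale) == NULL)
        return -1;
    srand(seed);
    for (int i = 0; i < ntrials; i++)
    {
        random_utf8(a, 1 + rand() % MAX_CHARS);
        random_utf8(b, 1 + rand() % MAX_CHARS);
        if (orders_agree(a, b) == 0)
            mismatches++;
    }
    return mismatches;
}
```

Calling count_mismatches() from a small main() with a locale name taken from `locale -a` and printing the result gives a per-locale count. A nonzero count would suggest that strxfrm() and strcoll() are inconsistent for that locale on that platform.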

Please try this on as many platforms as you can get hold of ...

regards, tom lane

Attachment         Content-Type         Size
strcolltest.c      text/x-c             4.3 KB
tryalllocales.sh   text/x-shellscript   148 bytes
rhel6-results      text/plain           29.2 KB
