Re: Windows UTF-8, non-ICU collation trouble

From: Noah Misch <noah(at)leadboat(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Windows UTF-8, non-ICU collation trouble
Date: 2019-12-06 07:33:49
Message-ID: 20191206073349.GC1629883@rfd.leadboat.com
Lists: pgsql-hackers

On Fri, Dec 06, 2019 at 07:56:08PM +1300, Thomas Munro wrote:
> On Fri, Dec 6, 2019 at 7:34 PM Noah Misch <noah(at)leadboat(dot)com> wrote:
> > We use system UTF-16 collation to implement UTF-8 collation on Windows. The
> > PostgreSQL security team received a report, from Timothy Kuun, that this
> > collation does not uphold the "symmetric law" and "transitive law" that we
> > require for btree operator classes. The attached test program demonstrates
> > this. http://www.delphigroups.info/2/62/478610.html quotes reports of that
> > problem going back eighteen years. Most code points are unaffected. Indexing
> > an affected code point using such a collation can cause btree index scans to not
> > find a row they should find and can make a UNIQUE or PRIMARY KEY constraint
> > admit a duplicate. The security team determined that this doesn't qualify as a
> > security vulnerability, but it's still a bug.
>
> Huh. Does this apply in modern times? Since Windows 10, I thought
> they adopted[1] CLDR data to drive that, the same definitions used (or
> somewhere in the process of being adopted by) GNU, Illumos, FreeBSD
> etc. Basically, everyone gave up on trying to own this rat's nest of a
> problem and deferred to the experts.

Based on my test program, it applies to Windows Server 2016. I didn't test
newer versions.
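
The attached test program is not reproduced in this archive. As a rough
illustration only, the two laws mentioned above could be checked with
something like the sketch below. It is not the attached program: the names
find_violations, codepoint_cmp and cyclic_cmp are hypothetical, and
cyclic_cmp is a deliberately broken stand-in comparator, not the Windows
collation.

```python
from itertools import product


def sign(n):
    """Collapse a comparator result to -1, 0 or 1."""
    return (n > 0) - (n < 0)


def find_violations(cmp, values):
    """Brute-force check of a three-way comparator over `values`.

    Returns (symmetry_violations, transitivity_violations):
      - symmetric law:  sign(cmp(a, b)) == -sign(cmp(b, a))
      - transitive law: a <= b and b <= c implies a <= c
    """
    symmetry = [
        (a, b)
        for a, b in product(values, repeat=2)
        if sign(cmp(a, b)) != -sign(cmp(b, a))
    ]
    transitivity = [
        (a, b, c)
        for a, b, c in product(values, repeat=3)
        if cmp(a, b) <= 0 and cmp(b, c) <= 0 and cmp(a, c) > 0
    ]
    return symmetry, transitivity


def codepoint_cmp(a, b):
    """Well-behaved reference comparator: plain code point order."""
    return (a > b) - (a < b)


def cyclic_cmp(a, b):
    """Hypothetical broken comparator: "a" < "b" < "c" < "a"."""
    if a == b:
        return 0
    smaller_than = {("a", "b"), ("b", "c"), ("c", "a")}
    return -1 if (a, b) in smaller_than else 1


if __name__ == "__main__":
    values = ["a", "b", "c"]
    print(find_violations(codepoint_cmp, values))
    print(find_violations(cyclic_cmp, values))
```

A comparator that fails either check gives btree no consistent ordering to
rely on, which is how an index scan can miss a row or a unique index can
admit a duplicate. To probe a real system collation, one could pass
locale.strcoll (after locale.setlocale) as cmp; whether that reaches the same
system routine PostgreSQL uses on Windows is an assumption I have not
verified.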

> If you can still get
> index-busting behaviour out of modern Windows collations, wouldn't
> that be a bug that someone can file against SQL Server, Windows etc
> and get fixed?

Perhaps. I wouldn't have high hopes, given the behavior's long tenure and the
risk of breaking a different set of applications.
