Re: Improved ICU patch - WAS: Implementing full UTF-8 support (aka supporting 0x00)

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Palle Girgensohn <girgen(at)pingpong(dot)net>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Álvaro Hernández Tortosa <aht(at)8kdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Devrim Gündüz <devrim(at)gunduz(dot)org>, Jakob Egger <jakob(at)eggerapps(dot)at>, Tobias Bussmann <t(dot)bussmann(at)gmx(dot)net>
Subject: Re: Improved ICU patch - WAS: Implementing full UTF-8 support (aka supporting 0x00)
Date: 2016-08-11 01:05:19
Message-ID: CAM3SWZRpzdphRojzOQpjq1cm4nk-5Kf9P0W5T1rzJFih=2AOig@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 10, 2016 at 1:42 PM, Palle Girgensohn <girgen(at)pingpong(dot)net> wrote:
> They've been used for the FreeBSD ports since 2005, and have served us well. I have of course updated them regularly. In this latest version, I've removed support for other encodings beside UTF-8, mostly since I don't know how to test them, but also, I see little point in supporting anything else using ICU.

Looks like you're not using the ICU equivalent of strxfrm(). While 9.5
is not the release that introduced its use, it did expand it
significantly. I think you need to fix this, even though it isn't
actually used to sort text at present, since presumably FreeBSD builds
of 9.5 don't TRUST_STRXFRM. Since you're using ICU, though, you could
reasonably trust the ICU equivalent of strxfrm(), so that's a missed
opportunity. (See the wiki page on the abbreviated keys issue [1] if
you don't know what I'm talking about.)

Shouldn't you really have a strxfrm() wrapper, used across the board,
including for callers outside of varlena.c? convert_string_datum() has
been calling strxfrm() for many releases now. These calls are still
used in FreeBSD builds, I would think, which seems like a bug that is
not dodged by simply not defining TRUST_STRXFRM. Isn't its assumption
that that matching the ordering used elsewhere not really hold on
FreeBSD builds?

[1] https://wiki.postgresql.org/wiki/Abbreviated_keys_glibc_issue
--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2016-08-11 01:36:55 Re: Set log_line_prefix and application name in test drivers
Previous Message Kisung Kim 2016-08-10 23:24:15 Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)