Re: BUG #2895: Private Use Unicode character crashes server when using ILIKE

From: Michael Fuhr <mike(at)fuhr(dot)org>
To: James Russell <internationalhobo(at)gmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #2895: Private Use Unicode character crashes server when using ILIKE
Date: 2007-01-17 03:47:42
Message-ID: 20070117034742.GA53257@winnie.fuhr.org
Lists: pgsql-bugs

On Tue, Jan 16, 2007 at 06:16:22AM +0000, James Russell wrote:
> Description: Private Use Unicode character crashes server when using ILIKE

The archives show that ILIKE is known to be broken with multibyte
characters in 8.1 and earlier, although I don't recall seeing reports
of a crash resulting. I got a crash in 8.1.6 built from the latest
source in CVS; here's a partial stack trace:

(gdb) bt
#0 MBMatchTextIC (t=0x2a98613d1c "�\200\202\206", tlen=4, p=0x0, plen=4) at like_match.c:195
#1 0x00000000005ae558 in texticlike (fcinfo=Variable "fcinfo" is not available.
) at like.c:355

I wonder if this is a problem only with code points outside of Plane 0,
viz., those with UTF-8 sequences longer than three bytes. I don't get
a crash with U+FFFD (E'\357\277\275') but I do with U+10000
(E'\360\220\200\200') and other four-byte sequences.
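For anyone who wants to check the byte counts above, here is a small sketch in Python (not part of the original report; the helper name pg_octal_escape is made up for illustration). It encodes a code point as UTF-8 and prints the bytes as the backslash-octal escapes used in the E'' strings above, along with the sequence length.

```python
def pg_octal_escape(code_point: int) -> str:
    """Return the UTF-8 bytes of code_point as backslash-octal escapes.

    Hypothetical helper for illustration only; it just formats bytes and
    does not talk to PostgreSQL.
    """
    return "".join(f"\\{b:03o}" for b in chr(code_point).encode("utf-8"))


# U+FFFD is in Plane 0 (three bytes); U+10000 is the first code point
# outside Plane 0 (four bytes).
for cp in (0xFFFD, 0x10000):
    encoded = chr(cp).encode("utf-8")
    print(f"U+{cp:04X}: {len(encoded)} bytes, E'{pg_octal_escape(cp)}'")
```

This only confirms that the two test strings differ in UTF-8 length; it says nothing about why the four-byte case crashes.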

> - I have not yet tried to reproduce the bug on the latest Postgres 8.2.x

It appears to work in 8.2.1; at least it didn't crash. The 8.2
Release Notes contain the following item:

* Allow ILIKE to work for multi-byte encodings (Tom)

Internally, ILIKE now calls lower() and then uses LIKE. Locale-specific
regular expression patterns still do not work in these encodings.
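
The "lower() and then LIKE" idea in that release note can be sketched roughly as follows. This is an illustration in Python, not PostgreSQL's code: the function names are invented, backslash escapes in the pattern are ignored, and Python's str.lower() is not locale-aware the way the server's lower() is.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a LIKE pattern to a regex: % matches any run, _ one character.

    Hypothetical helper; escape characters in the pattern are not handled.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def ilike(text: str, pattern: str) -> bool:
    """Case-insensitive LIKE: lowercase both sides, then do a plain LIKE match."""
    regex = like_to_regex(pattern.lower())
    return re.fullmatch(regex, text.lower(), flags=re.DOTALL) is not None


# Python strings are sequences of code points, so a character outside
# Plane 0 counts as one character for "_" and "%".
print(ilike("Hello\U00010000", "h%\U00010000"))
print(ilike("ABC", "a_c"))
```

The point of the design is that case folding happens once, on whole strings, so the matcher itself never has to fold case one byte at a time.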

--
Michael Fuhr
