BUG #5743: Regexp engine fails to case-insensitively match multi-byte codepoints

From: "Vlad Romascanu" <vromascanu(at)accurev(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #5743: Regexp engine fails to case-insensitively match multi-byte codepoints
Date: 2010-11-04 00:48:39
Message-ID: 201011040048.oA40md61095262@wwwmaster.postgresql.org
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 5743
Logged by: Vlad Romascanu
Email address: vromascanu(at)accurev(dot)com
PostgreSQL version: 8.4.3
Operating system: Windows, Linux
Description: Regexp engine fails to case-insensitively match
multi-byte codepoints
Details:

This was already reported in 2006 but seems to have fallen through the
cracks (I can find no follow-up). The problem still exists in v8.4.3.

The problem still appears to be that pg_wc_tolower narrows its argument to
char before calling tolower(), instead of calling towlower().
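
To illustrate the failure mode described above, here is a minimal sketch,
not the actual PostgreSQL source. Both function names (narrowing_tolower
and wide_tolower) are hypothetical and exist only for this example. The
first folds case only when the code point fits in an unsigned char, which
is the behaviour the report attributes to pg_wc_tolower; the second passes
the full code point to towlower().

```c
#include <ctype.h>
#include <limits.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical stand-in for the narrowing behaviour described in the
 * report: only code points that fit in an unsigned char reach tolower().
 * Anything larger is returned unchanged, so it is never case-folded.
 */
unsigned int
narrowing_tolower(unsigned int c)
{
    if (c <= UCHAR_MAX)
        return (unsigned int) tolower((unsigned char) c);
    return c;
}

/*
 * Wide-character alternative: towlower() receives the whole code point.
 * Its result for non-ASCII input depends on the current LC_CTYPE locale
 * and on how the platform encodes wchar_t.
 */
unsigned int
wide_tolower(unsigned int c)
{
    return (unsigned int) towlower((wint_t) c);
}
```

With this sketch, narrowing_tolower(0x0416) (CYRILLIC CAPITAL LETTER ZHE)
returns its input unchanged in every locale, whereas wide_tolower(0x0416)
can return the lowercase letter when the process locale and the platform's
wchar_t encoding support it.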

This is one of several inconsistencies unfortunately still present in
case-insensitive regexp vs. LOWER(str) [str_lower] treatment (including char
to wchar conversion using MultiByteToWideChar/mbstowcs vs. char2wchar, and
towlower vs. pg_wc_tolower).

Current workaround is to use LOWER(str) ~ LOWER('regexp').
