
pgsql: Teach the regular expression functions to do case-insensitive

From: tgl(at)postgresql(dot)org (Tom Lane)
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Teach the regular expression functions to do case-insensitive
Date: 2009-12-01 21:00:24
Lists: pgsql-committers
Log Message:
Teach the regular expression functions to do case-insensitive matching and
locale-dependent character classification properly when the database encoding
is UTF8.

The previous coding worked okay in single-byte encodings, or in any case for
ASCII characters, but failed entirely on multibyte characters.  The fix
assumes that the <wctype.h> functions use Unicode code points as the wchar_t
representation for Unicode, i.e., wchar_t matches pg_wchar.
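
The approach described above can be sketched roughly as follows. This is a
minimal illustration, not the committed code: the names sketch_wchar,
sketch_tolower, and sketch_isalpha, and the encoding_is_utf8 flag, are
hypothetical stand-ins (the real code works on pg_wchar and consults the
database encoding). It assumes, as the log message does, that wchar_t holds
Unicode code points.

```c
#include <ctype.h>
#include <limits.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical stand-in for pg_wchar: one code point per value. */
typedef unsigned int sketch_wchar;

/*
 * Fold a character to lower case.
 *
 * For UTF8, hand the code point to the <wctype.h> function, assuming
 * wchar_t uses Unicode code points.  If wchar_t is narrower than 32 bits,
 * only code points up to 0xFFFF can be passed through safely.
 * Otherwise, only single-byte values are handled and anything wider is
 * returned unchanged.
 */
static sketch_wchar
sketch_tolower(sketch_wchar c, int encoding_is_utf8)
{
    if (encoding_is_utf8)
    {
        if (sizeof(wchar_t) >= 4 || c <= 0xFFFF)
            return (sketch_wchar) towlower((wint_t) c);
        return c;
    }
    if (c <= UCHAR_MAX)
        return (sketch_wchar) tolower((unsigned char) c);
    return c;
}

/* Same pattern for a character-class test; returns 1 or 0. */
static int
sketch_isalpha(sketch_wchar c, int encoding_is_utf8)
{
    if (encoding_is_utf8)
    {
        if (sizeof(wchar_t) >= 4 || c <= 0xFFFF)
            return iswalpha((wint_t) c) != 0;
        return 0;
    }
    return c <= UCHAR_MAX && isalpha((unsigned char) c) != 0;
}
```

Results for non-ASCII code points depend on the process locale (set with
setlocale), which is why the classification is described as
locale-dependent; ASCII input behaves the same in either branch.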

This is only a partial solution, since we're still stupid about non-ASCII
characters in multibyte encodings other than UTF8.  The practical effect
of that is limited, however, since those cases are generally Far Eastern
glyphs for which concepts like case-folding don't apply anyway.  Certainly
all or nearly all of the field reports of problems have been about UTF8.
A more general solution would require switching to the platform's wchar_t
representation for all regex operations, which is possible but would have
substantial disadvantages.  Let's try this and see if it's sufficient in

Modified Files:
        regc_locale.c (r1.9 -> r1.10)
        regcustom.h (r1.7 -> r1.8)
