Re: Sigh, LIKE indexing is *still* broken in foreign locales

From: Erich Stamberger <eberger(at)gewi(dot)kfunigraz(dot)ac(dot)at>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Moucha Václav <MouchaV(at)Radiomobil(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Sigh, LIKE indexing is *still* broken in foreign locales
Date: 2000-06-09 00:25:52
Message-ID: Pine.LNX.4.21.0006090101230.6349-100000@gewi.kfunigraz.ac.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Wed, 7 Jun 2000, Tom Lane wrote:

> =?ISO-8859-2?Q?Moucha_V=E1clav?= <MouchaV(at)Radiomobil(dot)cz> writes:
> > 1. Compilation
> > ./configure --enable-locale # not needed for RPMS precompiled binaries
>
> > 2. Starting postmaster
> > export LC_CTYPE=cs_CZ
> > export LC_COLLATE=cs_CZ # this setting is important for the
> > bug result
> > postmaster -S -D /home/pgsql/data -o '-Fe'
>
> > 3. SQL steps
> > create table test (name text);
> > insert into test values ('á'); # the first char is E1 from LATIN 2
> > coding
> > insert into test values ('áb');
> > create index test_index on test (name);
> > set cpu_tuple_cost=1; # force backend to use index
> > scanning
> > select * from test where name like 'á%';
>
> > BUG: Only 1 line is selected with 'á' only instead of both lines.
>
> The problem here is that given the search pattern '\341%', the planner
> generates index limit conditions
> name >= '\341' AND name < '\342';
>
> Apparently, in CZ locale it is true that '\341' is less than '\342',
> but it does not follow from that that all strings starting with '\341'
> are less than '\342'. In fact '\341b' is considered greater than '\342'.
>

Hm. The character that follows 0xe1 in iso-8859-2 order is
"a + circumflex" (Oxe2) which is - as far as I know - not
part of the Czech alphapet. The successors of 0xe1 in
Czech collation order (code points from iso-8859-2)
are 0x042 (capital B) and 0x062 (small B).

=> name >= '0xe1' AND (name < '0x062' OR name < '0x042')

provided comparision is done by strcoll().

Another interresting feature of Czech collation is:

H < "CH" < I

and:

B < C < C + CARON < D .. < H < "CH" < I

So what happens with "WHERE name like 'Czec%`" ?

Regards
Erich

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Peter Eisentraut 2000-06-09 00:57:56 Re: [HACKERS] Sigh, LIKE indexing is *still* broken in foreign locales
Previous Message Giles Lean 2000-06-08 20:45:21 Re: Sigh, LIKE indexing is *still* broken in foreign locales

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2000-06-09 00:57:56 Re: [HACKERS] Sigh, LIKE indexing is *still* broken in foreign locales
Previous Message Hiroshi Inoue 2000-06-09 00:10:57 RE: DROP COLUMN status