Skip site navigation (1) Skip section navigation (2)

Sigh, LIKE indexing is *still* broken in foreign locales

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Moucha Václav <MouchaV(at)Radiomobil(dot)cz>
Cc: pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Sigh, LIKE indexing is *still* broken in foreign locales
Date: 2000-06-08 02:22:06
Message-ID: 14045.960430926@sss.pgh.pa.us (view raw or flat)
Thread:
Lists: pgsql-bugspgsql-hackers
=?ISO-8859-2?Q?Moucha_V=E1clav?= <MouchaV(at)Radiomobil(dot)cz> writes:
> 1. Compilation
>    ./configure --enable-locale	# not needed for RPMS precompiled binaries 

> 2. Starting postmaster
>    export LC_CTYPE=cs_CZ
>    export LC_COLLATE=cs_CZ		# this setting is important for the
> bug result
>    postmaster -S -D /home/pgsql/data -o '-Fe'	

> 3. SQL steps
>    create table test (name text);
>    insert into test values ('');	# the first char is E1 from LATIN 2
> coding
>    insert into test values ('b');
>    create index test_index on test (name);
>    set cpu_tuple_cost=1;		# force backend to use index
> scanning
>    select * from test where name like '%';

> BUG: Only 1 line is selected with '' only instead of both lines.

The problem here is that given the search pattern '\341%', the planner
generates index limit conditions
	name >= '\341' AND name < '\342';

Apparently, in CZ locale it is true that '\341' is less than '\342',
but it does not follow from that that all strings starting with '\341'
are less than '\342'.  In fact '\341b' is considered greater than '\342'.

Since '\341' and '\342' are two different accented forms of 'a'
(if I'm looking at the right character set), this is perhaps not so
improbable as all that.  Evidently the collation rule is that different
accent forms sort the same unless the strings would otherwise be
considered equal, in which case an ordering is assigned to them.

So, the rule we thought we had for generating index bounds falls flat,
and we're back to the same old question: given a proposed prefix string,
how can we generate bounds that are certain to be considered <= and >=
all strings starting with that prefix?

I am now thinking that maybe we should search for a string that compares
greater than "fooz" when the prefix is "foo" --- that is, append a 'z'
to the prefix string.  But I wouldn't be surprised if that fails too
in some locales.

I'm also wondering if the left-hand inequality ('foo' <= any string
beginning with 'foo') might fail in some locales ... we haven't seen
it reported but who knows ...

			regards, tom lane

In response to

  • LIKE bug at 2000-06-07 08:16:15 from Moucha Václav

Responses

pgsql-hackers by date

Next:From: Bruce MomjianDate: 2000-06-08 02:34:36
Subject: Re: Doc updates for index cost estimator change
Previous:From: Michael SimmsDate: 2000-06-08 01:16:03
Subject: Re: Apparent deadlock 7.0.1

pgsql-bugs by date

Next:From: Giles LeanDate: 2000-06-08 06:41:59
Subject: Re: [HACKERS] Sigh, LIKE indexing is *still* broken in foreign locales
Previous:From: Tom LaneDate: 2000-06-07 23:22:25
Subject: Re: Bug?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group