| From: | Erich Stamberger <eberger(at)gewi(dot)kfunigraz(dot)ac(dot)at> | 
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
| Cc: | Moucha Václav <MouchaV(at)Radiomobil(dot)cz>, pgsql-hackers(at)postgresql(dot)org | 
| Subject: | Re: Sigh, LIKE indexing is *still* broken in foreign locales | 
| Date: | 2000-06-09 00:25:52 | 
| Message-ID: | Pine.LNX.4.21.0006090101230.6349-100000@gewi.kfunigraz.ac.at | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-bugs pgsql-hackers | 
On Wed, 7 Jun 2000, Tom Lane wrote:
> =?ISO-8859-2?Q?Moucha_V=E1clav?= <MouchaV(at)Radiomobil(dot)cz> writes:
> > 1. Compilation
> >    ./configure --enable-locale	# not needed for RPMS precompiled binaries 
> 
> > 2. Starting postmaster
> >    export LC_CTYPE=cs_CZ
> >    export LC_COLLATE=cs_CZ		# this setting is important for the
> > bug result
> >    postmaster -S -D /home/pgsql/data -o '-Fe'	
> 
> > 3. SQL steps
> >    create table test (name text);
> >    insert into test values ('á');	# the first char is E1 from LATIN 2
> > coding
> >    insert into test values ('áb');
> >    create index test_index on test (name);
> >    set cpu_tuple_cost=1;		# force backend to use index
> > scanning
> >    select * from test where name like 'á%';
> 
> > BUG: Only 1 line is selected with 'á' only instead of both lines.
> 
> The problem here is that given the search pattern '\341%', the planner
> generates index limit conditions
> 	name >= '\341' AND name < '\342';
> 
> Apparently, in CZ locale it is true that '\341' is less than '\342',
> but it does not follow from that that all strings starting with '\341'
> are less than '\342'.  In fact '\341b' is considered greater than '\342'.
> 
Hm. The character that follows 0xe1 in iso-8859-2 order is
"a + circumflex" (Oxe2) which is - as far as I know - not
part of the Czech alphapet. The successors of  0xe1 in
Czech collation order (code points from iso-8859-2)
are 0x042 (capital B) and 0x062 (small B).
=> name >= '0xe1' AND (name < '0x062' OR name < '0x042')
 
provided comparision is done by strcoll().
Another interresting feature of Czech collation is:
H < "CH" < I
and:
B < C < C + CARON < D .. < H < "CH" < I
So what happens with "WHERE name like 'Czec%`" ?
Regards
Erich
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Peter Eisentraut | 2000-06-09 00:57:56 | Re: [HACKERS] Sigh, LIKE indexing is *still* broken in foreign locales | 
| Previous Message | Giles Lean | 2000-06-08 20:45:21 | Re: Sigh, LIKE indexing is *still* broken in foreign locales | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Peter Eisentraut | 2000-06-09 00:57:56 | Re: [HACKERS] Sigh, LIKE indexing is *still* broken in foreign locales | 
| Previous Message | Hiroshi Inoue | 2000-06-09 00:10:57 | RE: DROP COLUMN status |