Re: Comparing strings with non-ASCII characters

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Artur Rataj <arataj(at)iitis(dot)gliwice(dot)pl>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Comparing strings with non-ASCII characters
Date: 2002-11-13 14:58:22
Message-ID: 21525.1037199502@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Artur Rataj <arataj(at)iitis(dot)gliwice(dot)pl> writes:
> Anyway, why the index optimizations are disabled for the ~ operator
> and not for the < or > operators in the locale?

The < and > operators are compatible with the index ordering, by
definition. The trouble with ~ is that it is not doing ordering,
but pattern matching. You can exploit an index to narrow down
the candidates for a left-anchored pattern only when the index
ordering is strict lexicographic ... and in all too many non-C
locales, it ain't. Digraphs, multipass sorting rules, and things
like that are killers.

We have tried and failed (repeatedly) to find a way to use non-C
indexes for LIKE and ~. If you'd like to try again, you're more
than welcome, but I suggest you read the PG list archives to learn
the reasons why previous attempts didn't work.

The long-run answer to this will probably be to allow individual columns
to be declared as being of particular locales; then you could assign C
locale to a column you need to do pattern matching on. This seems to
require building our own locale library :-( ... so it's not going to
happen quickly.

BTW, you might think about using full-text search (contrib/tsearch
or contrib/fulltextindex) instead of pattern matching.

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Stephan Szabo 2002-11-13 15:13:03 Re: index question..
Previous Message Areski 2002-11-13 14:42:13 Re: Performance : Optimize query