Re: default locale considered harmful? (was Re: [GENERAL]

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Andrew Sullivan <andrew(at)libertyrms(dot)info>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default locale considered harmful? (was Re: [GENERAL]
Date: 2003-05-31 22:18:39
Message-ID: 200305312218.h4VMIee21738@candle.pha.pa.us
Lists: pgsql-general pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > So, my understanding is that you would create something such as:
> > CREATE INDEX iix ON tab (LIKE col)
> > and that does LIKE lookups and knows how to do col LIKE 'abc%', but it
> > can't be used for >= or ORDER BY, but it can be used for equality tests?
>
> Hm. Right at the moment, it wouldn't be used for equality tests unless
> you spelled equality as "a ~=~ b". I wonder whether that's necessary
> though; couldn't we dispense with that operator and use ordinary
> equality as the BTEqual member of these opclasses? Are there any
> locales that claim that not-physically-identical strings are equal?

Let me see if I understand.

Our default indexes will be able to do =, >, <, ORDER BY, and the
special index will be able to do LIKE, ORDER BY, and maybe equals. Do I
have that correct?

Looking at CVS, I see the warning about non-C locales has been removed.
Should we instead mention the new LIKE index method?

# (Be sure to maintain the correspondence with locale_is_like_safe() in selfuncs.c.)
if test x`pg_getlocale COLLATE` != xC && test x`pg_getlocale COLLATE` != xPOSIX; then
    echo "This locale setting will prevent the use of indexes for pattern matching"
    echo "operations. If that is a concern, rerun $CMDNAME with the collation order"
    echo "set to \"C\". For more information see the Administrator's Guide."
fi
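To see what that test decides, here is a minimal sketch. `effective_collate` is a hypothetical stand-in for initdb's `pg_getlocale COLLATE`. It only consults the environment, using the POSIX precedence of LC_ALL, then LC_COLLATE, then LANG, with C as the fallback. The real helper may also honor locale options passed to initdb.

```shell
#!/bin/sh
# Hypothetical stand-in for initdb's "pg_getlocale COLLATE": resolve the
# collation from the environment using POSIX precedence
# (LC_ALL, then LC_COLLATE, then LANG), falling back to C.
effective_collate() {
    if [ -n "$LC_ALL" ]; then
        echo "$LC_ALL"
    elif [ -n "$LC_COLLATE" ]; then
        echo "$LC_COLLATE"
    elif [ -n "$LANG" ]; then
        echo "$LANG"
    else
        echo C
    fi
}

collate=$(effective_collate)
# Same test as the initdb fragment: only C and POSIX are treated as
# LIKE-safe, because they order strings bytewise.
if [ "x$collate" != xC ] && [ "x$collate" != xPOSIX ]; then
    echo "Collation \"$collate\": LIKE 'abc%' cannot use an ordinary index."
fi
```

For example, with LC_ALL unset, LC_COLLATE=de_DE.ISO8859-1 and LANG=C, the collation resolves to de_DE.ISO8859-1 and the warning fires, even though LANG alone would have been safe.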

Doing LIKE with single-byte encodings would be easy, because it would
take only 256 comparisons to find the min/max character values, but that
doesn't work with multi-byte encodings, right?
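For a single-byte encoding, the min/max search is just a linear scan that compares each candidate character against the current extremes under the locale's collation. Below is a minimal sketch, assuming POSIX `expr`, whose `<` and `>` compare non-integer operands using the current locale's collating sequence. It scans only printable ASCII (decimal 33 to 126): a shell variable cannot hold NUL, and high bytes are not valid characters in a multi-byte locale. `collation_extremes` is a hypothetical name. The sketch does not settle whether appending these extremes to a LIKE prefix yields safe index bounds in a non-C locale.

```shell
#!/bin/sh
# Sketch: find the lowest- and highest-sorting single-byte characters
# under the current LC_COLLATE with a linear scan, one comparison per
# candidate per extreme.  Only printable ASCII (decimal 33-126) is
# scanned: a shell variable cannot hold NUL, and high bytes are not
# valid characters in a multi-byte locale.
collation_extremes() {
    min='' max=''
    i=33
    while [ "$i" -le 126 ]; do
        # Build the character from its code via an octal escape, e.g. \041 -> !
        c=$(printf "\\$(printf '%03o' "$i")")
        if [ -z "$min" ]; then
            min=$c max=$c
        else
            # expr compares non-integer operands with the locale's collating
            # sequence; the "x" prefix keeps "(" or "+" from parsing as operators.
            if expr "x$c" \< "x$min" >/dev/null; then min=$c; fi
            if expr "x$c" \> "x$max" >/dev/null; then max=$c; fi
        fi
        i=$((i + 1))
    done
    printf 'min=%s max=%s\n' "$min" "$max"
}

collation_extremes
```

In the C locale this prints `min=! max=~`. Other locales may pick different extremes, and the scan says nothing about multi-byte characters, which is the case in question.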

This LIKE/encoding problem is a tricky one because users get poor
performance (a sequential scan instead of an index scan) with little warning.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
