Re: [HACKERS] Re: [GENERAL] indexed regex select optimisation missing?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Ross J(dot) Reedstrom" <reedstrm(at)wallace(dot)ece(dot)rice(dot)edu>
Cc: Stuart Woolford <stuartw(at)newmail(dot)net>, pgsql-general(at)postgreSQL(dot)org, Lamar Owen <lamar(dot)owen(at)wgcr(dot)org>, hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Re: [GENERAL] indexed regex select optimisation missing?
Date: 1999-11-05 16:46:36
Message-ID: 495.941820396@sss.pgh.pa.us
Lists: pgsql-general pgsql-hackers

"Ross J. Reedstrom" <reedstrm(at)wallace(dot)ece(dot)rice(dot)edu> writes:
> Reviewing my email logs from June, most of the work on this has to do with
> people who need locales, and potentially multibyte character sets. Tom
> Lane is of the opinion that this particular optimization needs to be moved
> out of the parser, and deeper into the planner or optimizer/rewriter,
> so a good fix may be some ways out.

Actually, that part is already done: addition of the index-enabling
comparisons is gone from the parser and is now done in the optimizer,
which has a whole bunch of benefits (one being that the comparison
clauses don't get added to the query unless they are actually used
with an index!).

But the underlying LOCALE problem still remains: I don't know a good
character-set-independent method for generating a "just a little bit
larger" string to use as the righthand limit. If anyone out there is
an expert on foreign and multibyte character sets, some help would
be appreciated. Basically, given that we know the LIKE or regex
pattern can only match values beginning with FOO, we want to generate
string comparisons that select out the range of values that begin with
FOO (or, at worst, a slightly larger range). In USASCII locale it's not
hard: you can do
    field >= 'FOO' AND field < 'FOP'
but it's not immediately obvious how to make this idea work reliably
in the presence of odd collation orders or multibyte characters...
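
For illustration only, here is a minimal Python sketch of the USASCII case described above. It assumes plain code-point ordering (roughly what the C locale gives you) and 7-bit ASCII data; the names prefix_upper_bound and prefix_range are hypothetical and are not taken from the PostgreSQL sources. It does not attempt to solve the locale or multibyte problem.

```python
from typing import Optional, Tuple


def prefix_upper_bound(prefix: str) -> Optional[str]:
    """Return a string greater than every ASCII string that starts with prefix.

    Assumes plain code-point ordering and 7-bit ASCII data.
    Returns None when no upper bound can be built (empty prefix, or
    a prefix made entirely of 0x7F characters).
    """
    chars = list(prefix)
    while chars:
        last = ord(chars[-1])
        if last < 0x7F:
            # Bump the last character: 'FOO' becomes 'FOP'.
            chars[-1] = chr(last + 1)
            return "".join(chars)
        # The last character is already the largest ASCII value, so drop
        # it and bump the one before. This gives a slightly larger range,
        # which is acceptable for an index scan bound.
        chars.pop()
    return None


def prefix_range(prefix: str) -> Tuple[str, Optional[str]]:
    """Return (lower, upper) so that lower <= value < upper covers the prefix."""
    return prefix, prefix_upper_bound(prefix)


# Emulates: field >= 'FOO' AND field < 'FOP'
rows = ["BAR", "FON", "FOO", "FOOBAR", "FOP"]
lower, upper = prefix_range("FOO")
matches = [r for r in rows if lower <= r < upper]
```

With a locale-aware collation the "bump the last character" step is exactly the part that stops being reliable, which is the open problem described above.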

BTW: the \377 hack is actually wrong for USASCII too, since it'll
exclude a data value like 'FOO\377x' which should be included.
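
A small Python sketch of that failure, using byte strings so that \377 is the single byte 0xFF. It assumes the hack generates an inclusive upper limit of the form field <= 'FOO\377', and that comparison is bytewise; both are assumptions made for the example.

```python
# A value that begins with FOO and so should be selected.
value = b"FOO\xffx"

# Assumed form of the old hack: field >= 'FOO' AND field <= 'FOO\377'
old_upper = b"FOO\xff"
in_old_range = b"FOO" <= value <= old_upper

# The incremented-prefix form: field >= 'FOO' AND field < 'FOP'
in_new_range = b"FOO" <= value < b"FOP"

# The value sorts after 'FOO\377' because it is longer with the same
# leading bytes, so the old range misses it and the new range keeps it.
print(value.startswith(b"FOO"), in_old_range, in_new_range)
```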

regards, tom lane
