Re: [HACKERS] Re: [GENERAL] indexed regex select optimisation missing?

From: Stuart Woolford <stuartw(at)newmail(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Ross J(dot) Reedstrom" <reedstrm(at)wallace(dot)ece(dot)rice(dot)edu>
Cc: pgsql-general(at)postgreSQL(dot)org, Lamar Owen <lamar(dot)owen(at)wgcr(dot)org>, hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Re: [GENERAL] indexed regex select optimisation missing?
Date: 1999-11-06 00:05:14
Message-ID: 99110613104200.00731@test.macmillan.co.nz
Lists: pgsql-general pgsql-hackers


Firstly, damn, you guys are good; please accept my strongest compliments for the
response time on this issue!

On Sat, 06 Nov 1999, Tom Lane wrote:
> "Ross J. Reedstrom" <reedstrm(at)wallace(dot)ece(dot)rice(dot)edu> writes:
> > Reviewing my email logs from June, most of the work on this has to do with
> > people who needs locales, and potentially multibyte character sets. Tom
> > Lane is of the opinion that this particular optimization needs to be moved
> > out of the parser, and deeper into the planner or optimizer/rewriter,
> > so a good fix may be some ways out.
>
> Actually, that part is already done: addition of the index-enabling
> comparisons is gone from the parser and is now done in the optimizer,
> which has a whole bunch of benefits (one being that the comparison
> clauses don't get added to the query unless they are actually used
> with an index!).
>
> But the underlying LOCALE problem still remains: I don't know a good
> character-set-independent method for generating a "just a little bit
> larger" string to use as the righthand limit. If anyone out there is
> an expert on foreign and multibyte character sets, some help would
> be appreciated. Basically, given that we know the LIKE or regex
> pattern can only match values beginning with FOO, we want to generate
> string comparisons that select out the range of values that begin with
> FOO (or, at worst, a slightly larger range). In USASCII locale it's not
> hard: you can do
> field >= 'FOO' AND field < 'FOP'
> but it's not immediately obvious how to make this idea work reliably
> in the presence of odd collation orders or multibyte characters...
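
For reference, a minimal sketch of the USASCII case quoted above, written in
Python rather than SQL so it can be run standalone. The function name
ascii_prefix_range is hypothetical, and the sketch assumes plain byte-order
comparison of 7-bit ASCII strings; it does not attempt the locale or multibyte
cases that the quoted text calls unsolved.

```python
def ascii_prefix_range(prefix):
    """Return (lower, upper) such that lower <= s < upper holds for every
    ASCII string s that begins with prefix.

    Assumption: comparison is by plain character code (as in the C locale),
    and prefix is non-empty 7-bit ASCII.
    """
    if not prefix or not prefix.isascii():
        raise ValueError("sketch handles non-empty 7-bit ASCII prefixes only")
    last = ord(prefix[-1])
    if last == 0x7F:
        raise ValueError("last character has no ASCII successor")
    # "Just a little bit larger": bump the final character by one.
    return prefix, prefix[:-1] + chr(last + 1)


lower, upper = ascii_prefix_range("FOO")
values = ["FON", "FOO", "FOObar", "FOOzzz", "FOP", "FOQ"]
# Equivalent of: field >= 'FOO' AND field < 'FOP'
selected = [v for v in values if lower <= v < upper]
```

Here upper comes out as 'FOP', and only the values beginning with 'FOO' are
selected.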

How about something along the lines of:

field >= 'FOO' and field = 'FOO.*'

i.e., terminate the scan once a value fails to match the static left-hand side
followed by anything (although I have the feeling this does not fit into your
execution system...), and add a simple regex-type check to the scan
validation code?
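
To make the idea concrete, here is a rough sketch in Python, with a sorted
list standing in for the index. The name prefix_scan is made up for
illustration and is not anything in the backend. It assumes the sort order
keeps all values sharing a prefix next to each other, which is true for plain
byte order but may not hold under some locale collations.

```python
import bisect


def prefix_scan(sorted_keys, prefix):
    """Start at the first key >= prefix and stop at the first key that no
    longer begins with prefix, so no upper-bound string is ever built.

    Assumption: sorted_keys is sorted in an order where keys sharing a
    prefix are contiguous (e.g. plain code-point order).
    """
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # the prefix check itself terminates the scan
        matches.append(key)
    return matches


keys = sorted(["FON", "FOO", "FOO\xffx", "FOObar", "FOP", "BAR"])
found = prefix_scan(keys, "FOO")
```

Note that a value such as 'FOO\377x' is picked up here, because the scan ends
on the prefix test and not on a comparison against a generated limit.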

>
> BTW: the \377 hack is actually wrong for USASCII too, since it'll
> exclude a data value like 'FOO\377x' which should be included.

That's why I pointed out that, in my particular case, I only have alpha and
numeric data in the database, so it is safe; it's certainly no general solution.
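
A small sketch of the difference, in Python with byte strings. The exact form
of the \377 hack is not shown in this message, so the range used below
(value >= prefix AND value <= prefix followed by \377) is an assumption, and
in_377_range is a hypothetical name.

```python
def in_377_range(value, prefix):
    """Assumed form of the hack: prefix <= value <= prefix + '\\377',
    compared byte by byte."""
    return prefix <= value <= prefix + b"\377"


prefix = b"FOO"
alnum_values = [b"FOO", b"FOO1", b"FOObar", b"FOOzzz9"]
# Alphanumeric-only data: every value starting with the prefix is kept.
alnum_ok = all(in_377_range(v, prefix) for v in alnum_values)

# The problem case: this value starts with the prefix but sorts after
# the upper limit, so the range wrongly excludes it.
edge_ok = in_377_range(b"FOO\377x", prefix)
```

With only alphanumeric data, alnum_ok is True; edge_ok is False, which is the
exclusion described in the quoted text.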

--
------------------------------------------------------------
Stuart Woolford, stuartw(at)newmail(dot)net
Unix Consultant.
Software Developer.
Supra Club of New Zealand.
------------------------------------------------------------
