Quick Links

more] indexed regex select optimisations?

From:	Stuart Woolford <stuartw(at)newmail(dot)net>
To:	pgsql-general(at)postgreSQL(dot)org, hackers(at)postgreSQL(dot)org
Subject:	more] indexed regex select optimisations?
Date:	1999-11-07 23:50:41
Message-ID:	99110813015600.00960@test.macmillan.co.nz
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general pgsql-hackers

Well, I've improved my regex text searches to actually use the indexes properly
now for the basic case, but I have found another 'problem' (or feature, call it
what you will ;) - to demonstrate:
with locale turned on (the default RPMS are like this):

the following takes a LONG time to run on 1.6 million records:
-------------------------------------
explain select isbn, count from inv_word_i where
word~'^foo'
order by count

Sort (cost=35148.70 rows=353 width=16)
-> Index Scan using i3 on inv_word_i (cost=35148.70 rows=353 width=16)
-------------------------------------
the following runs instantly, and does (nearly) the same thing:
-------------------------------------
explain select isbn, count from inv_word_i where
word>='foo' and word<'fop'
order by count

Sort (cost=11716.57 rows=183852 width=16)
-> Index Scan using i3 on inv_word_i (cost=11716.57 rows=183852 width=16)
-------------------------------------
but what about the following? :
-------------------------------------
explain select isbn , sum(count) from inv_word_i where
(word>='window' and word<'windox')
or
(word>='idiot' and word<'idiou')
group by isbn
order by sum(count) desc

Sort (cost=70068.84 rows=605525 width=16)
-> Aggregate (cost=70068.84 rows=605525 width=16)
-> Group (cost=70068.84 rows=605525 width=16)
-> Sort (cost=70068.84 rows=605525 width=16)
-> Seq Scan on inv_word_i (cost=70068.84 rows=605525 width=16)
-------------------------------------

this is the fastest way I've found so far to do a multi-word search (window and
idiot as the root words in this case), you note it does NOT use the indexes,
but falls back to a linear scan?!? it takes well over 30 seconds (much much too
long)

I've tried a LOT of different combinations, and have yet to find a way of
getting the system to use the indexes correctly to do what I want, the closest
I've ffound is using a select intersect select method to find all docs
containing both word (what I really want, although the query above is a ranked
or query), but it gets slow as soon as I select more than one field for the
results (I need to line isbn in this case to another database in the final
application)

I assume there is some reason the system falls back to a linear scan in this
case? it seems two index lookups would be much much more efficient..

am I missing something again?

--
------------------------------------------------------------
Stuart Woolford, stuartw(at)newmail(dot)net
Unix Consultant.
Software Developer.
Supra Club of New Zealand.
------------------------------------------------------------

In response to

Re: [HACKERS] Re: [GENERAL] indexed regex select optimisation missing? at 1999-11-06 21:13:03 from Charles Tassell

Browse pgsql-general by date

	From	Date	Subject
Next Message	Matthias Teege	1999-11-08 08:15:38	Compiling problems
Previous Message	Peter Eisentraut	1999-11-07 16:44:10	Re: [GENERAL] users in Postgresql

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Bruce Momjian	1999-11-08 06:06:15	new Psql \pset border
Previous Message	Thomas Lockhart	1999-11-07 23:37:45	Re: [HACKERS] psql and 6.5.3