more] indexed regex select optimisations?

From: Stuart Woolford <stuartw(at)newmail(dot)net>
To: pgsql-general(at)postgreSQL(dot)org, hackers(at)postgreSQL(dot)org
Subject: more] indexed regex select optimisations?
Date: 1999-11-07 23:50:41
Message-ID: 99110813015600.00960@test.macmillan.co.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers


Well, I've improved my regex text searches to actually use the indexes properly
now for the basic case, but I have found another 'problem' (or feature, call it
what you will ;) - to demonstrate:
with locale turned on (the default RPMS are like this):

the following takes a LONG time to run on 1.6 million records:
-------------------------------------
explain select isbn, count from inv_word_i where
word~'^foo'
order by count

Sort (cost=35148.70 rows=353 width=16)
-> Index Scan using i3 on inv_word_i (cost=35148.70 rows=353 width=16)
-------------------------------------
the following runs instantly, and does (nearly) the same thing:
-------------------------------------
explain select isbn, count from inv_word_i where
word>='foo' and word<'fop'
order by count

Sort (cost=11716.57 rows=183852 width=16)
-> Index Scan using i3 on inv_word_i (cost=11716.57 rows=183852 width=16)
-------------------------------------
but what about the following? :
-------------------------------------
explain select isbn , sum(count) from inv_word_i where
(word>='window' and word<'windox')
or
(word>='idiot' and word<'idiou')
group by isbn
order by sum(count) desc

Sort (cost=70068.84 rows=605525 width=16)
-> Aggregate (cost=70068.84 rows=605525 width=16)
-> Group (cost=70068.84 rows=605525 width=16)
-> Sort (cost=70068.84 rows=605525 width=16)
-> Seq Scan on inv_word_i (cost=70068.84 rows=605525 width=16)
-------------------------------------

this is the fastest way I've found so far to do a multi-word search (window and
idiot as the root words in this case), you note it does NOT use the indexes,
but falls back to a linear scan?!? it takes well over 30 seconds (much much too
long)

I've tried a LOT of different combinations, and have yet to find a way of
getting the system to use the indexes correctly to do what I want, the closest
I've ffound is using a select intersect select method to find all docs
containing both word (what I really want, although the query above is a ranked
or query), but it gets slow as soon as I select more than one field for the
results (I need to line isbn in this case to another database in the final
application)

I assume there is some reason the system falls back to a linear scan in this
case? it seems two index lookups would be much much more efficient..

am I missing something again?

--
------------------------------------------------------------
Stuart Woolford, stuartw(at)newmail(dot)net
Unix Consultant.
Software Developer.
Supra Club of New Zealand.
------------------------------------------------------------

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Matthias Teege 1999-11-08 08:15:38 Compiling problems
Previous Message Peter Eisentraut 1999-11-07 16:44:10 Re: [GENERAL] users in Postgresql

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 1999-11-08 06:06:15 new Psql \pset border
Previous Message Thomas Lockhart 1999-11-07 23:37:45 Re: [HACKERS] psql and 6.5.3