Quick Links

Wildcard searches & performance question

From:	Grega Bremec <grega(dot)bremec(at)noviforum(dot)si>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Wildcard searches & performance question
Date:	2003-05-27 19:09:08
Message-ID:	20030527210908.A571@elbereth.noviforum.si
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Hello,

I have a database with the following layout:

There are no uniqueness properties that would make it possible for this table
to have a primary key, as it is a list of searches performed on a search
engine and the users' behaviour is, well... umm, odd, to be mild. :)

Also, do note that this is a test table, so nevermind the excessive amount of
indexes - performance is not an issue here, I am still evaluating the need and
benefits of having various indexes on those columns.

The particular case I am interested is this: when executing queries involving
pattern searches using various operators on srchquery, none of the indexes are
used in certain cases, namely those LIKE and regexp filters that start with
a wildcard.

This makes perfect sense, because wildcard pattern searches that start with a
wildcard, can not really benefit from an index scan, because a sequential scan
is probably going to be faster: we are only going to benefit from scanning an
index in those special cases where the wildcard evaluates to a zero-length
string.

One example of a query plan:

searchterms=# explain select count(*)
from searches_2002
where srchquery like '%curriculum%';
QUERY PLAN
--------------------------------------------------------------------------
Aggregate (cost=4583.26..4583.26 rows=1 width=0)
-> Seq Scan on searches_2002 (cost=0.00..4583.26 rows=1 width=0)
Filter: (srchquery ~~ '%curriculum%'::text)

There is 211061 records in this test table, but the real-life tables would
contain a much much larger amount of data, more like 50+ million rows.

This promise of a hell on earth trying to optimize performance makes me wonder:
would there be a sensible way/reason for avoiding sequential scans on queries
that start with a wildcard, and would avoiding sequential scans even be
feasible in such cases? Or in other words, can I somehow optimize LIKE and
regexp queries that start with a wildcard?

TIA,
--
Grega Bremec
System Administration & Development Support
grega.bremec-at-noviforum.si
http://najdi.si/
http://www.noviforum.si/

Responses

Re: Wildcard searches & performance question at 2003-05-27 19:37:00 from Andrew Sullivan
Re: Wildcard searches & performance question at 2003-05-27 19:48:39 from scott.marlowe
Re: Wildcard searches & performance question at 2003-05-27 19:55:46 from Josh Berkus
Re: Wildcard searches & performance question at 2003-05-27 20:06:26 from Rod Taylor
Re: Wildcard searches & performance question at 2003-05-28 20:52:56 from Grega Bremec

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Andrew Sullivan	2003-05-27 19:37:00	Re: Wildcard searches & performance question
Previous Message	Tom Lane	2003-05-23 20:34:48	Simplifying varchar and bpchar behavior