Re: LIKE search and performance

From: mark(at)mark(dot)mielke(dot)cc
To: Richard Huxton <dev(at)archonet(dot)com>
Cc: Mark Lewis <mark(dot)lewis(at)mir3(dot)com>, James Mansion <james(at)mansionfamily(dot)plus(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Alexander Staubo <alex(at)purefiction(dot)net>, Andy <frum(at)ar-sd(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: LIKE search and performance
Date: 2007-05-25 16:56:33
Message-ID: 20070525165633.GA13579@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Fri, May 25, 2007 at 04:35:22PM +0100, Richard Huxton wrote:
> >I notice you did not provide a real life example as requested. :-)
> OK - any application that allows user-built queries: <choose column:
> foo> <choose filter: contains> <choose target: "bar">
> Want another? Any application that has a "search by name" box - users
> can (and do) put one letter in and hit enter.
> Unfortunately you don't always have control over the selectivity of
> queries issued.

The database has 10 million records. The user enters "bar" and it
translates to "%bar%". You are suggesting that we expect bar to match
1 million+ records? :-)

I hope not. I would define this as bad process. I would also use "LIMIT"
to something like "100".

> >This seems like an ivory tower restriction. Not allowing best performance
> >in a common situation vs not allowing worst performance in a not-so-common
> >situation.
> What best performance plan are you thinking of? I'm assuming we're
> talking about trailing-wildcard matches here, rather than "contains"
> style matches.

"Trailing-wildcard" already uses B-Tree index, does it not?

I am speaking of contains, as contains is the one that was said to
require a seqscan. I am questioning why it requires a seqscan. The
claim was made that with MVCC, the index is insufficient to check
for visibility and that the table would need to be accessed anyways,
therefore a seqscan is required. I question whether a like '%bar%'
should be considered a high selectivity query in the general case.
I question whether a worst case should be assumed.

Perhaps I question too much? :-)

Cheers,
mark

--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Joshua D. Drake 2007-05-25 17:06:46 Re: LIKE search and performance
Previous Message PFC 2007-05-25 16:51:04 Re: LIKE search and performance