Re: LIKE search and performance

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: James Mansion <james(at)mansionfamily(dot)plus(dot)com>
Cc: Alexander Staubo <alex(at)purefiction(dot)net>, Andy <frum(at)ar-sd(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: LIKE search and performance
Date: 2007-05-24 19:23:55
Message-ID: 4655E64B.5050503@hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

James Mansion wrote:
> Alexander Staubo wrote:
>> On 5/23/07, Andy <frum(at)ar-sd(dot)net> wrote:
>>> An example would be:
>>> SELECT * FROM table
>>> WHERE name like '%john%' or street like
>>> '%srt%'
>>>
>>> Anyway, the query planner always does seq scan on the whole table and
>>> that
>>> takes some time. How can this be optimized or made in another way to be
>>> faster?
>>
>> There's no algorithm in existence that can "index" arbitrary
>> substrings the way you think. The only rational way to accomplish this
>> is to first break the text into substrings using some algorithm (eg.,
>> words delimited by whitespace and punctuation), and index the
>> substrings individually.
> That seems rather harsh. If I'd put an index on each of these colomns
> I'd certainly
> expect it to use the indices - and I'm pretty sure that Sybase would.
> I'd expect
> it to scan the index leaf pages instead of the table itself - they
> should be much
> more compact and also likely to be hot in cache.

If Sybase is still like SQL Server (or the other way around), it *may*
end up scanning the index *IFF* the index is a clustered index. If it's
a normal index, it will do a sequential scan on the table.

Yes, the leaf page of the index is more compact, but you also have to
scan the intermediate pages to get to the leaf pages. But again, it can
be a win. On such a system.

It's not a win on PostgreSQL, because of our MVCC implementation. We
need to scan *both* index *and* data pages if we go down that route, in
which case it's a lot faster to just scan the data pages alone.

I don't really know how MSSQL deals with this now that they have
MVCC-ish behavior, and I have no idea at all if sybase has anything like
MVCC.

> Why *wouldn't* the planner do this?

The question should be why the optimizer doesn't consider it, and the
executor uses it. The planner really doesn't decide that part :-)
Hopefully, the answer can be found above.

//Magnus

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message James Mansion 2007-05-24 20:54:34 Re: LIKE search and performance
Previous Message James Mansion 2007-05-24 18:50:29 Re: LIKE search and performance