Re: How to get around LIKE inefficiencies?

From: Ron Chmara <ron(at)Opus1(dot)COM>
To: The Hermit Hacker <scrappy(at)hub(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: How to get around LIKE inefficiencies?
Date: 2000-11-06 03:19:21
Message-ID: 3A062337.23BE3AE3@opus1.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The Hermit Hacker wrote:
> I'm tryin to figure out how to speed up udmsearch when run under
> postgresql, and am being hit by atrocious performance when using a LIKE
> query ... the query looks like:
> SELECT ndict.url_id,ndict.intag
> FROM ndict,url
> WHERE ndict.word_id=1971739852
> AND url.rec_id=ndict.url_id
> AND (url.url LIKE 'http://www.postgresql.org/%');
> Take off the AND ( LIKE ) part of the query, finishes almost as soon as
> you hit return. Put it back in, and you can go for coffee before it
> finishes ...

The entire *approach* is wrong. I'm currently in the process of optimizing
a db which is used for logfile mining, and it was originally built with the same
kludge.... it seems to make sense when there's only a few thousand records,
but at 20 million records, yikes!

The problem is that there's a "like" operation for something that is
fundamentally static (http://www.postgresql.org/) with some varying
data *after it*, that you're not using, in any form, for this operation.
This can be solved one of two ways:

1. Preprocess your files to strip out the paths and arguments on
a new field for the domain call. You are only setting up that data once,
so you shouldn't be using a "like" operator for every query. It's not
like on monday the server is "http://www.postgresql.org/1221" and on
tuesday the server is "http://www.postgresql.org/12111". It's always
the *same server*, so split out that data into it's own column, it's own
index.

This turns your query into:
SELECT ndict.url_id,ndict.intag
FROM ndict,url
WHERE ndict.word_id=1971739852
AND url.rec_id=ndict.url_id
AND url.server_url='http://www.postgresql.org/';

2. Trigger to do the above, if you're doing on-the-fly inserts into
your db (so you can't pre-process).

-Ronabop

--
Brought to you from iBop the iMac, a MacOS, Win95, Win98, LinuxPPC machine,
which is currently in MacOS land. Your bopping may vary.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message The Hermit Hacker 2000-11-06 03:34:11 Re: How to get around LIKE inefficiencies?
Previous Message Philip Warner 2000-11-06 03:18:34 Re: How to get around LIKE inefficiencies?