Re: Planner should use index on a LIKE 'foo%' query

From: Moritz Onken <onken(at)houseofdesign(dot)de>
To:
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Planner should use index on a LIKE 'foo%' query
Date: 2008-06-30 07:16:44
Message-ID: EBA8A1BD-AF92-4203-93E4-2A7D7ADEA94B@houseofdesign.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Anfang der weitergeleiteten E-Mail:

> Von: Moritz Onken <onken(at)houseofdesign(dot)de>
> Datum: 30. Juni 2008 09:16:06 MESZ
> An: Steinar H. Gunderson <sgunderson(at)bigfoot(dot)com>
> Betreff: Re: [PERFORM] Planner should use index on a LIKE 'foo%' query
>
>
> Am 28.06.2008 um 21:19 schrieb Steinar H. Gunderson:
>
>> On Sat, Jun 28, 2008 at 06:24:42PM +0200, Moritz Onken wrote:
>>> SELECT distinct url from item where url like 'http://www.micro%'
>>> limit
>>> 10;
>>
>> Here, the planner knows the pattern beforehand, and can see that
>> it's a
>> simple prefix.
>>> select *
>>> from result
>>> where exists
>>> (select * from item where item.url LIKE result.url || '%' limit 1)
>>> limit 10;
>>
>> Here it cannot (what if result.url was '%foo%'?).
>
> That's right. Thanks for that hint. Is there a Postgres function
> which returns a constant (possibly an escape function)?
>>
>>
>> Try using something like (item.url >= result.url && item.url <=
>> result.url ||
>> 'z'), substituting an appropriately high character for 'z'.
>>
>>> The only explaination is that I don't use a constant when
>>> comparing the
>>> values. But actually it is a constant...
>
> I created a new column in "item" where I store the shortened url
> which makes "=" comparisons possible.
>
> the result table has 20.000.000 records and the item table 5.000.000.
> The query
>
> select count(1) from result where url in (select shorturl from item
> where shorturl = result.url);
>
> will take about 8 hours (still running, just guessing). Is this
> reasonable on a system with 1 GB of RAM and a AMD Athlon 64 3200+
> processor? (1 SATA HDD)
>
> regards,
>
> moritz
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Moritz Onken 2008-06-30 07:22:12 Re: Planner should use index on a LIKE 'foo%' query
Previous Message Rusty Conover 2008-06-30 06:59:16 Re: Out of memory for Select query.