Re: best practise/pattern for large OR / LIKE searches

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Jasen Betts <jasen(at)xnet(dot)co(dot)nz>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: best practise/pattern for large OR / LIKE searches
Date: 2009-08-30 13:17:30
Message-ID: 162867790908300617u794ba134k2dfa0c7ebad32c8b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello

> regex is compiled to a finite state machine and then the datanumber
> column is scanned in a single pass (for each row)
>
>> Searches are currently taking to long and we would like to optimize
>> them, but before we dive into our own solution we
>> where wondering if there already common solutions for this...
>
> try regex first if that's too slow you may need to write a
> dictionary function that splits datanuimber into it's components
> and use full text index/search. (this will slow down updates as they will do
> upto 20 inserts into the index)
>
> searches should then be optimally fast
>

I did some tests:

1) I fill test table
insert into test SELECT
array_to_string(array_agg(array_to_string(ARRAY(select
substring('01234567890' from (random()*10)::int + 1 for 1) from
generate_series(1,(random()*10+5)::int + i - i)),'')),',') as b from
generate_series(1,100000) g(i) group by (random()*1000)::int;

2. I tested searching of 5 or 13 values. I did tests on 8.4 and 8.1

8.1
using like 190ms(440ms*)
using regexp 115ms(259ms*)

* for 13 values - so there regexp is faster than like

on 8.4
using like 80ms(151ms)
using regexp 131ms(267ms)

so like is faster then regexp on 8.4.

fulltext test (8.4)
420ms(470ms) -- without index
14ms(26ms) -- with GiST index
1ms(2ms) -- with Gin index

some samples of test queries:
select * from test where to_tsvector('simple',a) @@
to_tsquery('simple','296426496|7707431116555|98173598191|302598|53174827|02292064629|188631468777|4756243248|920473|16602317|76613513|78640|9176323');

select * from test where a ~
'296426496|7707431116555|98173598191|302598|53174827|02292064629|188631468777|4756243248|920473|16602317|76613513|78640|9176323';

select * from test where a like '%296426496%' or a like '%
7707431116555%' or a like '%98173598191%' or a like '%302598%' or a
like '%53174827%' or a like '%02292064629%' or a like '%188631468777%'
or a like '%4756243248%' or a like '%920473%' or a like '%16602317%'
or a like '%76613513%' or a like '%78640%' or a like '%9176323%';

regards
Pavel Stehule

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Pavel Stehule 2009-08-30 13:22:16 Re: Trouble using TG_TABLE_NAME in BEFORE INSERT OR UPDATE trigger
Previous Message Stephen Cuppett 2009-08-30 13:11:53 Re: Trouble using TG_TABLE_NAME in BEFORE INSERT OR UPDATE trigger