From: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
---|---|
To: | Mario Lopez <mario(at)lar3d(dot)com> |
Cc: | depesz(at)depesz(dot)com, pgsql-general(at)postgresql(dot)org |
Subject: | Re: Working with huge amount of data. RESULTS! |
Date: | 2008-02-12 15:15:49 |
Message-ID: | Pine.LNX.4.64.0802121809410.23796@sn.sai.msu.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Tue, 12 Feb 2008, Mario Lopez wrote:
> Hi!,
>
> I optimized the LIKE 'keyword%' and LIKE '%keyword' with the following
> results:
>
> # time /Library/PostgreSQL8/bin/psql -U postgres -d testdb -c "select * from
> table1 where varchar_reverse(data) like varchar_reverse('%keyword');"
> real 0m0.055s
> user 0m0.011s
> sys 0m0.006s
>
>
> # time /Library/PostgreSQL8/bin/psql -U postgres -d testdb -c "select * from
> table1 where data like 'keyword%';"
> real 0m0.026s
> user 0m0.012s
> sys 0m0.006s
>
> It works flawlesly as you can see by the timings, take in consideration that
> "table1" has 100 million records. The only problem is generating the reversed
> index which takes like 20 minutes, I guess it has to do with the plperl
> function, perhaps a C function for inverting would make it up in less time.
>
> The problem is still with the LIKE '%keyword%', my problem is that I am not
> searching for Words in a dictionary fashion, suppose my "data" is random
> garbage, that it has common consecutive bytes. How could I generate a
> dictionary from this random garbage to make it easier for indexing?
suffix tree (array) would speedup '%keyword%' query, but currently it doesn't
supported by GiST extension architecture (we have it in our TODO), so I see
several ways (not tested):
1. try contrib/pg_trgm to reduce a number of candidate words
2. generate all possible substrings and use your tested approach
>
>> On Mon, Feb 11, 2008 at 04:37:24PM +0100, Mario Lopez wrote:
>>
>>> SELECT * FROM names WHERE name LIKE keyword%
>>> Or
>>> SELECT * FROM names WHERE name LIKE %keyword%
>>>
>>
>> check this:
>> http://www.depesz.com/index.php/2007/07/30/indexable-field-like-something/
>> and this:
>> http://www.depesz.com/index.php/2007/09/15/speeding-up-like-xxx/
>>
>> depesz
>>
>>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2008-02-12 15:26:06 | Re: TSearch2 Migration Guide from 8.2 to 8.3 |
Previous Message | Thomas Chille | 2008-02-12 15:13:33 | Some Autovacuum Questions |