Re: looking for a faster way to do that

From: hamann(dot)w(at)t-online(dot)de
To: pgsql-general(at)postgresql(dot)org
Subject: Re: looking for a faster way to do that
Date: 2011-09-25 06:04:43
Message-ID: wolfgang-1110925080443.A0225425@amadeus3.local
Lists: pgsql-general

Eduardo Morras wrote:

>> >
>> >Hi,
>> >
>> >if I understand this right, it does not mean "check if the string
>> >appears at position 0"
>> >which could translate into an index query, but rather "check if the
>> >string appears anywhere
>> >and then check if that is position 0", so the entire table is checked.
>>
>> The second one, yes: since it checks all the patterns you want only
>> once per row, it needs only one table scan. The first one removes the
>> substring 'ABC' from the string; if the lengths of both strings are
>> equal, the substring 'ABC' wasn't in it. If they differ, the trimmed
>> string will be shorter.
>>
>> >explain analyze select items.num, wantcode from items, n where
>> >strpos(code, wantcode) = 0;
>> > Nested Loop (cost=167.14..196066.54 rows=39178 width=36) (actual
>> > time=0.074..36639.312 rows=7832539 loops=1)
>> > Join Filter: (strpos(("inner".code)::text, "outer".wantcode) = 0)
>> > -> Seq Scan on n (cost=0.00..14.15 rows=815 width=32) (actual
>> > time=0.005..2.212 rows=815 loops=1)
>> > -> Materialize (cost=167.14..263.28 rows=9614 width=42)
>> > (actual time=0.007..13.970 rows=9614 loops=815)
>> > -> Seq Scan on items (cost=0.00..167.14 rows=9614
>> > width=42) (actual time=0.044..14.855 rows=9614 loops=1)
>> > Total runtime: 46229.836 ms
>> >
>> >
>> >The query ran much faster than the pattern query, however. This
>> >seems to be the performance
>> >of just searching for a plain string vs. initializing the regex
>> >engine every time (for 815
>> >queries in a test set)
>>
>> It will do only one table scan, while your original code does one
>> for each substring you want to test, so you can add more and more
>> substrings without much extra cost. If you want to use the regex
>> engine instead of the PostgreSQL string functions, check
>> regexp_matches(); it should be faster if you have 3000 substrings.
>>
>> select * from items where regexp_matches(items.code,
>> '(ABC)|(DE1)|(any_substring)') <> '{}';
>>

Hi Eduardo,

it is clear that scanning the table once with a list of matches will outperform
rescanning the table for every string wanted. Now, my problem is that the patterns
are dynamic as well. So if I could translate a table with one column and a few
thousand rows into something like
regexp_matches(code,'string1|string2|.....string2781')
that would ideally give a performant query. Unfortunately I have no idea how I
could achieve this transformation inside the database. Doing it externally fails,
because a single query is limited to only so many characters.
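One possible sketch of that transformation done entirely inside the database,
assuming PostgreSQL 9.0+ for string_agg() (on older versions,
array_to_string(array_agg(wantcode), '|') is an alternative), and assuming the
pattern table is n(wantcode) as in the plan above; note that any regex
metacharacters in the stored substrings would still need escaping first:

```sql
-- Sketch: build the alternation pattern from the pattern table in one pass.
-- Assumes table n(wantcode) holds the substrings to search for; entries
-- containing regex metacharacters (., |, (, etc.) would need escaping.
SELECT items.num, items.code
FROM items
WHERE items.code ~ (SELECT string_agg(wantcode, '|') FROM n);
```

The subselect is evaluated once, so the table of patterns is collapsed into a
single regex and items is scanned only once.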

Regards
Wolfgang Hamann
