Re: looking for a faster way to do that

From: Alban Hertroys <haramrae(at)gmail(dot)com>
To: hamann(dot)w(at)t-online(dot)de
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: looking for a faster way to do that
Date: 2011-09-25 11:34:28
Message-ID: 976CEC06-8525-4A87-BBCF-84211ACD91CB@gmail.com
Lists: pgsql-general

On 25 Sep 2011, at 8:04, hamann(dot)w(at)t-online(dot)de wrote:
> Hi Eduardo,
>
> it is clear that scanning the table once with a list of matches will outperform
> rescanning the table for every string wanted. Now, my problem is that the patterns are
> dynamic as well. So if I could translate a table with one column and a few thousand rows
> into something like
> regexp_matches(code,'string1|string2|.....string2781')
> that would ideally be a performant query. Unfortunately I have no idea how I could achieve this
> transformation inside the database. Doing it externally fails, because a single query can only
> be so many characters long.

To me it sounds a little bit like you're comparing every item in a warehouse to a set of descriptions to see what type of item it is, which is something you would be much better off storing as a property of the item. If an item is a fruit, store that it's a fruit!
But I'm guessing at what you're trying to accomplish, so here are a few other options...
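For what it's worth, the big alternation itself could probably be built inside the database rather than in the client, which sidesteps the query-length problem. A rough sketch, assuming your patterns live in a table called wanted with a column wantcode (both names are guesses on my part):

-- Aggregate all patterns into one alternation and match against it.
-- This assumes the wantcode values are already valid regular expressions.
SELECT t.*
FROM tbl t
WHERE t.str ~ (SELECT string_agg(wantcode, '|') FROM wanted);

Whether the regexp engine copes well with an alternation of a few thousand branches is another matter, but at least the query text stays short.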

I guess you could create 2781 expression indexes to do what you want (is there a limit that prevents this?). Query planning would probably become rather slow, and the indices would take up a considerable fraction of the table's total storage - it's a pretty outlandish approach.

CREATE INDEX tbl_str_regex1 ON tbl (regexp_matches(str, 'string1'));
CREATE INDEX tbl_str_regex2 ON tbl (regexp_matches(str, 'string2'));
...
CREATE INDEX tbl_str_regex2781 ON tbl (regexp_matches(str, 'string2781'));
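Note that regexp_matches() returns a set, and as far as I know set-returning functions aren't allowed in index expressions, so you'd probably have to index a boolean test instead. A sketch of what I mean:

-- Index a boolean expression (or use a partial index: ... ON tbl (str) WHERE str ~ 'string1');
-- the planner can then use it for queries whose WHERE clause matches that expression.
CREATE INDEX tbl_str_regex1 ON tbl ((str ~ 'string1'));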

Or are you really going to query every record against all 2781 regexes? I can't come up with a realistic scenario in which you (or anyone) would want that.
In that case those indices aren't going to help you much, as the planner would have to match every record in tbl against each index - it won't do that.

You could also create a giant lookup table (a materialized view, if you like) in which the result of matching every str in tbl against every wantcode in the regex table is stored. That's some huge overhead, but it will probably outperform most other options. With the numbers you gave, that table would hold about 2-3 billion records, each with two foreign key values and a truth value.
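Something along these lines, perhaps. It's only a sketch: I'm assuming tbl has an id primary key and a text column str, and that your patterns sit in a table called wanted with a column wantcode - adjust the names to whatever you actually have:

-- Precompute every (row, pattern) combination once; it needs to be refreshed
-- whenever either table changes. All table/column names here are assumptions.
CREATE TABLE tbl_wanted_matches AS
SELECT t.id AS tbl_id,
       w.wantcode,
       t.str ~ w.wantcode AS matches
FROM tbl t
CROSS JOIN wanted w;

-- A partial index keeps looking up the actual hits cheap.
CREATE INDEX tbl_wanted_matches_hit_idx
    ON tbl_wanted_matches (wantcode, tbl_id)
    WHERE matches;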

Alban Hertroys

--
The scale of a problem often equals the size of an ego.
