Re: Query tuning help

From: Dan Harris <fbsd(at)drivefaster(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query tuning help
Date: 2005-05-09 01:49:30
Message-ID: 6f4a543a2fbd348c7e82876b8e477600@drivefaster.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


On May 8, 2005, at 6:51 PM, Russell Smith wrote:

> On Mon, 9 May 2005 09:20 am, Dan Harris wrote:
> You cannot use an index for %CORVETTE%, or %RED%. There is no way
> for the index to know if a row had that in the middle without scanning
> the whole
> index. So it's much cheaper to do a sequence scan.
>

While I believe you, I'm confused by this line in my original EXPLAIN
ANALYZE:

>> -> Index Scan using ea1 on ea (cost=0.00..2736.43 rows=42 width=47)
>> (actual time=2.085..2.309 rows=2 loops=473)
>> Index Cond:
>> ((ea.incidentid)::text = ("outer".incidentid)::text)
>> Filter: (((recordtext)::text
>> ~~ '%RED%'::text) OR ((recordtext)::text ~~ '%CORVETTE%'::text))

Doesn't that mean it was using an index to filter? Along those lines,
before I created index 'ea1', the query was much much slower. So, it
seemed like creating this index made a difference.

> One possible way to make the query faster is to limit based on date,
> as you will only get about 700 rows.
> And then don't use subselects, as they are doing full sequence scans.
> I think this query does what you do
> above, and I think it will be faster, but I don't know.
>

I REALLY like this idea! If I could just filter by date first and then
sequential scan through those, it should be very manageable. Hopefully
I can keep this goal while still accommodating the requirement listed
in my next paragraph.

> select distinct em.incidentid, ea.recordtext as retdata, eg.long,
> eg.lat
> FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >=
> '2005-1-1 00:00'
> AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%RED%'
> AND ea.recordtext like '%CORVETTE%')
> JOIN eg ON em.incidentid = eg.incidentid WHERE (recordtext like
> '%RED%' or recordtext like '%CORVETTE%' );
>

I have run this, and while it is very fast, I'm concerned it's not
doing what I need. Here's the situation:

Due to the format of the systems with which I integrate ( I have no
control over these formats ), we will get these 'recordtext' values one
line at a time, accumulating over time. The only way I can find to
make this work is to insert a new record for each line. The problem
is, that when someone wants to search multiple keywords, they expect
these words to be matched across multiple records with a given incident
number.

For a very simple example:

IncidentID Date Recordtext
-------------- -------------
-------------------------------------------------------
11111 2005-05-01 14:21 blah blah blah RED blah blah
2222 2005-05-01 14:23 not what we are looking for
11111 2005-05-02 02:05 blah CORVETTE blah blah

So, doing a search with an 'and' condition, e.g. WHERE RECORDTEXT LIKE
'%RED%' AND RECORDTEXT LIKE '%CORVETTE%' , will not match because the
condition will only be applied to a single row of recordtext at a time,
not a whole group with the same incident number.

If I were to use tsearch2 for full-text indexing, would I need to
create another table that merges all of my recordtext rows into a
single 'text' field type? If so, this is where I run into problems, as
my logic also needs to match multiple words in their original order. I
may also receive additional updates to the previous data. In that
case, I need to replace the original record with the latest version of
it. If I have already concatenated these rows into a single field, the
logic to in-line replace only the old text that has changed is very
very difficult at best. So, that's the reason I had to do two
subqueries in my example. Please tell me if I misunderstood your logic
and it really will match given my condition above, but it didn't seem
like it would.

Thanks again for the quick responses! This list has been a great
resource for me.

-Dan

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2005-05-09 02:06:47 Re: Query tuning help
Previous Message Tom Lane 2005-05-09 00:58:22 Re: Query tuning help