Re: Query tuning help

From: Russell Smith <mr-russ(at)pws(dot)com(dot)au>
To: Dan Harris <fbsd(at)drivefaster(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query tuning help
Date: 2005-05-09 02:32:05
Message-ID: 200505091232.05495.mr-russ@pws.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, 9 May 2005 11:49 am, Dan Harris wrote:
>
> On May 8, 2005, at 6:51 PM, Russell Smith wrote:
>
[snip]
> > select distinct em.incidentid, ea.recordtext as retdata, eg.long,
> > eg.lat
> > FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >=
> > '2005-1-1 00:00'
> > AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%RED%'
> > AND ea.recordtext like '%CORVETTE%')
> > JOIN eg ON em.incidentid = eg.incidentid WHERE (recordtext like
> > '%RED%' or recordtext like '%CORVETTE%' );
> >
>
> I have run this, and while it is very fast, I'm concerned it's not
> doing what I need.
How fast is very fast?

> Here's the situation:
>
> Due to the format of the systems with which I integrate ( I have no
> control over these formats ), we will get these 'recordtext' values one
> line at a time, accumulating over time. The only way I can find to
> make this work is to insert a new record for each line. The problem
> is, that when someone wants to search multiple keywords, they expect
> these words to be matched across multiple records with a given incident
> number.
>
> For a very simple example:
>
> IncidentID Date Recordtext
> -------------- -------------
> -------------------------------------------------------
> 11111 2005-05-01 14:21 blah blah blah RED blah blah
> 2222 2005-05-01 14:23 not what we are looking for
> 11111 2005-05-02 02:05 blah CORVETTE blah blah
>
> So, doing a search with an 'and' condition, e.g. WHERE RECORDTEXT LIKE
> '%RED%' AND RECORDTEXT LIKE '%CORVETTE%' , will not match because the
> condition will only be applied to a single row of recordtext at a time,
> not a whole group with the same incident number.
>
> If I were to use tsearch2 for full-text indexing, would I need to
> create another table that merges all of my recordtext rows into a
> single 'text' field type? If so, this is where I run into problems, as
> my logic also needs to match multiple words in their original order. I
> may also receive additional updates to the previous data. In that
> case, I need to replace the original record with the latest version of
> it. If I have already concatenated these rows into a single field, the
> logic to in-line replace only the old text that has changed is very
> very difficult at best. So, that's the reason I had to do two
> subqueries in my example. Please tell me if I misunderstood your logic
> and it really will match given my condition above, but it didn't seem
> like it would.
>
> Thanks again for the quick responses! This list has been a great
> resource for me.
>
select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >= '2005-1-1 00:00'
AND em.entrydate <= '2005-5-9 00:00' AND (ea.recordtext like '%RED%' OR ea.recordtext like '%CORVETTE%'))
JOIN eg ON em.incidentid = eg.incidentid WHERE
em.incidentid IN
(select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >= '2005-1-1 00:00'
AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%CORVETTE%'))
JOIN eg ON em.incidentid = eg.incidentid) AND
em.incidentid IN
(select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >= '2005-1-1 00:00'
AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%RED%'))
JOIN eg ON em.incidentid = eg.incidentid)

This may be more accurate. However I would cool it VERY NASTY. Josh's solutions may be better.
However much of the data should be in memory once the subplans are done, so it may be quite fast.
you may
> >

> -Dan
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
> joining column's datatypes do not match
>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Dan Harris 2005-05-09 02:49:07 Re: Query tuning help
Previous Message Dan Harris 2005-05-09 02:31:38 Re: Query tuning help