Re: horrendous query challenge :-)

From: Shaun Thomas <sthomas(at)townnews(dot)com>
To: Fran Fabrizio <ffabrizio(at)mmrd(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: horrendous query challenge :-)
Date: 2002-05-30 14:05:50
Message-ID: Pine.LNX.4.44.0205300815220.1526-100000@hamster.lee.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, 29 May 2002, Fran Fabrizio wrote:

> cs.entity_id,e.type,e.name,w.interface_label,cs.remote_ts,s.name,r.name
> from current_status_test cs, entity e, site s, region r,watch w where
> cs.entity_id in (select entity_id from entity where active=true) and
> cs.current = false and cs.msgtype=w.msg_type and e.entity_id =
> cs.entity_id and s.region_id = r.region_id and cs.site_id = s.site_id
> order by cs.remote_ts desc, r.name, s.name;

Um... eww? Let's make that a *little* more readable, shall we?

SELECT cs.entity_id, e.type, e.name, w.interface_label,
cs.remote_ts, s.name, r.name
FROM current_status_test cs, entity e, site s,
region r, watch w
WHERE cs.entity_id in (select entity_id from entity where active=true)
AND cs.current = false
AND cs.msgtype=w.msg_type
AND e.entity_id = cs.entity_id
AND s.region_id = r.region_id
AND cs.site_id = s.site_id
ORDER BY cs.remote_ts desc, r.name, s.name;

Ok, now that we have that taken care of, first suggestion is to ditch
the "in" clause. Postgres is notorious about not doing those very well.
Looking at the query, you don't need it, it can be part of the where
clause directly. You can replace it with "e.active = true".

Since you're already including the entity table, why not use it? Now
you have this:

SELECT cs.entity_id, e.type, e.name, w.interface_label,
cs.remote_ts, s.name, r.name
FROM current_status_test cs, entity e, site s,
region r, watch w
WHERE cs.current = false
AND cs.msgtype = w.msg_type
AND cs.entity_id = e.entity_id
AND e.active = true
AND cs.site_id = s.site_id
AND s.region_id = r.region_id
ORDER BY cs.remote_ts desc, r.name, s.name;

Not much different, but I bet it'll work better. As far as indexes go,
I'd follow the obvious route and index all of your "_id" columns, if
they're not already. I'd suggest putting one on msg_type too as it
also appears in your view.

But wait, you have a view in there... let's rewrite that too. Since
you're using and, all of the where parenthesis can go bye-bye. You also
don't need to date(now()) as now() is already a date.

SELECT findsite(e.entity_id) AS site_id, e.entity_id,
get_status(e.entity_id, e.watch_id) AS status,
e.watch_id, e.msg_type AS msgtype, cat.name AS msgcat,
'Ok' AS message, now() AS local_ts,
e.last_contact AS remote_ts, e.current
FROM entity_watch_map e, classifications class, categories cat
WHERE e.last_contact > "timestamp"(now() - 180)
AND e.msg_type = class.msg_type
AND class.category_id = cat.category_id;

Yow. That's some view. You're essentially joining 7 tables overall,
not 5. That can't be pretty. ^_^ Anyway, I'd index msg_type, and
category_id from this select. Not much you can do with anything else.

But also consider that you're simply adding information from the
classifications and categories tables. You use none of this information
in your query at all. Why even use the view? The view restricts to
things newwer than 180 days, but you can put that in your select itself.
You're left with:

SELECT wm.entity_id, e.type, e.name, w.interface_label,
wm.last_contact AS remote_ts, s.name, r.name
FROM entity_watch_map wm, entity e, site s,
region r, watch w
WHERE wm.last_contact > "timestamp"(now() - 180)
AND wm.current = false
AND wm.msg_type = w.msg_type
AND wm.entity_id = e.entity_id
AND e.active = true
AND findsite(wm.entity_id) = s.site_id
AND s.region_id = r.region_id
ORDER BY wm.last_contact desc, r.name, s.name;

And there you go. I just essentially chopped two tables off of your
total query plan. Tables you don't need, and tables you don't use data
from in this particular query, if my guess is right.

So, all in all, you've lost an unnecessary IN clause, and chopped off
two unnecessary table joins. Index the *_id columns and msg_type,
vacuum analyze, and try running this again.

That's about all I can see. But it should be more than enough.

--
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
| Shaun M. Thomas INN Database Administrator |
| Phone: (309) 743-0812 Fax : (309) 743-0830 |
| Email: sthomas(at)townnews(dot)com AIM : trifthen |
| Web : www.townnews.com |
| |
| "Most of our lives are about proving something, either to |
| ourselves or to someone else." |
| -- Anonymous |
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Joshua b. Jore 2002-05-30 14:14:49 Re: Checksum/Hash Functions
Previous Message Joel Burton 2002-05-30 14:01:25 Re: Checksum/Hash Functions