Re: How can this be optimized, if possible?

From: Greg Stark <gsstark(at)mit(dot)edu>
To: "Net Virtual Mailing Lists" <mailinglists(at)net-virtual(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: How can this be optimized, if possible?
Date: 2005-07-05 15:56:14
Message-ID: 87r7edtffl.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

"Net Virtual Mailing Lists" <mailinglists(at)net-virtual(dot)com> writes:

> The query I want to run against these two tables is something like this:
>
> SELECT
> count(*) as count,
> category.category,
> nlevel(category.category) AS level,
> subpath(category.category,0,nlevel(category.category)-1) as parent,
> category.head_title,
> category.cat_title,
> category.subcat_title
> FROM
> test,
> category
> WHERE
> test.category <@ category.category
> GROUP BY
> category.category, category.head_title, category.cat_title,
> category.subcat_title |
>
>
> Many times the "WHERE" clause will contain additional search criteria on
> the 'test' table. What I am trying to get is a count of how many rows
> from the test table fall into each category, being limited by the search
> criteria.

Post the output of EXPLAIN ANALYZE SELECT ...

Also list any indexes you have on the tables. Do you have a GIST index on
the ltree column in test?

There are a number of ways of rewriting this query, you'll probably have some
success with one of them.

You could for example write it:

SELECT *,
(SELECT count(*)
FROM test
WHERE category <@ category.category
[AND search criteria...]) as count
FROM category

Normally I would say your form with the join gives the planner the maximum
flexibility, but I don't think the planner is going to be able to do any
better than nested loops with a join clause like that so I don't think this
will be any worse than the join. And it might have a better chance of using an
index on test.category.

But not that it's still got to do 300 scans of the test index. If each one
takes .5s then this query is still going to take 150s or so. But with a gist
index on the test.category column it may be more 10s total. It will depend
partly on how many categories you have that span a large number of records in
test. That is, how many "parent" categories you have.

--
greg

In response to

Browse pgsql-general by date

  From Date Subject
Next Message David Pratt 2005-07-05 16:01:41 Regex escape [ character and change text result into integer
Previous Message Bricklen Anderson 2005-07-05 14:30:29 Re: Statistics and Indexes