Re: Expensive where clause

From: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
To: Keith Worthington <keithw(at)narrowpathinc(dot)com>
Cc: PostgreSQL Novice <pgsql-novice(at)postgresql(dot)org>
Subject: Re: Expensive where clause
Date: 2005-02-19 15:11:56
Message-ID: 20050219063547.C73677@megazone.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

On Sat, 19 Feb 2005, Keith Worthington wrote:

> On Fri, 18 Feb 2005 21:55:29 -0800 (PST), Stephan Szabo wrote
> > On Fri, 18 Feb 2005, Keith Worthington wrote:
> >
> > > I have been working on a SQL statement that contains a WHERE
> > > clause of the form WHERE column1 > column2. The query runs
> > > pretty quickly (285ms) without the WHERE clause but slows to
> > > a relative crawl (5850ms) when it is included.
> > > Any suggestions on how to improve the performance would be
> > > greatly appreciated.
> >
> > Explain analyze output for the query with and without the clause
> > would probably be useful for analysis.
>
> Here is the explain analyze output with the WHERE
> clause commented out. This one actually ran slow. Usually it is only a few
> hundred ms without the WHERE clause.

That's probably just the instrumentation.

I'm not 100% sure why it's changing plans although I wonder if the costs
are just close enough that small changes are causing the plan change, but
I think it wouldn't pick a nested loop if it knew that it was grossly
underestimating the number of loops. It might be interesting to see how
the second query runs in explain analyze with enable_nestloop=off although
that'll likely make lower portions of the query more expensive.

One thing that jumps out at me is scans like this:
-> Index Scan using idx_tbl_item_item_type,
idx_tbl_item_item_type on tbl_item items (cost=0.00..84.16 rows=1
width=24)
(actual time=0.093..5.702 rows=566 loops=1)
Index Cond: (((item_type)::text =
'DIR'::text) OR ((item_type)::text = 'NET'::text))
Filter: ((NOT inactive) AND (item_class = 1))

This misestimation may be playing a part in why it thinks a nested loop is
a good plan. Is there a strong correlation between some item_types and
item_class or inactive? What does explain analyze on the following
queries show?

select * from tbl_item where (item_type='DIR OR item_type='NET');
select * from tbl_item where (item_type='DIR OR item_type='NET') AND NOT
inactive;
select * from tbl_item where (item_type='DIR OR item_type='NET') AND
item_class=1;
select * from tbl_item where (item_type='DIR OR item_type='NET') AND NOT
inactive AND item_class=1;

I'm not sure if you'd get any win from a partial index with WHERE NOT
inactive (or possibly both NOT inactive AND item_class=1 if you're almost
always limiting item_class to 1), but that might also be something to
check.

In response to

Browse pgsql-novice by date

  From Date Subject
Next Message A Gilmore 2005-02-21 01:32:48 Concatenate rows
Previous Message Keith Worthington 2005-02-19 06:40:45 Re: Expensive where clause