Re: planner favors seq scan too early

From: Richard Huxton <dev(at)archonet(dot)com>
To: Markus Bertheau <mbertheau(dot)pg(at)googlemail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: planner favors seq scan too early
Date: 2008-02-21 08:34:09
Message-ID: 47BD3781.6000503@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Markus Bertheau wrote:
>
> I'm getting a plan that uses a sequential scan on ext_feeder_item instead of
> several index scans, which slows down the query significantly:
>
> # explain analyze select fi.pub_date from ext_feeder_item fi where fi.feed_id
> in (select id from ext_feeder_feed ff where ff.is_system) order by pub_date
> desc;
> Sort (cost=298545.70..299196.46 rows=260303 width=8) (actual
> time=89299.623..89302.146 rows=807 loops=1)

> Using LIMIT in the subquery I can see that starting with 50 values for the in
> the planner starts to prefer the seq scan. Plan for 49:

> Sort (cost=277689.24..277918.39 rows=91660 width=8) (actual
> time=477.769..478.193 rows=137 loops=1)

> Note that the rows estimate for the index scan is way off. Increasing
> statistics target for ext_feeder_item.feed_id to 100 lets the planner favor the
> index scan up to LIMIT 150 for the subquery.
>
> Using enable_seqscan=false, I see that the index scan plan continues to
> outperform the seqscan plan even with limit 1500 in the subquery (1196 values
> actually returned from it):

> Sort (cost=100925142.27..100925986.74 rows=337787 width=8) (actual
> time=102.111..104.627 rows=807 loops=1)

> Why does the planner choose that way and what can I do to make it choose the
> better plan, preferably without specifying limit and a maybe unreasonably high
> statistics target for ext_feeder_item.feed_id?

Although the index scans are fast enough, the cost estimate is much more.

This suggests you need to tweak your planner cost settings:
http://www.postgresql.org/docs/8.3/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS

I'd probably start with reducing random_page_cost if you have a
reasonable disk system and making sure effective_cache_size is
accurately set.

--
Richard Huxton
Archonet Ltd

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Dave Cramer 2008-02-21 12:24:01 Re: Question about shared_buffers and cpu usage
Previous Message Scott Marlowe 2008-02-21 07:15:00 Re: Question about shared_buffers and cpu usage