Re: index scan of whole table, can't see why

From: "Dan Langille" <dan(at)langille(dot)org>
To: Russell Smith <mr-russ(at)pws(dot)com(dot)au>
Cc: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: index scan of whole table, can't see why
Date: 2005-01-21 00:55:20
Message-ID: 41F00CA8.8976.F79FE91@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 21 Jan 2005 at 8:38, Russell Smith wrote:

> On Fri, 21 Jan 2005 02:36 am, Dan Langille wrote:
> > On 20 Jan 2005 at 7:26, Stephan Szabo wrote:
>
> [snip]
> > > Honestly I expected it to be slower (which it was), but I figured
> > > it's worth seeing what alternate plans it'll generate
> > > (specifically to see how it cost a nested loop on that join to
> > > compare to the fast plan). Unfortunately, it generated a merge
> > > join, so I think it might require both enable_hashjoin=false and
> > > enable_mergejoin=false to get it which is likely to be even slower
> > > in practice but still may be useful to see.
> >
> > Setting both to false gives a dramatic performance boost. See
> > http://rafb.net/paste/results/b70KAi42.html
> >
> -> Materialize (cost=15288.70..15316.36 rows=2766 width=35)
> (actual time=0.004..0.596 rows=135 loops=92)
> -> Nested Loop (cost=0.00..15288.70 rows=2766
> width=35) (actual time=0.060..9.130 rows=135 loops=1)
>
> The Planner here has a quite inaccurate guess at the number of rows
> that will match in the join. An alternative to turning off join types
> is to up the statistics on the Element columns because that's where
> the join is happening. Hopefully the planner will get a better idea.
> However it may not be able too. 2766 rows vs 135 is quite likely to
> choose different plans. As you can see you have had to turn off two
> join types to give something you wanted/expected.

Fair comment. However, the statistics on ports.element_id,
ports.deprecated, ports.broken, and element.id are both set to 1000.

> > This gives suitable speed, but why does the plan vary so much with
> > such a minor change in the WHERE clause?
> Plan 1 - broken
> -> Nested Loop (cost=0.00..3825.30 rows=495 width=35) (actual
> time=0.056..16.161 rows=218 loops=1)
>
> Plan 2 - deprecated
> -> Hash Join (cost=3676.78..10144.06 rows=2767 width=35)
> (actual time=7.638..1158.128 rows=135 loops=1)
>
> The performance difference is when the where is changed, you have a
> totally different set of selection options. The Plan 1 and Plan 2
> shown from your paste earlier, report that you are out by a factor of
> 2 for plan 1. But for plan 2 its a factor of 20. The planner is
> likely to make the wrong choice when the stats are out by that factor.
>
> Beware what is a small "typing" change does not mean they queries are
> anything alight.

Agreed. I just did not expect such a dramatic change which a result
set that is similar. Actually, they aren't that similar at all.

Thank you.
--
Dan Langille : http://www.langille.org/
BSDCan - The Technical BSD Conference - http://www.bsdcan.org/

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Dave Cramer 2005-01-21 01:04:19 Re: PostgreSQL clustering VS MySQL clustering
Previous Message Bruno Almeida do Lago 2005-01-21 00:40:02 Re: PostgreSQL clustering VS MySQL clustering