Re: index scan of whole table, can't see why

From: Russell Smith <mr-russ(at)pws(dot)com(dot)au>
To: "Dan Langille" <dan(at)langille(dot)org>
Cc: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: index scan of whole table, can't see why
Date: 2005-01-20 21:38:19
Message-ID: 200501210838.19638.mr-russ@pws.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Fri, 21 Jan 2005 02:36 am, Dan Langille wrote:
> On 20 Jan 2005 at 7:26, Stephan Szabo wrote:

[snip]
> > Honestly I expected it to be slower (which it was), but I figured it's
> > worth seeing what alternate plans it'll generate (specifically to see how
> > it cost a nested loop on that join to compare to the fast plan).
> > Unfortunately, it generated a merge join, so I think it might require both
> > enable_hashjoin=false and enable_mergejoin=false to get it which is likely
> > to be even slower in practice but still may be useful to see.
>
> Setting both to false gives a dramatic performance boost. See
> http://rafb.net/paste/results/b70KAi42.html
>
-> Materialize (cost=15288.70..15316.36 rows=2766 width=35) (actual time=0.004..0.596 rows=135 loops=92)
-> Nested Loop (cost=0.00..15288.70 rows=2766 width=35) (actual time=0.060..9.130 rows=135 loops=1)

The Planner here has a quite inaccurate guess at the number of rows that will match in the join. An alternative to
turning off join types is to up the statistics on the Element columns because that's where the join is happening. Hopefully the planner will
get a better idea. However it may not be able too. 2766 rows vs 135 is quite likely to choose different plans. As you can
see you have had to turn off two join types to give something you wanted/expected.

> This gives suitable speed, but why does the plan vary so much with
> such a minor change in the WHERE clause?
Plan 1 - broken
-> Nested Loop (cost=0.00..3825.30 rows=495 width=35) (actual time=0.056..16.161 rows=218 loops=1)

Plan 2 - deprecated
-> Hash Join (cost=3676.78..10144.06 rows=2767 width=35) (actual time=7.638..1158.128 rows=135 loops=1)

The performance difference is when the where is changed, you have a totally different set of selection options.
The Plan 1 and Plan 2 shown from your paste earlier, report that you are out by a factor of 2 for plan 1. But for plan 2
its a factor of 20. The planner is likely to make the wrong choice when the stats are out by that factor.

Beware what is a small "typing" change does not mean they queries are anything alight.

Regards

Russell Smith.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Matt Casters 2005-01-20 21:39:02 Re:
Previous Message Alex Turner 2005-01-20 21:35:12 Re: [SQL] OFFSET impact on Performance???