Re: -HEAD planner issue wrt hash_joins on dbt3 ?

From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: -HEAD planner issue wrt hash_joins on dbt3 ?
Date: 2006-09-17 10:52:55
Message-ID: 450D2907.3070307@kaltenbrunner.cc
Lists: pgsql-hackers

Stefan Kaltenbrunner wrote:
> [already sent a variant of that yesterday but it doesn't look like it
> made it to the list]
>
> Tom Lane wrote:
>> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
>>> Tom Lane wrote:
>>>> Apparently we've made the planner a bit too optimistic about the savings
>>>> that can be expected from repeated indexscans occurring on the inside of
>>>> a join.
> >>> effective_cache_size was set to 10GB (my fault for copying over the conf
> >>> from a 16GB box) during the run - lowering it by just a few megabytes(!) or
>>> to a more realistic 6GB results in the following MUCH better plan:
>>> http://www.kaltenbrunner.cc/files/dbt3_explain_analyze2.txt
>> Interesting. It used to be that effective_cache_size wasn't all that
>> critical... what I think this report is showing is that with the 8.2
>> changes to try to account for caching effects in repeated indexscans,
>> we've turned that into a pretty significant parameter.
>
> took me a while due to hardware issues on my testbox - but there are new
> results (with 6GB for effective_cache_size) up at:
>
> http://www.kaltenbrunner.cc/files/5/
>
> there are still a few issues with the validity of the run like the rf
> tests not actually being done right - but lowering effective_cache_size
> gave a dramatic speedup on Q5, Q7 and Q8.
>
> that is the explain for the 4h+ Q9:
>
> http://www.kaltenbrunner.cc/files/analyze_q9.txt
>
> increasing the statistics target up to 1000 does not seem to change
> the plan btw.
>
> disabling nested loop leads to the following (4 times faster) plan:
>
> http://www.kaltenbrunner.cc/files/analyze_q9_no_nest.txt
>
> since the hash-joins in there look rather slow (inappropriate hashtable
> set up due to the wrong estimates?) I disabled hash_joins too:
>
> http://www.kaltenbrunner.cc/files/analyze_q9_no_nest_no_hashjoin.txt
>
> and amazingly this plan is by far the fastest one in runtime (15min vs
> 4.5h ...) except that the planner thinks it is 20 times more expensive ...

some additional numbers (the first one is with default settings, the second
is with enable_nestloop = 'off', the third one is with enable_nestloop = 'off'
and enable_hashjoin = 'off'):

http://www.kaltenbrunner.cc/files/analyze_q7.txt

here we have a 3x speedup with disabling nested loops and a 2x speedup
(over the original plan) with nested loops and hashjoins disabled.

http://www.kaltenbrunner.cc/files/analyze_q20.txt

here we have a 180x(!) speedup with both planner options disabled ...
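
for anyone who wants to repeat the comparison, the three configurations can be sketched roughly like this. It is only a sketch: the helper name, the configuration labels and the stand-in query are made up for illustration, only the two settings (enable_nestloop, enable_hashjoin) come from the runs above, and the real DBT-3 query text would replace the placeholder.

```python
# Sketch only: builds the per-session statements for the three planner
# configurations compared above. CONFIGS keys and session_script() are
# hypothetical names; only the two GUC names come from the post.
CONFIGS = {
    "default": {},
    "no_nestloop": {"enable_nestloop": "off"},
    "no_nestloop_no_hashjoin": {
        "enable_nestloop": "off",
        "enable_hashjoin": "off",
    },
}


def session_script(settings, query):
    """Return the SET statements followed by EXPLAIN ANALYZE of the query."""
    lines = [f"SET {name} = '{value}';" for name, value in settings.items()]
    lines.append(f"EXPLAIN ANALYZE {query}")
    return "\n".join(lines)


if __name__ == "__main__":
    # "SELECT 1;" is a stand-in, not one of the DBT-3 queries.
    for label, settings in CONFIGS.items():
        print(f"-- {label}")
        print(session_script(settings, "SELECT 1;"))
        print()
```

the SET statements only affect the current session, so each configuration is best run in a fresh connection (or followed by a RESET) to keep the runs independent.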

it is worth mentioning that for both queries the estimated costs in
relation to each other look quite reasonable as soon as enable_nestloop
= 'off' (i.e. 5042928 vs 10715247 with 344sec vs 514sec for Q7, and
101441851 vs 101445468 with 10sec vs 11sec for Q20)
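
to make the "quite reasonable" part concrete, here is the arithmetic on the numbers quoted above as a small sketch. The function name and the dictionary layout are just for illustration, and I am assuming each pair is ordered as (enable_nestloop off, both off).

```python
# Sketch: compare estimated-cost ratios with measured-runtime ratios for the
# two configurations with enable_nestloop = 'off'. The figures are the ones
# quoted in the post; the (nestloop off, both off) ordering is an assumption.
FIGURES = {
    "Q7": {"cost": (5042928, 10715247), "seconds": (344, 514)},
    "Q20": {"cost": (101441851, 101445468), "seconds": (10, 11)},
}


def ratio(pair):
    """Second value relative to the first."""
    first, second = pair
    return second / first


def summarize(figures):
    """Map each query to its (cost ratio, runtime ratio)."""
    return {
        name: (ratio(vals["cost"]), ratio(vals["seconds"]))
        for name, vals in figures.items()
    }


if __name__ == "__main__":
    for name, (cost_ratio, time_ratio) in summarize(FIGURES).items():
        print(f"{name}: cost x{cost_ratio:.4f}, runtime x{time_ratio:.2f}")
```

for Q7 the planner estimates roughly a 2.1x higher cost where the runtime is about 1.5x longer, and for Q20 both the costs and the runtimes are nearly identical, so in both cases the ordering of the estimates matches the ordering of the runtimes.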

Stefan
