Skip site navigation (1) Skip section navigation (2)

Re: Major performance problem after upgrade from 8.3 to 8.4

From: Gerhard Wiesinger <lists(at)wiesinger(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Andreas Kretschmer <akretschmer(at)spamfence(dot)net>
Subject: Re: Major performance problem after upgrade from 8.3 to 8.4
Date: 2010-09-13 06:39:44
Message-ID: alpine.LFD.2.01.1009130838530.12288@bbs.intern (view raw or flat)
Thread:
Lists: pgsql-performance
Hello,

Any news or ideas regarding this issue?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


On Sat, 4 Sep 2010, Gerhard Wiesinger wrote:

> On Fri, 3 Sep 2010, Tom Lane wrote:
>
>> Gerhard Wiesinger <lists(at)wiesinger(dot)com> writes:
>>> 8.3 query plans: http://www.wiesinger.com/tmp/pg_perf_83_new.txt
>>> 8.4 quey plans: http://www.wiesinger.com/tmp/pg_perf_84.txt
>> 
>> Hmm.  The 8.3 plan is indeed assuming that the number of rows will stay
>> constant as we bubble up through the join levels, but AFAICS this is
>> simply wrong:
>>
>>          ->  Nested Loop Left Join  (cost=0.00..38028.89 rows=67 width=8)
>>            ->  Nested Loop Left Join  (cost=0.00..25399.46 rows=67 width=8)
>>              ->  Nested Loop Left Join  (cost=0.00..12770.04 rows=67 
>> width=8)
>>                ->  Index Scan using i_log_unique on log l 
>> (cost=0.00..140.61 rows=67 width=8)
>>                    Index Cond: (datetime >= (now() - '00:01:00'::interval))
>>                ->  Index Scan using unique_key_and_id on log_details d7 
>> (cost=0.00..187.39 rows=89 width=8)
>>                    Index Cond: ((l.id = d7.fk_id) AND (d7.fk_keyid = $6))
>>              ->  Index Scan using unique_key_and_id on log_details d6 
>> (cost=0.00..187.39 rows=89 width=8)
>>                  Index Cond: ((l.id = d6.fk_id) AND (d6.fk_keyid = $5))
>>            ->  Index Scan using unique_key_and_id on log_details d5 
>> (cost=0.00..187.39 rows=89 width=8)
>>                Index Cond: ((l.id = d5.fk_id) AND (d5.fk_keyid = $4))
>> 
>> If the log_details indexscans are expected to produce 89 rows per
>> execution, then surely the join size should go up 89x at each level,
>> because the join steps themselves don't eliminate anything.
>> 
>> In 8.4 the arithmetic is at least self-consistent:
>>
>>          ->  Nested Loop Left Join  (cost=0.00..505256.95 rows=57630 
>> width=8)
>>            ->  Nested Loop Left Join  (cost=0.00..294671.96 rows=6059 
>> width=8)
>>              ->  Nested Loop Left Join  (cost=0.00..272532.55 rows=637 
>> width=8)
>>                ->  Index Scan using log_pkey on log l 
>> (cost=0.00..270203.92 rows=67 width=8)
>>                    Filter: (datetime >= (now() - '00:01:00'::interval))
>>                ->  Index Scan using unique_key_and_id on log_details d7 
>> (cost=0.00..34.63 rows=10 width=8)
>>                    Index Cond: ((l.id = d7.fk_id) AND (d7.fk_keyid = $6))
>>              ->  Index Scan using unique_key_and_id on log_details d6 
>> (cost=0.00..34.63 rows=10 width=8)
>>                  Index Cond: ((l.id = d6.fk_id) AND (d6.fk_keyid = $5))
>>            ->  Index Scan using unique_key_and_id on log_details d5 
>> (cost=0.00..34.63 rows=10 width=8)
>>                Index Cond: ((l.id = d5.fk_id) AND (d5.fk_keyid = $4))
>> 
>> The rowcount estimates are apparently a shade less than 10, but they get
>> rounded off in the display.
>> 
>> I believe the reason for this change is that 8.4's join estimation code
>> was rewritten so that it wasn't completely bogus for outer joins.  8.3
>> might have been getting the right answer, but it was for the wrong
>> reasons.
>> 
>> So the real question to be answered here is why doesn't it think that
>> each of the unique_key_and_id indexscans produce just a single row, as
>> you indicated was the case.  The 8.4 estimate is already a factor of
>> almost 10 closer to reality than 8.3's, but you need another factor of
>> 10.  You might find that increasing the statistics target for the
>> log_details table helps.
>
>
> Ok, Tom, tried different things (more details are below):
> 1.) Setting statistic target to 1000 and 10000 (without success), still merge 
> join
> 2.) Tried to added a Index on description to help the planner for uniqueness 
> (without success)
> 3.) Forced the planner to use nested loop joins (SUCCESS):
> SET enable_hashjoin=false;SET enable_mergejoin=false;
> (BTW: How do use such settings in Java and PHP and Perl, is there a command 
> available?)
>
> Open questions:
> Why does the planner not choose nested loop joins, that should be the optimal 
> one for that situation?
> Does the planner value: a.) UNIQUENESS b.) UNIQUENESS and NOT NULLs?
> Any ideas for improvement of the planner?
>
> Details:
> -- CREATE UNIQUE INDEX unique_key_and_id ON log_details USING btree (fk_id, 
> fk_keyid)
> -- 1000 and 10000 didn't help for better query plan for Nested Loop Left 
> Join, still Merge Left Join
> -- Sample with:
> -- ALTER TABLE log_details ALTER COLUMN fk_id SET STATISTICS 10000;
> -- ALTER TABLE log_details ALTER COLUMN fk_keyid SET STATISTICS 10000;
> -- ANALYZE VERBOSE log_details;
> -- Still Merge Join:
> --   ->  Merge Left Join  (cost=9102353.88..83786934.25 rows=2726186787 
> width=16)
> --         Merge Cond: (l.id = d2000902.fk_id)
> --         ->  Merge Left Join  (cost=8926835.18..40288402.09 rows=972687282 
> width=24)
> --               Merge Cond: (l.id = d2000904.fk_id)
> -- Default values again
> ALTER TABLE log_details ALTER COLUMN fk_id SET STATISTICS 100;
> ALTER TABLE log_details ALTER COLUMN fk_keyid SET STATISTICS 100;
> ANALYZE VERBOSE log_details;
>
> -- Tried to add WITHOUT SUCCESS (that planner could know that description is 
> NOT NULL and UNIQE)
> DROP INDEX IF EXISTS i_key_description_desc;
> CREATE UNIQUE INDEX i_key_description_desc ON key_description (description);
> -- Therefore planner should know: keyid is NOT NULL and UNIQUE and only one 
> result: (SELECT keyid FROM key_description WHERE description = 
> 'Raumsolltemperatur')
> -- Therefore from constraint planner should know that fk_id is NOT NULL and 
> UNIQUE: CONSTRAINT unique_key_and_id UNIQUE(fk_id, fk_keyid):
> -- LEFT JOIN log_details d1 ON l.id = d1.fk_id AND
> -- d1.fk_keyid = (SELECT keyid FROM key_description WHERE description = 
> 'Raumsolltemperatur')
> -- Does the planner value alls those UNIQUEnesses and NOT NULLs?
>
> -- Again back to 8.3 query plan which is fast (319ms):
> SET enable_hashjoin=false;
> SET enable_mergejoin=false;
> --   ->  Nested Loop Left Join  (cost=0.00..22820970510.45 rows=2727492136 
> width=16)
> --         ->  Nested Loop Left Join  (cost=0.00..12810087616.29 
> rows=973121653 width=24)
> --               ->  Nested Loop Left Join  (cost=0.00..9238379092.22 
> rows=347192844 width=24)
>
> Thnx.
>
> Ciao,
> Gerhard
>
> --
> http://www.wiesinger.com/
>
> -- 
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

In response to

Responses

pgsql-performance by date

Next:From: Gaetano MendolaDate: 2010-09-13 09:47:24
Subject: Useless sort by
Previous:From: azajacDate: 2010-09-13 05:49:45
Subject: Problem with mergejoin performance

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group