Re: Major performance problem after upgrade from 8.3 to 8.4

From: Gerhard Wiesinger <lists(at)wiesinger(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Andreas Kretschmer <akretschmer(at)spamfence(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Major performance problem after upgrade from 8.3 to 8.4
Date: 2010-09-04 06:58:30
Message-ID: alpine.LFD.2.01.1009040850110.26391@bbs.intern
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Fri, 3 Sep 2010, Tom Lane wrote:

> Gerhard Wiesinger <lists(at)wiesinger(dot)com> writes:
>> 8.3 query plans: http://www.wiesinger.com/tmp/pg_perf_83_new.txt
>> 8.4 quey plans: http://www.wiesinger.com/tmp/pg_perf_84.txt
>
> Hmm. The 8.3 plan is indeed assuming that the number of rows will stay
> constant as we bubble up through the join levels, but AFAICS this is
> simply wrong:
>
> -> Nested Loop Left Join (cost=0.00..38028.89 rows=67 width=8)
> -> Nested Loop Left Join (cost=0.00..25399.46 rows=67 width=8)
> -> Nested Loop Left Join (cost=0.00..12770.04 rows=67 width=8)
> -> Index Scan using i_log_unique on log l (cost=0.00..140.61 rows=67 width=8)
> Index Cond: (datetime >= (now() - '00:01:00'::interval))
> -> Index Scan using unique_key_and_id on log_details d7 (cost=0.00..187.39 rows=89 width=8)
> Index Cond: ((l.id = d7.fk_id) AND (d7.fk_keyid = $6))
> -> Index Scan using unique_key_and_id on log_details d6 (cost=0.00..187.39 rows=89 width=8)
> Index Cond: ((l.id = d6.fk_id) AND (d6.fk_keyid = $5))
> -> Index Scan using unique_key_and_id on log_details d5 (cost=0.00..187.39 rows=89 width=8)
> Index Cond: ((l.id = d5.fk_id) AND (d5.fk_keyid = $4))
>
> If the log_details indexscans are expected to produce 89 rows per
> execution, then surely the join size should go up 89x at each level,
> because the join steps themselves don't eliminate anything.
>
> In 8.4 the arithmetic is at least self-consistent:
>
> -> Nested Loop Left Join (cost=0.00..505256.95 rows=57630 width=8)
> -> Nested Loop Left Join (cost=0.00..294671.96 rows=6059 width=8)
> -> Nested Loop Left Join (cost=0.00..272532.55 rows=637 width=8)
> -> Index Scan using log_pkey on log l (cost=0.00..270203.92 rows=67 width=8)
> Filter: (datetime >= (now() - '00:01:00'::interval))
> -> Index Scan using unique_key_and_id on log_details d7 (cost=0.00..34.63 rows=10 width=8)
> Index Cond: ((l.id = d7.fk_id) AND (d7.fk_keyid = $6))
> -> Index Scan using unique_key_and_id on log_details d6 (cost=0.00..34.63 rows=10 width=8)
> Index Cond: ((l.id = d6.fk_id) AND (d6.fk_keyid = $5))
> -> Index Scan using unique_key_and_id on log_details d5 (cost=0.00..34.63 rows=10 width=8)
> Index Cond: ((l.id = d5.fk_id) AND (d5.fk_keyid = $4))
>
> The rowcount estimates are apparently a shade less than 10, but they get
> rounded off in the display.
>
> I believe the reason for this change is that 8.4's join estimation code
> was rewritten so that it wasn't completely bogus for outer joins. 8.3
> might have been getting the right answer, but it was for the wrong
> reasons.
>
> So the real question to be answered here is why doesn't it think that
> each of the unique_key_and_id indexscans produce just a single row, as
> you indicated was the case. The 8.4 estimate is already a factor of
> almost 10 closer to reality than 8.3's, but you need another factor of
> 10. You might find that increasing the statistics target for the
> log_details table helps.

Ok, Tom, tried different things (more details are below):
1.) Setting statistic target to 1000 and 10000 (without success), still
merge join
2.) Tried to added a Index on description to help the planner for
uniqueness (without success)
3.) Forced the planner to use nested loop joins (SUCCESS):
SET enable_hashjoin=false;SET enable_mergejoin=false;
(BTW: How do use such settings in Java and PHP and Perl, is there a
command available?)

Open questions:
Why does the planner not choose nested loop joins, that should be the
optimal one for that situation?
Does the planner value: a.) UNIQUENESS b.) UNIQUENESS and NOT NULLs?
Any ideas for improvement of the planner?

Details:
-- CREATE UNIQUE INDEX unique_key_and_id ON log_details USING btree (fk_id, fk_keyid)
-- 1000 and 10000 didn't help for better query plan for Nested Loop Left Join, still Merge Left Join
-- Sample with:
-- ALTER TABLE log_details ALTER COLUMN fk_id SET STATISTICS 10000;
-- ALTER TABLE log_details ALTER COLUMN fk_keyid SET STATISTICS 10000;
-- ANALYZE VERBOSE log_details;
-- Still Merge Join:
-- -> Merge Left Join (cost=9102353.88..83786934.25 rows=2726186787 width=16)
-- Merge Cond: (l.id = d2000902.fk_id)
-- -> Merge Left Join (cost=8926835.18..40288402.09 rows=972687282 width=24)
-- Merge Cond: (l.id = d2000904.fk_id)
-- Default values again
ALTER TABLE log_details ALTER COLUMN fk_id SET STATISTICS 100;
ALTER TABLE log_details ALTER COLUMN fk_keyid SET STATISTICS 100;
ANALYZE VERBOSE log_details;

-- Tried to add WITHOUT SUCCESS (that planner could know that description is NOT NULL and UNIQE)
DROP INDEX IF EXISTS i_key_description_desc;
CREATE UNIQUE INDEX i_key_description_desc ON key_description (description);
-- Therefore planner should know: keyid is NOT NULL and UNIQUE and only one result: (SELECT keyid FROM key_description WHERE description = 'Raumsolltemperatur')
-- Therefore from constraint planner should know that fk_id is NOT NULL and UNIQUE: CONSTRAINT unique_key_and_id UNIQUE(fk_id, fk_keyid):
-- LEFT JOIN log_details d1 ON l.id = d1.fk_id AND
-- d1.fk_keyid = (SELECT keyid FROM key_description WHERE description = 'Raumsolltemperatur')
-- Does the planner value alls those UNIQUEnesses and NOT NULLs?

-- Again back to 8.3 query plan which is fast (319ms):
SET enable_hashjoin=false;
SET enable_mergejoin=false;
-- -> Nested Loop Left Join (cost=0.00..22820970510.45 rows=2727492136 width=16)
-- -> Nested Loop Left Join (cost=0.00..12810087616.29 rows=973121653 width=24)
-- -> Nested Loop Left Join (cost=0.00..9238379092.22 rows=347192844 width=24)

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Kaloyan Iliev Iliev 2010-09-07 07:19:16 Re: Question about LEFT JOIN and query plan
Previous Message Tom Lane 2010-09-03 21:10:50 Re: Major performance problem after upgrade from 8.3 to 8.4