Re: multi-layered view join performance oddities

From: Svenne Krap <svenne(at)krap(dot)dk>
To: "Steinar H(dot) Gunderson" <sgunderson(at)bigfoot(dot)com>
Cc: PgSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: multi-layered view join performance oddities
Date: 2005-10-30 18:33:03
Message-ID: 436511DF.5020800@krap.dk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi.

Your suggestion with disableing the nested loop really worked well:

rkr=# set enable_nestloop=false;
SET
rkr=# explain analyze select * from ord_result_pct_pretty ;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=230.06..337.49 rows=1 width=174) (actual
time=21.893..42.356 rows=2250 loops=1)
Hash Cond: (("outer".dataset_id = "inner".dataset_id) AND
("outer".nb_property_type_id = "inner".nb_property_type_id))
-> Hash Join (cost=56.94..164.10 rows=26 width=93) (actual
time=5.073..17.906 rows=2532 loops=1)
Hash Cond: ("outer".dataset_id = "inner".id)
-> Hash Join (cost=55.54..161.63 rows=161 width=57) (actual
time=4.996..14.775 rows=2532 loops=1)
Hash Cond: ("outer".institut = "inner".id)
-> Append (cost=54.38..121.72 rows=2476 width=44)
(actual time=4.964..11.827 rows=2532 loops=1)
-> HashAggregate (cost=54.38..57.20 rows=226
width=16) (actual time=4.964..5.174 rows=282 loops=1)
-> Seq Scan on ord_entrydata_current
(cost=0.00..37.50 rows=2250 width=16) (actual time=0.002..1.305
rows=2250 loops=1)
-> Subquery Scan "*SELECT* 2" (cost=0.00..60.00
rows=2250 width=20) (actual time=0.009..4.948 rows=2250 loops=1)
-> Seq Scan on ord_entrydata_current
(cost=0.00..37.50 rows=2250 width=20) (actual time=0.003..2.098
rows=2250 loops=1)
-> Hash (cost=1.13..1.13 rows=13 width=17) (actual
time=0.022..0.022 rows=13 loops=1)
-> Seq Scan on groups g (cost=0.00..1.13 rows=13
width=17) (actual time=0.003..0.013 rows=13 loops=1)
-> Hash (cost=1.32..1.32 rows=32 width=36) (actual
time=0.070..0.070 rows=32 loops=1)
-> Seq Scan on ord_dataset od (cost=0.00..1.32 rows=32
width=36) (actual time=0.009..0.043 rows=32 loops=1)
Filter: is_visible
-> Hash (cost=173.07..173.07 rows=10 width=97) (actual
time=15.472..15.472 rows=256 loops=1)
-> Hash Join (cost=166.15..173.07 rows=10 width=97) (actual
time=14.666..15.203 rows=256 loops=1)
Hash Cond: ("outer".nb_property_type_id = "inner".id)
-> HashAggregate (cost=165.05..168.15 rows=248
width=40) (actual time=14.619..14.849 rows=288 loops=1)
-> Append (cost=54.38..121.72 rows=2476 width=44)
(actual time=5.012..11.130 rows=2532 loops=1)
-> HashAggregate (cost=54.38..57.20
rows=226 width=16) (actual time=5.011..5.222 rows=282 loops=1)
-> Seq Scan on ord_entrydata_current
(cost=0.00..37.50 rows=2250 width=16) (actual time=0.001..1.261
rows=2250 loops=1)
-> Subquery Scan "*SELECT* 2"
(cost=0.00..60.00 rows=2250 width=20) (actual time=0.010..4.308
rows=2250 loops=1)
-> Seq Scan on ord_entrydata_current
(cost=0.00..37.50 rows=2250 width=20) (actual time=0.002..1.694
rows=2250 loops=1)
-> Hash (cost=1.08..1.08 rows=8 width=57) (actual
time=0.026..0.026 rows=8 loops=1)
-> Seq Scan on nb_property_type npt
(cost=0.00..1.08 rows=8 width=57) (actual time=0.004..0.019 rows=8 loops=1)
Total runtime: 43.297 ms
(28 rows)

Now, the whole question becomes, how do I get the planner to make a
better estimation of the returned rows.

I am not sure, I can follow your moving-the-union-all-further-out
advice, as I see no different place for the unioning of the two datasets.

Maybe one of the core devs know, where to fiddle :)

Svenne

Steinar H. Gunderson wrote:

>On Sun, Oct 30, 2005 at 06:16:04PM +0100, Svenne Krap wrote:
>
>
>> Nested Loop (cost=223.09..338.61 rows=1 width=174) (actual time=20.213..721.361 rows=2250 loops=1)
>> Join Filter: (("outer".dataset_id = "inner".dataset_id) AND ("outer".nb_property_type_id = "inner".nb_property_type_id))
>> -> Hash Join (cost=58.04..164.26 rows=1 width=150) (actual time=5.510..22.088 rows=2250 loops=1)
>>
>>
>
>There's horrible misestimation here. It expects one row and thus starts a
>nested loop, but gets 2250. No wonder it's slow :-)
>
>The misestimation can be traced all the way down here:
>
>
>
>> Hash Cond: ("outer".institut = "inner".id)
>> -> Hash Join (cost=56.88..163.00 rows=16 width=137) (actual time=5.473..19.165 rows=2250 loops=1)
>> Hash Cond: ("outer".dataset_id = "inner".id)
>> -> Hash Join (cost=55.48..160.95 rows=99 width=101) (actual time=5.412..16.264 rows=2250 loops=1)
>>
>>
>
>where the planner misestimates the selectivity of your join (it estimates 99
>rows, and there are 2250).
>
>I've had problems joining with Append nodes in the past, and solved the
>problem by moving the UNION ALL a bit out, but I'm not sure if it's a very
>good general solution, or a solution to your problems here.
>
>If all else fails, you could "set enable_nestloop=false", but that is not a
>good idea in the long run, I'd guess -- it's much better to make sure the
>planner has good estimates and let it do the correct decisions from there.
>
>/* Steinar */
>
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Svenne Krap 2005-10-30 18:49:10 Re: multi-layered view join performance oddities
Previous Message Tom Lane 2005-10-30 18:27:01 Re: multi-layered view join performance oddities