Re: Optimizer wrongly picks Nested Loop Left Join

From: John A Meinel <john(at)arbash-meinel(dot)com>
To: Edin Kadribasic <edink(at)emini(dot)dk>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Optimizer wrongly picks Nested Loop Left Join
Date: 2005-05-11 14:04:47
Message-ID: 428210FF.4090302@arbash-meinel.com
Lists: pgsql-performance

Edin Kadribasic wrote:
> Hi,
>
>
> I have a query that is giving the optimizer (and me) a great headache. When
> it's in a good mood the optimizer chooses Hash Left Join and the query
> executes in 13ms or so, but sometimes (more and more often) it chooses
> Nested Loop Left Join and the execution time goes up to 2-30sec.
>
> The query:
> SELECT COUNT(DISTINCT a.tid) FROM axp_temp_order_match a LEFT OUTER JOIN (
> SELECT ol.tid, ds.orid FROM axp_dayschedule ds JOIN axp_order_line ol ON
> ol.olid = ds.olid JOIN axp_order o ON ds.orid = o.orid WHERE o.status >= 100
> AND ds.day between '2005-05-12' and '2005-05-12' AND ds.used = '1' ) b ON
> (a.tid = b.tid) WHERE b.tid IS NULL AND a.sid = 16072;
>

Unfortunately, because Hash Join doesn't report the number of rows
(rows=0 always), it's hard to tell how good the estimator is. But I
*can* say that the NestLoop estimation is way off.
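
One way to see how good the estimates are for the inner side on its own
(just a sketch, running your subquery by itself) would be:

EXPLAIN ANALYZE
SELECT ol.tid, ds.orid
FROM axp_dayschedule ds
JOIN axp_order_line ol ON ol.olid = ds.olid
JOIN axp_order o ON ds.orid = o.orid
WHERE o.status >= 100
  AND ds.day BETWEEN '2005-05-12' AND '2005-05-12'
  AND ds.used = '1';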

>
> Good plan:
> =========
> Aggregate (cost=221.93..221.93 rows=1 width=4) (actual time=34.262..34.266
> rows=1 loops=1)
> -> Hash Left Join (cost=9.07..220.86 rows=426 width=4) (actual
> time=34.237..34.237 rows=0 loops=1)
> Hash Cond: ("outer".tid = "inner".tid)
> Filter: ("inner".tid IS NULL)
> -> Index Scan using axp_temp_order_match_idx1 on
> axp_temp_order_match a (cost=0.00..209.65 rows=426 width=4) (actual
> time=0.277..0.512 rows=6 loops=1)
> Index Cond: (sid = 16072)
> -> Hash (cost=9.07..9.07 rows=1 width=4) (actual
> time=32.777..32.777 rows=0 loops=1)
> -> Nested Loop (cost=0.00..9.07 rows=1 width=4) (actual
> time=0.208..31.563 rows=284 loops=1)
> -> Nested Loop (cost=0.00..6.05 rows=1 width=4)
> (actual time=0.178..20.684 rows=552 loops=1)
> -> Index Scan using axp_dayschedule_day_idx on
> axp_dayschedule ds (cost=0.00..3.02 rows=1 width=8) (actual
> time=0.036..3.973 rows=610 loops=1)
> Index Cond: (("day" >= '2005-05-12'::date)
> AND ("day" <= '2005-05-12'::date))
> Filter: (used = B'1'::"bit")
> -> Index Scan using axp_order_orid_key on
> axp_order o (cost=0.00..3.02 rows=1 width=4) (actual time=0.009..0.013
> rows=1 loops=610)
> Index Cond: ("outer".orid = o.orid)
> Filter: (status >= 100)
> -> Index Scan using axp_order_line_pk on
> axp_order_line ol (cost=0.00..3.01 rows=1 width=8) (actual
> time=0.006..0.008 rows=1 loops=552)
> Index Cond: (ol.olid = "outer".olid)
> Total runtime: 34.581 ms
>
> Bad plan (same query different values):
> =======================================
> Aggregate (cost=11.54..11.54 rows=1 width=4) (actual
> time=11969.281..11969.285 rows=1 loops=1)
> -> Nested Loop Left Join (cost=0.00..11.53 rows=1 width=4) (actual
> time=25.730..11967.180 rows=338 loops=1)

See here, it thinks it will only have to do 1 nestloop, which would be
quite fast, but it has to do 338.

> Join Filter: ("outer".tid = "inner".tid)
> Filter: ("inner".tid IS NULL)
> -> Index Scan using axp_temp_order_match_idx1 on
> axp_temp_order_match a (cost=0.00..2.45 rows=1 width=4) (actual
> time=0.027..2.980 rows=471 loops=1)
> Index Cond: (sid = 16092)
> -> Nested Loop (cost=0.00..9.07 rows=1 width=4) (actual
> time=0.088..24.350 rows=285 loops=471)

Same thing here.

> -> Nested Loop (cost=0.00..6.04 rows=1 width=8) (actual
> time=0.067..15.649 rows=317 loops=471)

And here.

> -> Index Scan using axp_dayschedule_day_idx on
> axp_dayschedule ds (cost=0.00..3.02 rows=1 width=8) (actual
> time=0.015..3.557 rows=606 loops=471)

This estimate is way off too, but it is off in both plans.

> Index Cond: (("day" >= '2005-05-13'::date) AND
> ("day" <= '2005-05-13'::date))
> Filter: (used = B'1'::"bit")
> -> Index Scan using axp_order_line_pk on
> axp_order_line ol (cost=0.00..3.01 rows=1 width=8) (actual
> time=0.006..0.008 rows=1 loops=285426)

This is probably what is killing you. It is doing a single lookup 285k
times. The above plan only does it 552 times.
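Roughly: 471 outer rows * ~24 ms for each pass of the inner nested loop
comes to about 11.4 sec, which accounts for nearly all of the 11969 ms
runtime.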
> Index Cond: (ol.olid = "outer".olid)
> -> Index Scan using axp_order_orid_key on axp_order o
> (cost=0.00..3.02 rows=1 width=4) (actual time=0.009..0.013 rows=1
> loops=149307)
> Index Cond: ("outer".orid = o.orid)
> Filter: (status >= 100)
> Total runtime: 11969.443 ms
>
> Please note that sometimes when I get "bad plan" in the logfile, I just
> re-run the query and the optimizer chooses the more efficient one. Sometimes
> it does not.

Your work_mem is quite high relative to your total RAM; hopefully you
don't allow many concurrent connections. But that is a side point.
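
Back of the envelope (assuming work_mem is in kB here, as it is in 8.0):
128000 kB is roughly 125 MB per sort or hash, so a little more than a
dozen of those running at once would eat the whole 2 GB.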

I assume the tables are freshly VACUUM ANALYZEd. Have you tried raising
the statistics target on some of the columns? One to look at is
axp_dayschedule(day); that estimate is consistently wrong (the planner
expects 1 row but gets ~600). Perhaps a BETWEEN with the same date on
both ends is estimated as overly restrictive in postgres? I don't really
know.
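
Something like this might be worth a try (just a sketch; the target of
100 is a guess, the default is 10):

ALTER TABLE axp_dayschedule ALTER COLUMN "day" SET STATISTICS 100;
ANALYZE axp_dayschedule;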

Also, looking at the original query, maybe I am missing something, but
you are doing a COUNT(DISTINCT) with a LEFT OUTER JOIN.

If it is a LEFT JOIN, isn't that the same as

SELECT COUNT(DISTINCT a.tid) FROM axp_temp_order_match a
WHERE a.sid = 16072;

I also have to wonder about:

SELECT ... a.tid FROM a LEFT JOIN (...) b ON (a.tid = b.tid)
WHERE b.tid IS NULL

Isn't that equivalent to

SELECT COUNT(DISTINCT a.tid) FROM a WHERE a.tid IS NULL;

Which is also equivalent to
SELECT CASE WHEN EXISTS (SELECT a.tid FROM a WHERE a.tid IS NULL AND
a.sid = 16072) THEN 1 ELSE 0 END;

I could be wrong, but if a.tid IS NULL (because b.tid IS NULL, and a.tid
= b.tid), DISTINCT can only return 0 or 1 rows, which is the same as
using a CASE statement. You could also put a LIMIT in there, since you
know DISTINCT can only return 1 row:

SELECT COUNT(a.tid) FROM a WHERE a.tid IS NULL AND a.sid = ... LIMIT 1;

However, one final point: COUNT(column) doesn't count rows where the
column is NULL. So really your query can be replaced with:

SELECT 0;

Now, maybe this is a generated query, and under other circumstances it
produces a different query that actually is important.

The biggest thing is that using COUNT(DISTINCT) on a column from the
left side of a LEFT JOIN sounds like you could get rid of the entire
right side of that join.

John
=:->

>
> Any ideas?
>
> postgresql-8.0.2 on 2x3.2 GHz Xeon with 2GB ram Linux 2.6
> shared_buffers = 15000
> work_mem = 128000
> effective_cache_size = 200000
> random_page_cost = (tried 1.0 - 4, seemingly without effect on this
> particular issue).
>
> Edin
>
>
>
>
