Re: Help with nested loop left join performance

From: Richard Huxton <dev(at)archonet(dot)com>
To: George Woodring <george(dot)woodring(at)iglass(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Help with nested loop left join performance
Date: 2006-02-22 09:38:04
Message-ID: 43FC30FC.3080800@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

George Woodring wrote:
>
> explain analyze SELECT column1, column2, column3, column4, column5,
> column6, column7, column8 FROM (SELECT CASE status WHEN 0 THEN 0 WHEN 1
> THEN 1 ELSE -1 END AS column1, mac AS column2, account AS column3,
> number || ' ' || address AS column4, 'qmod' || '.' || 'dmod' AS column5,
> node AS column6, grid AS column7, boxtype AS column8, number, address
> FROM settop_billing LEFT OUTER JOIN (dhct JOIN dhct_davic USING(mac))
> USING (mac) WHERE region='GTown1E' AND node='1E012' ) AS foo;

Ach y fi! Let's format that a bit better, eh?

explain analyze
SELECT column1, column2, column3, column4, column5,column6, column7, column8
FROM (
SELECT
CASE status WHEN 0 THEN 0 WHEN 1 THEN 1 ELSE -1 END AS column1,
mac AS column2,
account AS column3,
number || ' ' || address AS column4,
'qmod' || '.' || 'dmod' AS column5,
node AS column6,
grid AS column7,
boxtype AS column8,
number,
address
FROM
settop_billing
LEFT OUTER JOIN
(dhct JOIN dhct_davic USING(mac))
USING
(mac)
WHERE
region='GTown1E' AND node='1E012'
) AS foo;

Now we can see what's happening. Well, looking at it laid out like that,
I'm suspcious of the (dhct JOIN dhct_davic) on the outside of an outer
join. Looking at your explain we do indeed have two sequential scans
over the tables in question - the big one being dhct...

> -> Append (cost=0.00..8881.11 rows=384912 width=8)
> (actual time=0.023..10914.365 rows=384900 loops=1)
> -> Seq Scan on dhct_davic (cost=0.00..0.00 rows=1
> width=8) (actual time=0.002..0.002 rows=0 loops=1)
> -> Seq Scan on probe_dhct_davic dhct_davic
> (cost=0.00..8881.11 rows=384911 width=8) (actual time=0.018..10505.255
> rows=384900 loops=1)
> -> Hash (cost=12154.13..12154.13 rows=410613 width=6)
> (actual time=923.433..923.433 rows=0 loops=1)
> -> Seq Scan on dhct (cost=0.00..12154.13
> rows=410613 width=6) (actual time=0.019..534.641 rows=409576 loops=1)

With 7.4 I seem to remember that explicit JOINs force the evaluation
order, but I'm not if even later versions will rewrite your query. It's
too early in the morning for me to figure out if it's safe in all cases.

Anyway, for your purposes, I'd say something more like:
FROM settop_billing LEFT JOIN dhct LEFT JOIN dhct_davic

That should let the planner do the joins in a more reasonable order.
--
Richard Huxton
Archonet Ltd

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Chethana, Rao (IE10) 2006-02-22 10:38:58
Previous Message Jim C. Nasby 2006-02-22 00:14:56 Re: How to optimize a JOIN with BETWEEN?