Quick Links

Nested Loops vs. Hash Joins or Merge Joins

From:	Ketema Harris <ketema(at)gmail(dot)com>
To:	"pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject:	Nested Loops vs. Hash Joins or Merge Joins
Date:	2006-05-11 12:57:48
Message-ID:	C088AD0C.42AA%ketema@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

I am attempting to learn more about the way Pg decides what operators to use
in its query planning and executions. I have moderately complicated table
layout, but it is mostly normalized I recently created a query:

select account.acct_name as "Customer Name", NULL as "Facility",
account.acct_number as "LDC Acct#", account.svc_address as "Service
Address",
account.svc_address as "Service Address", account.svc_city as "Service
City", account.svc_state as "Service State", account.svc_city as "Service
City",
account.svc_zip as "Service Zip", product.ldc_name as "LDC", NULL as "ESI
Rate", NULL as "LDC Rate",
account.billing_address as "Mailing Address1", account.billing_address_2 as
"Mailing Address2",
account.billing_city || ', ' || account.billing_state as "City, State",
account.billing_zip as "Zip", customer.first_name || ' ' ||
customer.last_name
as "Contact", customer.phone as "Phone", customer.class as "Customer Class",
NULL as "Tax Exempt", NULL as "Exempt%",
marketer_divisions.channel_partner_code as "Channel Partner", NULL as "AE",
NULL as "Annual Use MCF", account.rate as "Trigger Price",
marketer_divisions.channel_partner_fee as "Channel Partner Fee"
from naes.reconciliation
inner join naes.application
inner join naes.account
inner join naes.marketer_product
inner join naes.marketer_divisions
inner join naes.cities
on marketer_divisions.city_id = cities.city_id
on marketer_product.division_id = marketer_divisions.division_id
inner join naes.product
on marketer_product.ldc_id = product.ldc_id
on account.marketer_product_id =
marketer_product.marketer_product_id
inner join naes.customer
on account.customer_id = customer.customer_id
on account.app_id = application.app_id and account.acct_id =
application.acct_id
on reconciliation.app_id = application.app_id and
reconciliation.transferred_date is NULL;

The query runs fine I have no performance issues with it, but here are two
query plans for the above query, one with nested loops on, the other with
them off:

Nested Loops on:

Nested Loop (cost=3.33..11.37 rows=1 width=268) (actual time=2.166..2.982
rows=3 loops=1)
Join Filter: ("outer".city_id = "inner".city_id)
-> Nested Loop (cost=3.33..10.32 rows=1 width=272) (actual
time=2.136..2.863 rows=3 loops=1)
Join Filter: ("outer".division_id =
"inner".division_id)plication.app_id and reco=
-> Nested Loop (cost=3.33..9.27 rows=1 width=231) (actual
time=2.119..2.763 rows=3 loops=1)
Join Filter: ("outer".ldc_id = "inner".ldc_id)
-> Nested Loop (cost=3.33..8.23 rows=1 width=218) (actual
time=2.101..2.659 rows=3 loops=1)
-> Nested Loop (cost=3.33..5.15 rows=1 width=151)
(actual time=2.068..2.559 rows=3 loops=1)
Join Filter: ("inner".app_id = "outer".app_id)
-> Merge Join (cost=3.33..4.11 rows=1
width=159) (actual time=1.096..1.477 rows=31 loops=1)
Merge Cond: ("outer".marketer_product_id =
"inner".marketer_product_id)
-> Index Scan using
"PK_marketer_product_id" on marketer_product (cost=0.00..3.04 rows=4
width=12) (actual time=0.017..0.033 rows=4 loops=1)
-> Sort (cost=3.33..3.33 rows=1
width=155) (actual time=1.065..1.180 rows=31 loops=1)
Sort Key: account.marketer_product_id
-> Hash Join (cost=1.75..3.32
rows=1 width=155) (actual time=0.457..0.848 rows=31 loops=1)
Hash Cond: (("outer".app_id =
"inner".app_id) AND ("outer".acct_id = "inner".acct_id))
-> Seq Scan on account
(cost=0.00..1.28 rows=28 width=155) (actual time=0.007..0.160 rows=34
loops=1)
-> Hash (cost=1.50..1.50
rows=50 width=8) (actual time=0.413..0.413 rows=50 loops=1)
-> Seq Scan on
application (cost=0.00..1.50 rows=50 width=8) (actual time=0.006..0.209
rows=50 loops=1)
-> Seq Scan on reconciliation (cost=0.00..1.03
rows=1 width=4) (actual time=0.005..0.016 rows=3 loops=31)
Filter: (transferred_date IS NULL)
-> Index Scan using customer_pkey on customer
(cost=0.00..3.06 rows=1 width=75) (actual time=0.011..0.015 rows=1 loops=3)
Index Cond: ("outer".customer_id =
customer.customer_id)
-> Seq Scan on product (cost=0.00..1.02 rows=2 width=21)
(actual time=0.005..0.013 rows=2 loops=3)
-> Seq Scan on marketer_divisions (cost=0.00..1.02 rows=2
width=49) (actual time=0.005..0.013 rows=2 loops=3)
-> Seq Scan on cities (cost=0.00..1.02 rows=2 width=4) (actual
time=0.005..0.013 rows=2 loops=3)
Total runtime: 3.288 ms

Nested Loops off:

Hash Join (cost=8.27..11.78 rows=1 width=268) (actual time=1.701..1.765
rows=3 loops=1)
Hash Cond: ("outer".city_id = "inner".city_id)
-> Hash Join (cost=7.24..10.73 rows=1 width=272) (actual
time=1.629..1.667 rows=3 loops=1)
Hash Cond: ("outer".customer_id = "inner".customer_id)
-> Seq Scan on customer (cost=0.00..3.32 rows=32 width=75)
(actual time=0.006..0.136 rows=33 loops=1)
-> Hash (cost=7.24..7.24 rows=1 width=205) (actual
time=1.366..1.366 rows=3 loops=1)
-> Hash Join (cost=6.43..7.24 rows=1 width=205) (actual
time=1.243..1.333 rows=3 loops=1)
Hash Cond: ("outer".division_id = "inner".division_id)
-> Hash Join (cost=5.40..6.20 rows=1 width=164)
(actual time=1.184..1.252 rows=3 loops=1)
Hash Cond: ("outer".ldc_id = "inner".ldc_id)
-> Merge Join (cost=4.38..5.16 rows=1
width=151) (actual time=1.124..1.169 rows=3 loops=1)
Merge Cond: ("outer".marketer_product_id =
"inner".marketer_product_id)
-> Index Scan using
"PK_marketer_product_id" on marketer_product (cost=0.00..3.04 rows=4
width=12) (actual time=0.012..0.019 rows=2 loops=1)
-> Sort (cost=4.38..4.38 rows=1
width=147) (actual time=1.098..1.109 rows=3 loops=1)
Sort Key: account.marketer_product_id
-> Hash Join (cost=2.78..4.37
rows=1 width=147) (actual time=1.007..1.064 rows=3 loops=1)
Hash Cond: ("outer".app_id =
"inner".app_id)
-> Hash Join (cost=1.75..3.32
rows=1 width=155) (actual time=0.494..0.875 rows=31 loops=1)
Hash Cond:
(("outer".app_id = "inner".app_id) AND ("outer".acct_id = "inner".acct_id))
-> Seq Scan on account
(cost=0.00..1.28 rows=28 width=155) (actual time=0.007..0.154 rows=34
loops=1)
-> Hash
(cost=1.50..1.50 rows=50 width=8) (actual time=0.451..0.451 rows=50 loops=1)
-> Seq Scan on
application (cost=0.00..1.50 rows=50 width=8) (actual time=0.006..0.223
rows=50 loops=1)
-> Hash (cost=1.03..1.03
rows=1 width=4) (actual time=0.042..0.042 rows=3 loops=1)
-> Seq Scan on
reconciliation (cost=0.00..1.03 rows=1 width=4) (actual time=0.007..0.019
rows=3 loops=1)
Filter:
(transferred_date IS NULL)
-> Hash (cost=1.02..1.02 rows=2 width=21)
(actual time=0.036..0.036 rows=2 loops=1)
-> Seq Scan on product (cost=0.00..1.02
rows=2 width=21) (actual time=0.005..0.014 rows=2 loops=1)
-> Hash (cost=1.02..1.02 rows=2 width=49) (actual
time=0.036..0.036 rows=2 loops=1)
-> Seq Scan on marketer_divisions
(cost=0.00..1.02 rows=2 width=49) (actual time=0.007..0.016 rows=2 loops=1)
-> Hash (cost=1.02..1.02 rows=2 width=4) (actual time=0.039..0.039
rows=2 loops=1)
-> Seq Scan on cities (cost=0.00..1.02 rows=2 width=4) (actual
time=0.009..0.017 rows=2 loops=1)
Total runtime: 2.084 ms

With nested loops enabled does it choose to use them because it sees the
estimated start up cost with loops as less? Does it not know that the total
query would be faster with the Hash Joins? This query is in development
right now, and as such there are not many rows. When it goes to production
the reconciliation table will grow by about 50 100 rows per day where the
transferred_date is NULL (this is the driving criteria behind this query.)
As the table grows can I expect Pg to realize the the nested loops will be
slower and will it switch to the Hash Joins? If not how would I force it to
use the Hash Joins without just turning off nested loops completely? Is it
a good idea to turn off nested loops completely?

Statistics collecting and auto vacuum is enabled btw. I have an erd diagram
showing the table structures if anyone is interested in looking at it, just
let me know.

Thanks,
Ketema

Responses

Re: Nested Loops vs. Hash Joins or Merge Joins at 2006-05-11 22:51:36 from Jim C. Nasby

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Christian Paul Cosinas	2006-05-11 14:45:33	Speed Up Offset and Limit Clause
Previous Message	Martijn van Oosterhout	2006-05-11 08:30:25	Re: [HACKERS] Big IN() clauses etc : feature proposal