Hash or merge join instead of inner loop

From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Hash or merge join instead of inner loop
Date: 2003-06-09 20:40:09
Message-ID: 20030609204009.GJ40542@flake.decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I have a query that's cauing pgsql choose either a hash or merge join
depending on how I mess with the stats variables, but it won't choose an
nested loop, even though it's the fastest.

The estimate for the nested loop index scans always seems to be way high
on the high end. Note that it's 0-3 in one case and 0-2 in the other,
but the actual time is very low in both cases. Why is this? I haven't
been able to make much of a difference by changing the optimizer
variables.

This is on a solaris machine, if that matters. Tinput_data, locality,
and postal code have 1300, 28000 and 43000 rows, respectively, and
locality and postal code are very narrow tables (full definition below).

usps=# explain analyze SELECT key, pc.locality_id, l.state_code::varchar FROM Tinput_data i, postal_code pc, locality l WHERE i.zip = pc.postal_code AND l.locality_id = pc.locality_id;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=940.20..1417.94 rows=1380 width=36) (actual time=1727.30..2363.91 rows=1380 loops=1)
Merge Cond: ("outer".locality_id = "inner".locality_id)
-> Index Scan using locality_pkey on locality l (cost=0.00..455.99 rows=27789 width=10) (actual time=0.62..495.39 rows=27632 loops=1)
-> Sort (cost=940.20..940.55 rows=1380 width=26) (actual time=1725.53..1726.71 rows=1380 loops=1)
Sort Key: pc.locality_id
-> Merge Join (cost=42.00..933.00 rows=1380 width=26) (actual time=56.27..1684.67 rows=1380 loops=1)
Merge Cond: ("outer".postal_code = "inner".zip)
-> Index Scan using postal_code_postal_code_key on postal_code pc (cost=0.00..869.31 rows=42704 width=13) (actual time=10.05..1396.11 rows=42418 loops=1)
-> Sort (cost=42.00..42.34 rows=1380 width=13) (actual time=39.63..40.97 rows=1380 loops=1)
Sort Key: i.zip
-> Seq Scan on tinput_data i (cost=0.00..34.80 rows=1380 width=13) (actual time=0.02..12.13 rows=1380 loops=1)
Total runtime: 2367.50 msec
(12 rows)

usps=# set enable_mergejoin=0;
SET
usps=# set enable_hashjoin=0;
SET
usps=# explain analyze SELECT key, pc.locality_id, l.state_code::varchar FROM Tinput_data i, postal_code pc, locality l WHERE i.zip = pc.postal_code AND l.locality_id = pc.locality_id;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..6991.66 rows=1380 width=36) (actual time=0.22..231.00 rows=1380 loops=1)
-> Nested Loop (cost=0.00..4203.23 rows=1380 width=26) (actual time=0.14..132.70 rows=1380 loops=1)
-> Seq Scan on tinput_data i (cost=0.00..34.80 rows=1380 width=13) (actual time=0.02..17.41 rows=1380 loops=1)
-> Index Scan using postal_code_postal_code_key on postal_code pc (cost=0.00..3.01 rows=1 width=13) (actual time=0.06..0.06 rows=1 loops=1380)
Index Cond: ("outer".zip = pc.postal_code)
-> Index Scan using locality_pkey on locality l (cost=0.00..2.01 rows=1 width=10) (actual time=0.05..0.05 rows=1 loops=1380)
Index Cond: (l.locality_id = "outer".locality_id)
Total runtime: 233.60 msec
(8 rows)

Table "pg_temp_1.tinput_data"
Column | Type | Modifiers
------------------+-----------------------+-----------
key | integer | not null
firm | character varying(40) |
address | integer |
address_v | character varying(10) |
odd_even | character(1) |
street_name | character varying(40) |
street_metaphone | character varying(4) |
apartment | integer |
apartment_v | character varying(10) |
apartment_label | character varying(5) |
city | character varying(40) |
city_metaphone | character varying(4) |
state | character varying(40) |
zip | character varying(5) |
Indexes: tinput_data_pkey primary key btree ("key")

usps=# \d postal_code
Table "public.postal_code"
Column | Type | Modifiers
----------------+-----------------------+-------------------------------------------------------------------------
postal_code_id | integer | not null default nextval('public.postal_code_postal_code_id_seq'::text)
postal_code | character varying(10) | not null
locality_id | integer | not null
Indexes: postal_code_pkey primary key btree (postal_code_id),
postal_code_postal_code_key unique btree (postal_code)

usps=# \d locality
Table "public.locality"
Column | Type | Modifiers
-------------+-----------------------+-------------------------------------------------------------------
locality_id | integer | not null default nextval('public.locality_locality_id_seq'::text)
locality | character varying(10) | not null
state_code | character(2) | not null
Indexes: locality_pkey primary key btree (locality_id)
Foreign Key constraints: $1 FOREIGN KEY (state_code) REFERENCES state(state_code) ON UPDATE NO ACTION ON DELETE NO ACTION

--
Jim C. Nasby (aka Decibel!) jim(at)nasby(dot)net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Shridhar Daithankar 2003-06-10 05:15:25 Re: FW: [ADMIN] Shared_buffers and kernel parameters, tuning
Previous Message Howard Oblowitz 2003-06-09 09:18:47 FW: [ADMIN] Shared_buffers and kernel parameters, tuning