postgresql uses Hash-join, i need Nested-loop

From: Gábor Farkas <gabor(at)nekomancer(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: postgresql uses Hash-join, i need Nested-loop
Date: 2009-08-24 06:54:46
Message-ID: 2c173a350908232354r3f4f75f1p48e604f784b4ff93@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

hi,

i have a query that uses a Hash-Join, but it would be faster with Nested-Loop,
and i don't know how to persuade postgresql to do it.

details:

postgresql-8.2 + tsearch2

i have 2 tables, one for people, and one that does a many-to-many
link between people:

CREATE TABLE personlink (
id integer NOT NULL,
relid integer NOT NULL,
created timestamp with time zone DEFAULT now() NOT NULL,
changed timestamp with time zone,
editorid integer NOT NULL
);

btree indexes on "id" and "relid",
PRIMARY KEY btree index on (id,relid).

CREATE TABLE person (
id integer NOT NULL,
firstname character varying(255),
.
.
.
);
PRIMARY KEY btree index on "id".
gin index on "firstname" (for tsearch2)

(the "person" table contains more columns (around 30))

personlink contains 1.500.000 rows, person contains 900.000 rows.
i did a vacuum-with-analyze.

my query is:

SELECT personlink.id
FROM personlink
INNER JOIN person ON personlink.relid=person.id
WHERE to_tsquery('default','duck') @@ to_tsvector('default',person.firstname);

explain analyze says this:

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=3108.62..35687.67 rows=1535 width=4) (actual
time=901.110..6113.683 rows=2 loops=1)
Hash Cond: (personlink.relid = person.id)
-> Seq Scan on personlink (cost=0.00..26805.14 rows=1535614
width=8) (actual time=0.029..3000.503 rows=1535614 loops=1)
-> Hash (cost=3097.80..3097.80 rows=866 width=4) (actual
time=0.185..0.185 rows=8 loops=1)
-> Bitmap Heap Scan on person (cost=23.09..3097.80 rows=866
width=4) (actual time=0.078..0.160 rows=8 loops=1)
Recheck Cond: ('''duck'''::tsquery @@
to_tsvector('default'::text, (firstname)::text))
-> Bitmap Index Scan on person_firstname_exact
(cost=0.00..22.87 rows=866 width=0) (actual time=0.056..0.056 rows=8
loops=1)
Index Cond: ('''duck'''::tsquery @@
to_tsvector('default'::text, (firstname)::text))
Total runtime: 6113.748 ms
(9 rows)

if i disable hash-joins with "SET enable_hashjoin =false;"

i get:

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..45698.23 rows=1535 width=4) (actual
time=4.960..15.098 rows=2 loops=1)
-> Index Scan using person_firstname_exact on person
(cost=0.00..3463.53 rows=866 width=4) (actual time=0.117..0.234 rows=8
loops=1)
Index Cond: ('''duck'''::tsquery @@
to_tsvector('default'::text, (firstname)::text))
-> Index Scan using personlink_relid_idx on personlink
(cost=0.00..48.54 rows=18 width=8) (actual time=1.848..1.849 rows=0
loops=8)
Index Cond: (personlink.relid = person.id)
Total runtime: 15.253 ms
(6 rows)

what could i do to persuade postgresql to choose the faster Nested-Loop?

thanks,
gabor

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Fred Janon 2009-08-24 09:24:59 How to create a multi-column index with 2 dates using 'gist'?
Previous Message Jeff Janes 2009-08-23 22:26:16 Re: [PERFORMANCE] how to set wal_buffers