Force a merge join?

From: Doug Fields <dfields-pg-general(at)pexicom(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Force a merge join?
Date: 2002-05-15 19:31:30
Message-ID: 5.1.0.14.2.20020515142918.01f0f718@mail.pexicom.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello,

[7.2.1, Debian/woody kernel 2.2.20]

I have an extremely slow running query which uses a nested loop. I would
like to force it to use a merge join instead, but I have been unable to
figure out how to fool the optimizer to do that. Alternately, I would
welcome suggestions on how to speed up the query.

Yes, I have tried turning off enable_nestloop and playing with the tuple
costs somewhat.

When I say extremely slowly, I mean it. We're talking 280 seconds for a
query which returns 3000 rows out of a possible ~800,000 using two parts of
the same table (joined together) of sizes 500 and 15000 or so.

Please find below some details on my query, the explain, the table and the
indices. Some columns not referenced have been omitted.

Note that I just increased the RAM on the database server to 2 gigs and
allocated almost 1.4 gigs to shared buffers. All my tables and indices
total size (for now) is about 600 megs, so everything in theory could be
running from memory. I have a continuous vmstat running and show no
swapping, and no "blocks in" activity indicating it is indeed running fully
from memory.

The INTENT of the query is to take a set of e-mail addresses (list A) and
another set (list B) and find the intersection - that is, all entries in A
which have a corresponding entry in B based upon having the same e-mail
address. This is the last one part of a larger query (which is otherwise
very fast) which selects parts of list A and then removes (via combining
with EXCEPT) the remaining ones in the query below. These are then inserted
into yet another table so the whole query looks like INSERT ... SELECT ...
EXCEPT SELECT ... where the last SELECT is the slow query I show below.

By breaking the query into a bunch (about 40) of queries without any ORs
and UNIONing them, I can get about a 3-4 times improvement, but that still
means this is an 80 second query, and is not scalable to the case where it
can't be broken into multiple queries.

Query:

-- Note the odd 745 is because this is used in an INSERT ... SELECT statement
-- Also note that there is no difference if I use IN (1,2,3,4) syntax
SELECT 745, a.list_entry_id
FROM list_entries a, list_entries b
WHERE (a.list_id=148 OR a.list_id=146 OR a.list_id=145 OR a.list_id=147 OR
a.list_id=144)
AND (b.list_id=247 OR b.list_id=433 OR b.list_id=249 OR b.list_id=434 OR
b.list_id=238 OR b.list_id=340 OR b.list_id=339 OR b.list_id=418)
AND LOWER(a.email) = LOWER(b.email);

psql's \d on the table name and indices: (use fixed-width font)

Table "list_entries"
Column | Type |
Modifiers
---------------+--------------------------+--------------------------------------------------------------------
list_id | integer | not null
list_entry_id | integer | not null default
nextval('"list_entries_list_entry_id_seq"'::text)
email | character varying(64) |
(about a dozen more mostly varchar fields not shown)
Indexes: list_id_idx,
lower_email_idx
Primary key: list_entries_pkey
Triggers: RI_ConstraintTrigger_1572852

Index "list_id_idx"
Column | Type
---------+---------
list_id | integer
btree

Index "lower_email_idx"
Column | Type
--------+------
lower | text
btree

Query analysis:

-> Nested Loop (cost=0.00..219086859.00 rows=1585343 width=52)
(actual time=0.27..294568.88 rows=2970 loops=1)
-> Index Scan using list_id_idx, list_id_idx, list_id_idx,
list_id_idx, list_id_idx on list_entries a (cost=0.00..29025.99 rows=14115
width=28) (actual time=0.05..176.79 rows=15859 loops=1)
-> Index Scan using lower_email_idx on list_entries
b (cost=0.00..15513.40 rows=112 width=24) (actual time=0.69..18.55 rows=0
loops=15859)

Database optimizer settings:

NOTICE: cpu_operator_cost is 0.0025
NOTICE: cpu_tuple_cost is 0.05
NOTICE: cpu_index_tuple_cost is 0.001

All enable_X options are ON

Many thanks,

Doug

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Evgeniy Strokin 2002-05-15 19:58:50 Re: XML from postgreSQL tables
Previous Message Jeff Davis 2002-05-15 18:46:49 Re: XML from postgreSQL tables