How to get the optimizer to use an index with multiple fields

From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: How to get the optimizer to use an index with multiple fields
Date: 2003-04-28 18:21:43
Message-ID: 20030428132143.J66185@flake.decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

select id into temp NewRetires from stats_participant where retire_to>=1
AND retire_date = (SELECT last_date FROM Project_statsrun WHERE
project_id = :ProjectID);

results in a table with 5 values...

explain analyze delete from email_rank where project_id=25 and id in
(select id from NewRetires);

Index Scan using email_rank__day_rank on email_rank
(cost=0.00..9003741627715.16 rows=45019 width=6) (actual time=408.12..9688.37 rows=3 loops=1)
Index Cond: (project_id = 25)
Filter: (subplan)
SubPlan
-> Seq Scan on newretires (cost=100000000.00..100000020.00 rows=1000 width=4) (actual time=0.01..0.05 rows=5 loops=91834)
Total runtime: 9689.86 msec

But, there's already an index that would fit the bill here perfectly:

Table "public.email_rank"
Column | Type | Modifiers
-----------------------+---------+--------------------
project_id | integer | not null
id | integer | not null
first_date | date | not null
last_date | date | not null
day_rank | integer | not null default 0
day_rank_previous | integer | not null default 0
overall_rank | integer | not null default 0
overall_rank_previous | integer | not null default 0
work_today | bigint | not null default 0
work_total | bigint | not null default 0
Indexes: email_rank_pkey primary key btree (project_id, id),
email_rank__day_rank btree (project_id, day_rank),
email_rank__overall_rank btree (project_id, overall_rank)

Why isn't it using email_rank_pkey instead of using day_rank then a
filter? The original query on sybase (see below) is essentially instant,
because it's using the index of (project_id, id), so it doesn't have to
read the whole table.

stats=> select project_id,count(*) from email_rank group by project_id;
project_id | count
------------+--------
5 | 327856
8 | 28304
24 | 34622
25 | 91834
205 | 331464

Also, changing the WHERE IN to a WHERE EXISTS in the delete is
substantially faster in this case (3.5 seconds as opposed to 9); it
would be nice if the optimizer could rewrite the query on-the-fly. I
started looking into this in the first place because the original query
was taking 6-10 seconds, which seemed too long...

Original query:
DELETE FROM Email_Rank
WHERE project_id = :ProjectID
AND id IN (SELECT id
FROM STATS_Participant sp
WHERE retire_to >= 1
AND retire_date = (SELECT last_date FROM Project_statsrun WHERE project_id = :ProjectID)
)
;

I tried changing this to an EXISTS and it takes over a minute. So in
this case, the range of runtimes is ~4 seconds (building the temp table
takes ~0.25 seconds) to over a minute.
--
Jim C. Nasby (aka Decibel!) jim(at)nasby(dot)net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jim C. Nasby 2003-04-28 18:32:44 Re: How to get the optimizer to use an index with multiple fields
Previous Message Andrew Sullivan 2003-04-28 11:12:06 Re: Diferent execution plan for similar query