Quick Links

How to get the optimizer to use an index with multiple fields

From:	"Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	How to get the optimizer to use an index with multiple fields
Date:	2003-04-28 18:21:43
Message-ID:	20030428132143.J66185@flake.decibel.org
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

select id into temp NewRetires from stats_participant where retire_to>=1
AND retire_date = (SELECT last_date FROM Project_statsrun WHERE
project_id = :ProjectID);

results in a table with 5 values...

explain analyze delete from email_rank where project_id=25 and id in
(select id from NewRetires);

Index Scan using email_rank__day_rank on email_rank
(cost=0.00..9003741627715.16 rows=45019 width=6) (actual time=408.12..9688.37 rows=3 loops=1)
Index Cond: (project_id = 25)
Filter: (subplan)
SubPlan
-> Seq Scan on newretires (cost=100000000.00..100000020.00 rows=1000 width=4) (actual time=0.01..0.05 rows=5 loops=91834)
Total runtime: 9689.86 msec

But, there's already an index that would fit the bill here perfectly:

Why isn't it using email_rank_pkey instead of using day_rank then a
filter? The original query on sybase (see below) is essentially instant,
because it's using the index of (project_id, id), so it doesn't have to
read the whole table.

stats=> select project_id,count(*) from email_rank group by project_id;
project_id | count
------------+--------
5 | 327856
8 | 28304
24 | 34622
25 | 91834
205 | 331464

Also, changing the WHERE IN to a WHERE EXISTS in the delete is
substantially faster in this case (3.5 seconds as opposed to 9); it
would be nice if the optimizer could rewrite the query on-the-fly. I
started looking into this in the first place because the original query
was taking 6-10 seconds, which seemed too long...

Original query:
DELETE FROM Email_Rank
WHERE project_id = :ProjectID
AND id IN (SELECT id
FROM STATS_Participant sp
WHERE retire_to >= 1
AND retire_date = (SELECT last_date FROM Project_statsrun WHERE project_id = :ProjectID)
)
;

I tried changing this to an EXISTS and it takes over a minute. So in
this case, the range of runtimes is ~4 seconds (building the temp table
takes ~0.25 seconds) to over a minute.
--
Jim C. Nasby (aka Decibel!) jim(at)nasby(dot)net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Responses

Re: How to get the optimizer to use an index with multiple fields at 2003-04-28 18:32:44 from Jim C. Nasby
Re: How to get the optimizer to use an index with multiple at 2003-04-28 19:52:57 from scott.marlowe

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Jim C. Nasby	2003-04-28 18:32:44	Re: How to get the optimizer to use an index with multiple fields
Previous Message	Andrew Sullivan	2003-04-28 11:12:06	Re: Diferent execution plan for similar query