From: | "Jim C(dot) Nasby" <jim(at)nasby(dot)net> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | How to get the optimizer to use an index with multiple fields |
Date: | 2003-04-28 18:21:43 |
Message-ID: | 20030428132143.J66185@flake.decibel.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
select id into temp NewRetires from stats_participant where retire_to>=1
AND retire_date = (SELECT last_date FROM Project_statsrun WHERE
project_id = :ProjectID);
results in a table with 5 values...
explain analyze delete from email_rank where project_id=25 and id in
(select id from NewRetires);
Index Scan using email_rank__day_rank on email_rank
(cost=0.00..9003741627715.16 rows=45019 width=6) (actual time=408.12..9688.37 rows=3 loops=1)
Index Cond: (project_id = 25)
Filter: (subplan)
SubPlan
-> Seq Scan on newretires (cost=100000000.00..100000020.00 rows=1000 width=4) (actual time=0.01..0.05 rows=5 loops=91834)
Total runtime: 9689.86 msec
But, there's already an index that would fit the bill here perfectly:
Table "public.email_rank"
Column | Type | Modifiers
-----------------------+---------+--------------------
project_id | integer | not null
id | integer | not null
first_date | date | not null
last_date | date | not null
day_rank | integer | not null default 0
day_rank_previous | integer | not null default 0
overall_rank | integer | not null default 0
overall_rank_previous | integer | not null default 0
work_today | bigint | not null default 0
work_total | bigint | not null default 0
Indexes: email_rank_pkey primary key btree (project_id, id),
email_rank__day_rank btree (project_id, day_rank),
email_rank__overall_rank btree (project_id, overall_rank)
Why isn't it using email_rank_pkey instead of using day_rank then a
filter? The original query on sybase (see below) is essentially instant,
because it's using the index of (project_id, id), so it doesn't have to
read the whole table.
stats=> select project_id,count(*) from email_rank group by project_id;
project_id | count
------------+--------
5 | 327856
8 | 28304
24 | 34622
25 | 91834
205 | 331464
Also, changing the WHERE IN to a WHERE EXISTS in the delete is
substantially faster in this case (3.5 seconds as opposed to 9); it
would be nice if the optimizer could rewrite the query on-the-fly. I
started looking into this in the first place because the original query
was taking 6-10 seconds, which seemed too long...
Original query:
DELETE FROM Email_Rank
WHERE project_id = :ProjectID
AND id IN (SELECT id
FROM STATS_Participant sp
WHERE retire_to >= 1
AND retire_date = (SELECT last_date FROM Project_statsrun WHERE project_id = :ProjectID)
)
;
I tried changing this to an EXISTS and it takes over a minute. So in
this case, the range of runtimes is ~4 seconds (building the temp table
takes ~0.25 seconds) to over a minute.
--
Jim C. Nasby (aka Decibel!) jim(at)nasby(dot)net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828
Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
From | Date | Subject | |
---|---|---|---|
Next Message | Jim C. Nasby | 2003-04-28 18:32:44 | Re: How to get the optimizer to use an index with multiple fields |
Previous Message | Andrew Sullivan | 2003-04-28 11:12:06 | Re: Diferent execution plan for similar query |