WHERE vs ORDER BY vs LIMIT why not using the correct index?

From: "D(dot) Dante Lorenso" <dante(at)lorenso(dot)com>
To: Postgres-General <pgsql-general(at)postgresql(dot)org>
Subject: WHERE vs ORDER BY vs LIMIT why not using the correct index?
Date: 2008-01-08 22:41:09
Message-ID: 4783FC05.5000907@lorenso.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

All,

I have a simple query:

SELECT tcq_id
FROM queue q
WHERE q.status = 'D'
ORDER BY tcq_id ASC
LIMIT 1;

What I'm trying to do is find all the items in the queue which have a
status of 'D' and then select the oldest item first. My data is very
skewed such that there are not many records matching the WHERE clause.

status my_count
D 4
Q 6
S 20
P 74
F 3294
C 291206

However, when I explain that query, it's doing an index scan on the
'tcq_id' primary key column instead of using the index on 'status':

------------------------------
EXPLAIN SELECT tcq_id
FROM queue q
WHERE q.status = 'D'
ORDER BY tcq_id ASC
LIMIT 1;

Limit (cost=0.00..40.09 rows=1 width=8)
-> Index Scan using queue_pkey on queue q (cost=0.00..59059.80
rows=1473 width=8)
Filter: (status = 'D'::bpchar)
------------------------------

But then when I take out the ORDER BY clause, PostgreSQL will use a
different (and proper) index:

------------------------------
EXPLAIN SELECT tcq_id
FROM transcode_queue q
WHERE q.status = 'D'
LIMIT 1;

Limit (cost=0.00..3.81 rows=1 width=8)
-> Index Scan using queue_idx_status on queue q (cost=0.00..5618.07
rows=1473 width=8)
Index Cond: (status = 'D'::bpchar)
------------------------------

I don't understand why the ORDER BY condition would be affecting my
WHERE criteria. Shouldn't the ordering be done after the filter is
first applied?

I'm wanting: "find the 4 rows where status = 'D' then order those by
tcq_id and return the first one." But postgresql seems to be choosing:
"order all records by the tcq_id then scan them sequentially and find
the first one matching status = 'D'".

How can I influence the planner's decision while keeping my ORDER BY clause?

After furthing testing, maybe it's not the ORDER BY but the LIMIT that
is causing the poor planner choice? I tried to do this:

------------------------------
SELECT tcq_id
FROM (
SELECT tcq_id
FROM queue q
WHERE q.status = 'D'
) x
ORDER BY x.tcq_id ASC
LIMIT 1;

Limit (cost=0.00..40.09 rows=1 width=8)
-> Index Scan using queue_pkey on queue q (cost=0.00..59059.80
rows=1473 width=8)
Filter: (status = 'D'::bpchar)
------------------------------

But this results in another wrong index choice. So, I removed the LIMIT
clause and now it does use the right index:

------------------------------
SELECT tcq_id
FROM (
SELECT tcq_id
FROM queue q
WHERE q.status = 'D'
) x
ORDER BY x.tcq_id ASC;

Sort (cost=4314.36..4318.05 rows=1473 width=8)
Sort Key: q.tcq_id
-> Bitmap Heap Scan on queue q (cost=35.71..4236.85 rows=1473 width=8)
Recheck Cond: (status = 'D'::bpchar)
-> Bitmap Index Scan on queue_idx_status (cost=0.00..35.34
rows=1473 width=0)
Index Cond: (status = 'D'::bpchar)
------------------------------

Can someone shed some insight here and help me understand what's going on?

-- Dante

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2008-01-08 23:21:27 Re: Index trouble with 8.3b4
Previous Message x asasaxax 2008-01-08 20:21:41 XML path function