SELECT ignoring index even though ORDER BY and LIMIT present

From: Jori Jovanovich <jori(at)dimensiology(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: SELECT ignoring index even though ORDER BY and LIMIT present
Date: 2010-06-02 20:28:54
Message-ID: AANLkTik6RIYAtAISj1UxmzBC79l-juOuBGYSDzSkV7xG@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

hi,

I have a problem space where the main goal is to search backward in time for
events. Time can go back very far into the past, and so the
table can get quite large. However, the vast majority of queries are all
satisfied by relatively recent data. I have an index on the row creation
date and I would like almost all of my queries to have a query plan looking
something like:

Limit ...
-> Index Scan Backward using server_timestamp_idx on events
(cost=0.00..623055.91 rows=8695 width=177)
...

However, PostgreSQL frequently tries to do a full table scan. Often what
controls whether a scan is performed or not is dependent on the size of the
LIMIT and how detailed the WHERE clause is. In practice, the scan is always
the wrong answer for my use cases (where "always" is defined to be >99.9%).

Some examples:

(1) A sample query that devolves to a full table scan

EXPLAIN
SELECT events.id, events.client_duration, events.message,
events.created_by, events.source, events.type, events.event,
events.environment,
events.server_timestamp, events.session_id, events.reference,
events.client_uuid
FROM events
WHERE client_uuid ~* E'^foo bar so what'
ORDER BY server_timestamp DESC
LIMIT 20;
QUERY PLAN (BAD!)
--------------------------------------------------------------------------
Limit (cost=363278.56..363278.61 rows=20 width=177)
-> Sort (cost=363278.56..363278.62 rows=24 width=177)
Sort Key: server_timestamp
-> Seq Scan on events (cost=0.00..363278.01 rows=24 width=177)
Filter: (client_uuid ~* '^foo bar so what'::text)

(2) Making the query faster by making the string match LESS specific (odd,
seems like it should be MORE)

EXPLAIN
SELECT events.id, events.client_duration, events.message,
events.created_by, events.source, events.type, events.event,
events.environment,
events.server_timestamp, events.session_id, events.reference,
events.client_uuid
FROM events
WHERE client_uuid ~* E'^foo'
ORDER BY server_timestamp DESC
LIMIT 20;
QUERY PLAN (GOOD!)

------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..1433.14 rows=20 width=177)
-> Index Scan Backward using server_timestamp_idx on events
(cost=0.00..623055.91 rows=8695 width=177)
Filter: (client_uuid ~* '^foo'::text)

(3) Alternatively making the query faster by using a smaller limit

EXPLAIN
SELECT events.id, events.client_duration, events.message,
events.created_by, events.source, events.type, events.event,
events.environment,
events.server_timestamp, events.session_id, events.reference,
events.client_uuid
FROM events
WHERE client_uuid ~* E'^foo bar so what'
ORDER BY server_timestamp DESC
LIMIT 10;
QUERY PLAN (GOOD!)

----------------------------------------------------------------------------------------------------------
Limit (cost=0.00..259606.63 rows=10 width=177)
-> Index Scan Backward using server_timestamp_idx on events
(cost=0.00..623055.91 rows=24 width=177)
Filter: (client_uuid ~* '^foo bar so what'::text)

I find myself wishing I could just put a SQL HINT on the query to force the
index to be used but I understand that HINTs are considered harmful and are
therefore not provided for PostgreSQL, so what is the recommended way to
solve this?

thank you very much

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Kevin Grittner 2010-06-02 20:52:49 Re: requested shared memory size overflows size_t
Previous Message Scott Marlowe 2010-06-02 20:12:15 Re: Autovacuum in postgres.