Re: query using incorrect index

From: Russell Keane <Russell(dot)Keane(at)inps(dot)co(dot)uk>
To: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: query using incorrect index
Date: 2012-08-03 09:50:02
Message-ID: 8D0E5D045E36124A8F1DDDB463D548557CEF4D2FAB@mxsvr1.is.inps.co.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

You're right, a lot of information is missing but I'm unsure that the other information will make too much difference.
I could drop all the other indexes on the table which aren't used here and the queries would still use the indexes they are currently using.

I appreciate the idea that a boolean column selectivity might not be great. I've just tried creating indexes as follows:
CREATE INDEX messageq17
ON messageq_table
USING btree
(staff_ty, staff_id, incoming, inactive, entity_id);

CREATE INDEX messageq18
ON messageq_table
USING btree
(staff_ty, staff_id);

When running query 2 as it stands the same thing happens, it still uses the messageq1 index.

The query is logically the same as using max, you are correct, but it's generated on the fly so the limit or the queried column may change.

The query plans were for the second query as I'm unsure that the first query is really relevant, it was simply there to justify the messageq1 index.

Thanks,

From: Robert Klemme [mailto:shortcutter(at)googlemail(dot)com]
Sent: 03 August 2012 10:18
To: Russell Keane; pgsql-performance
Subject: Re: [PERFORM] query using incorrect index

On Thu, Aug 2, 2012 at 4:54 PM, Russell Keane <Russell(dot)Keane(at)inps(dot)co(dot)uk<mailto:Russell(dot)Keane(at)inps(dot)co(dot)uk>> wrote:

Using PG 9.0 and given 2 queries (messageq_current is a view on the messageq_table):

select entity_id from messageq_current
where entity_id = 123456;

select entity_id from messageq_current
where incoming = true
and inactive = false
and staff_ty = 2
and staff_id = 2
order by entity_id desc
limit 1;

and 2 indexes (there are 15 indexes in total but they are left out here for brevity):

messageq1:
CREATE INDEX messageq1
ON messageq_table
USING btree
(entity_id);

And messageq4:

CREATE INDEX messageq4
ON messageq_table
USING btree
(inactive, staff_ty, staff_id, incoming, tran_dt);

Of course a lot of detail is missing (full schema of table, all the other indexes) but with "inactive" a boolean column I suspect selectivity might not be too good here and so having it as a first column in a covering index is at least questionable. If query 2 is frequent you might also want to consider creating a partial index only on (staff_ty, staff_id) with filtering criteria on incoming and active as present in query 2.

Btw, why don't you formulate query 2 as max query?
select max(entity_id) as entity_id
from messageq_current
where incoming = true
and inactive = false
and staff_ty = 2
and staff_id = 2;

With the messageq1 index present, query 1 is very quick (0.094ms) and query 2 is very slow (241.515ms).
If I remove messageq1 then query 2 uses messageq4 and is very quick (0.098ms) but then query 1 must use a different index and is therefore slower (> 5ms).

So, to the Query plans:

Of which query? Shouldn't there be four plans in total? I'd post plans here:
http://explain.depesz.com/

With messageq1:
"Limit (cost=0.00..2670.50 rows=1 width=4) (actual time=241.481..241.481 rows=0 loops=1)"
" Output: messageq_table.entity_id"
" Buffers: shared hit=32 read=18870 written=12"
" -> Index Scan Backward using messageq1 on prac_live_10112.messageq_table (cost=0.00..66762.53 rows=25 width=4) (actual time=241.479..241.479 rows=0 loops=1)"
" Output: messageq_table.entity_id"
" Filter: (messageq_table.incoming AND (NOT messageq_table.inactive) AND (messageq_table.staff_ty = 2) AND (messageq_table.staff_id = 2) AND (aud_status_to_flag(messageq_table.aud_status) = 1))"
" Buffers: shared hit=32 read=18870 written=12"
"Total runtime: 241.515 ms"

Without messageq1:
"Limit (cost=12534.45..12534.45 rows=1 width=4) (actual time=0.055..0.055 rows=0 loops=1)"
" Output: messageq_table.entity_id"
" Buffers: shared read=3"
" -> Sort (cost=12534.45..12534.51 rows=25 width=4) (actual time=0.054..0.054 rows=0 loops=1)"
" Output: messageq_table.entity_id"
" Sort Key: messageq_table.entity_id"
" Sort Method: quicksort Memory: 17kB"
" -> Bitmap Heap Scan on prac_live_10112.messageq_table (cost=174.09..12534.32 rows=25 width=4) (actual time=0.043..0.043 rows=0 loops=1)"
" Output: messageq_table.entity_id"
" Recheck Cond: ((messageq_table.staff_ty = 2) AND (messageq_table.staff_id = 2))"
" Filter: (messageq_table.incoming AND (NOT messageq_table.inactive) AND (aud_status_to_flag(messageq_table.aud_status) = 1))"
" Buffers: shared read=3"
" -> Bitmap Index Scan on messageq4 (cost=0.00..174.08 rows=4920 width=0) (actual time=0.040..0.040 rows=0 loops=1)"
" Index Cond: ((messageq_table.inactive = false) AND (messageq_table.staff_ty = 2) AND (messageq_table.staff_id = 2) AND (messageq_table.incoming = true))"
" Buffers: shared read=3"
"Total runtime: 0.098 ms"

Clearly the statistics are off somehow but I really don't know where to start.

Any help you can give me would be very much appreciated.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Russell Keane 2012-08-03 10:00:29 Re: query using incorrect index
Previous Message JC de Villa 2012-08-03 09:48:23 Re: Messed up time zones