Digesting explain analyze

From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: pgsql-performance(at)postgresql(dot)org
Subject: Digesting explain analyze
Date: 2010-01-06 19:10:34
Message-ID: 4B44E02A.6080109@krogh.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi.

I have a table that consists of somewhere in the magnitude of 100.000.000
rows and all rows are of this tuples

(id1,id2,evalue);

Then I'd like to speed up a query like this:

explain analyze select id from table where id1 = 2067 or id2 = 2067 order
by evalue asc limit 100;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1423.28..1423.28 rows=100 width=12) (actual
time=2.565..2.567 rows=100 loops=1)
-> Sort (cost=1423.28..1424.54 rows=505 width=12) (actual
time=2.560..2.560 rows=100 loops=1)
Sort Key: evalue
Sort Method: top-N heapsort Memory: 25kB
-> Bitmap Heap Scan on table (cost=16.58..1420.75 rows=505
width=12) (actual time=0.709..1.752 rows=450 loops=1)
Recheck Cond: ((id1 = 2067) OR (id2 = 2067))
-> BitmapOr (cost=16.58..16.58 rows=506 width=0) (actual
time=0.676..0.676 rows=0 loops=1)
-> Bitmap Index Scan on id1_evalue_idx
(cost=0.00..11.44 rows=423 width=0) (actual
time=0.599..0.599 rows=450 loops=1)
Index Cond: (id1_id = 2067)
-> Bitmap Index Scan on id2_evalue_idx
(cost=0.00..4.89 rows=83 width=0) (actual
time=0.070..0.070 rows=1 loops=1)
Index Cond: (id2_id = 2067)
Total runtime: 2.642 ms
(12 rows)

What I had expected was to see the "Bitmap Index Scan on id1_evalue_idx"
to chop it off at a "limit 1". The inner sets are on average 3.000 for
both id1 and id2 and a typical limit would be 100, so if I could convince
postgresql to not fetch all of them then I would reduce the set retrieved
by around 60. The dataset is quite large so the random query is not very
likely to be hitting the same part of the dataset again, so there is going
to be a fair amount of going to disk.,

I would also mean that using it in a for loop in a stored-procedure in
plpgsql it would not get any benefit from the CURSOR effect?

I actually tried to stuff id1,id2 into an array and do a GIST index on the
array,evalue hoping that it directly would satisfy this query.. it used
the GIST index fetch the rows the post-sorting and limit on the set.

What it boils down to is more or less:

Does a "bitmap index scan" support ordering and limit ?
Does a "multicolummn gist index" support ordering and limit ?

Have I missed anything that can hugely speed up fetching these (typically
100 rows) from the database.

--
Jesper

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Ron Mayer 2010-01-06 20:03:21 Re: Digesting explain analyze
Previous Message Craig James 2010-01-06 18:31:12 Re: pg_connect takes 3.0 seconds