Optimising "in" queries

From: Stephen Davies <scldad(at)sdc(dot)com(dot)au>
To: pgsql-performance(at)postgresql(dot)org
Subject: Optimising "in" queries
Date: 2007-08-22 02:40:36
Message-ID: 200708221210.36770.scldad@sdc.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I have a PostgreSQL 8.2.4 table with some seven million rows.

The psql query:

select count(rdate),rdate from reading where sensor_id in
(1137,1138,1139) group by rdate order by rdate desc limit 1;

takes a few seconds but:

select count(rdate),rdate from reading where sensor_id in
(1137,1138,1139,1140) group by rdate order by rdate desc limit 1;

(anything with four or more values in the "in" list) takes several
minutes.

Is there any way to make the "larger" queries more efficient?

Both rdate and sensor_id are indexed and the database is vacuumed every
night.

The values in the "in" list are seldom as "neat" as in the above
examples. Actual values can range from 1 to about 2000. The number of
values ranges from 2 to about 10.

Explain outputs are:

benparts=# explain select count(rdate),rdate from reading where
sensor_id in (1137,1138,1139,1140) group by rdate order by rdate desc
limit 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Limit (cost=0.00..39890.96 rows=1 width=8)
-> GroupAggregate (cost=0.00..7938300.21 rows=199 width=8)
-> Index Scan Backward using date on reading
(cost=0.00..7937884.59 rows=82625 width=8)
Filter: (sensor_id = ANY
('{1137,1138,1139,1140}'::integer[]))
(4 rows)

benparts=# explain select count(rdate),rdate from reading where
sensor_id in (1137,1138,1139) group by rdate order by rdate desc limit
1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Limit (cost=48364.32..48364.32 rows=1 width=8)
-> Sort (cost=48364.32..48364.49 rows=69 width=8)
Sort Key: rdate
-> HashAggregate (cost=48361.35..48362.21 rows=69 width=8)
-> Bitmap Heap Scan on reading (cost=535.53..48218.10
rows=28650 width=8)
Recheck Cond: (sensor_id = ANY
('{1137,1138,1139}'::integer[]))
-> Bitmap Index Scan on reading_sensor
(cost=0.00..528.37 rows=28650 width=0)
Index Cond: (sensor_id = ANY
('{1137,1138,1139}'::integer[]))
(8 rows)

TIA,
Stephen Davies
--
========================================================================
This email is for the person(s) identified above, and is confidential to
the sender and the person(s). No one else is authorised to use or
disseminate this email or its contents.

Stephen Davies Consulting Voice: 08-8177 1595
Adelaide, South Australia. Fax: 08-8177 0133
Computing & Network solutions. Mobile:0403 0405 83

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Scott Marlowe 2007-08-22 04:20:25 Re: Optimising "in" queries
Previous Message Scott Marlowe 2007-08-21 23:16:00 Re: Autovacuum is running forever