From: russm <russm(at)icorp(dot)com(dot)au>
To: <pgsql-sql(at)postgresql(dot)org>
Subject: alternate idioms for large "IN (...)" lists
Date: 2002-06-02 06:10:02
Message-ID: B91FF3DA.40C6%russm@icorp.com.au
Lists: pgsql-sql
I'm working on an application (originally written for Oracle) that is being
brought to its knees by one commonly-executed query: finding the most
common attributes for a given list of assets...
SELECT attribute_id, COUNT(asset_id)
FROM attribute_map
WHERE asset_id IN ( <potentially several thousand asset_id values> )
GROUP BY attribute_id
HAVING COUNT(asset_id) < <cutoff value>
ORDER BY count DESC
LIMIT <limit>
where attribute_map is a many-to-many map of asset_id to attribute_id -
demo=# \d attribute_map
Table "attribute_map"
Column | Type | Modifiers
--------------+---------+-----------
asset_id | integer |
attribute_id | integer |
Indexes: am_asset_id_idx,
am_attribute_id_idx
Unique keys: am_asset_id_attribute_id_un
Triggers: RI_ConstraintTrigger_26912,
RI_ConstraintTrigger_26906
From what I've read here, PostgreSQL doesn't handle IN (...) queries
especially efficiently, and it looks to me like I'm being bitten by that. An
EXPLAIN ANALYZE on that query, with just under 40k rows in attribute_map and
667 asset_id values in the IN (...) list, returns
NOTICE:  QUERY PLAN:

Limit  (cost=64361.51..64361.51 rows=200 width=8) (actual time=24431.51..24431.80 rows=200 loops=1)
  ->  Sort  (cost=64361.51..64361.51 rows=1647 width=8) (actual time=24431.50..24431.61 rows=201 loops=1)
        ->  Aggregate  (cost=0.00..64273.49 rows=1647 width=8) (actual time=55.29..24324.75 rows=1747 loops=1)
              ->  Group  (cost=0.00..64191.13 rows=16473 width=8) (actual time=2.38..24198.18 rows=21308 loops=1)
                    ->  Index Scan using am_attribute_id_idx on attribute_map  (cost=0.00..64149.94 rows=16473 width=8) (actual time=2.37..24034.97 rows=21308 loops=1)
Total runtime: 24433.20 msec
which I'm assuming means that the backend is doing a separate index lookup
for each of the 667 asset_id values in the list.
Are there any standard idioms in postgres for getting around the poor
handling of IN (...) lists? Placing the list of asset_id values in a table
and then joining against that runs in about 1/15 the time (postgres builds a
hash of the asset_id values and then hash joins against attribute_map), but
since my asset_id values come from outside the database that approach
would require creating and managing a whole lot of temporary tables, which
I'd rather avoid.
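For reference, the temp-table-and-join approach described above looks roughly like this; the table name, id values, cutoff, and limit are all hypothetical stand-ins for the real application parameters:

```sql
-- Sketch of the temporary-table join; names and values are illustrative,
-- not from the original query.
CREATE TEMP TABLE asset_list (asset_id integer);

-- The application would populate this, e.g. via COPY or batched INSERTs:
INSERT INTO asset_list VALUES (101);
INSERT INTO asset_list VALUES (102);
-- ... one row per asset_id that would have gone into the IN (...) list

SELECT m.attribute_id, COUNT(m.asset_id) AS count
FROM attribute_map m, asset_list a
WHERE m.asset_id = a.asset_id
GROUP BY m.attribute_id
HAVING COUNT(m.asset_id) < 50   -- hypothetical cutoff value
ORDER BY count DESC
LIMIT 20;                       -- hypothetical limit

DROP TABLE asset_list;          -- temp tables also vanish at session end
```

This is the shape of query that lets the planner hash the id list and hash-join it against attribute_map, at the cost of the table bookkeeping mentioned above.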
Much obliged for any suggestions
Russell