performance of implicit join vs. explicit conditions on inet queries?

From: Robert Edmonds <edmonds42(at)bellsouth(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: performance of implicit join vs. explicit conditions on inet queries?
Date: 2005-10-24 03:58:09
Message-ID: 20051024035809.GA18261@edmonds.ath.cx
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

The preliminaries:

- PostgreSQL 8.1 beta 3, Debian experimental
- database has been VACUUMed FULL ANALYZE.
- a pg_dump -Fc exists at http://199.77.129.48/inet_test.db
- ia32 hardware with 2 GB physical memory and the following settings:

shared_buffers = 40960
temp_buffers = 16384
work_mem = 131072
maintenance_work_mem = 262144
effective_cache_size = 65536

I've populated a table evenly with about 2 million rows of RFC 1918
addresses:

Table "public.inet_addresses"
Column | Type | Modifiers
--------+------+-----------
addr | inet | not null
Indexes:
"inet_addresses_pkey" PRIMARY KEY, btree (addr)

The following query is very fast:

EXPLAIN ANALYZE
SELECT *
FROM inet_addresses
WHERE addr << inet('10.2.0.0/24')
OR addr << inet('10.4.0.0/24')
OR addr << inet('10.8.0.0/24');

Bitmap Heap Scan on inet_addresses (cost=6.51..324.48 rows=1792335 width=11) (actual time=0.350..1.104 rows=381 loops=1)
Recheck Cond: ((addr << '10.2.0.0/24'::inet) OR (addr << '10.4.0.0/24'::inet) OR (addr << '10.8.0.0/24'::inet))
Filter: ((addr << '10.2.0.0/24'::inet) OR (addr << '10.4.0.0/24'::inet) OR (addr << '10.8.0.0/24'::inet))
-> BitmapOr (cost=6.51..6.51 rows=85 width=0) (actual time=0.336..0.336 rows=0 loops=1)
-> Bitmap Index Scan on inet_addresses_pkey (cost=0.00..2.17 rows=28 width=0) (actual time=0.127..0.127 rows=127 loops=1)
Index Cond: ((addr > '10.2.0.0/24'::inet) AND (addr <= '10.2.0.255'::inet))
-> Bitmap Index Scan on inet_addresses_pkey (cost=0.00..2.17 rows=28 width=0) (actual time=0.109..0.109 rows=127 loops=1)
Index Cond: ((addr > '10.4.0.0/24'::inet) AND (addr <= '10.4.0.255'::inet))
-> Bitmap Index Scan on inet_addresses_pkey (cost=0.00..2.17 rows=28 width=0) (actual time=0.096..0.096 rows=127 loops=1)
Index Cond: ((addr > '10.8.0.0/24'::inet) AND (addr <= '10.8.0.255'::inet))
Total runtime: 1.613 ms

Instead of specifying explicit address ranges in the query, I'd like
to store the ranges in a table:

inet_test_db=# \d inet_ranges
Table "public.inet_ranges"
Column | Type | Modifiers
----------+---------+-----------
range | inet | not null
range_id | integer |
Indexes:
"inet_ranges_pkey" PRIMARY KEY, btree (range)
"inet_ranges_range_id_idx" btree (range_id)

inet_test_db=# SELECT * FROM inet_ranges;
range | range_id
--------------+----------
10.2.0.0/24 | 1
10.4.0.0/24 | 1
10.8.0.0/24 | 1
10.16.0.0/24 | 2
10.32.0.0/24 | 2
10.64.0.0/24 | 2
(6 rows)

This query is far slower, even though it generates the same result:

EXPLAIN ANALYZE
SELECT *
FROM inet_addresses as ia, inet_ranges as ir
WHERE ia.addr << ir.range
AND ir.range_id=1;

Nested Loop (cost=0.00..171485.93 rows=3072574 width=26) (actual time=1465.803..16922.979 rows=381 loops=1)
Join Filter: ("inner".addr << "outer".range)
-> Seq Scan on inet_ranges ir (cost=0.00..1.07 rows=3 width=15) (actual time=0.008..0.021 rows=3 loops=1)
Filter: (range_id = 1)
-> Seq Scan on inet_addresses ia (cost=0.00..31556.83 rows=2048383 width=11) (actual time=0.003..2919.405 rows=2048383 loops=3)
Total runtime: 16923.457 ms

Even when disabling sequential scans, the query planner is unable to
make use of the inet_addresses_pkey index:

Nested Loop (cost=100033605.21..100171874.11 rows=3072574 width=26) (actual time=2796.928..23453.585 rows=381 loops=1)
Join Filter: ("inner".addr << "outer".range)
-> Index Scan using inet_ranges_range_id_idx on inet_ranges ir (cost=0.00..3.04 rows=3 width=15) (actual time=0.069..0.095 rows=3 loops=1)
Index Cond: (range_id = 1)
-> Materialize (cost=100033605.21..100054089.04 rows=2048383 width=11) (actual time=0.016..5133.349 rows=2048383 loops=3)
-> Seq Scan on inet_addresses ia (cost=100000000.00..100031556.83 rows=2048383 width=11) (actual time=0.005..2938.012 rows=2048383 loops=1)
Total runtime: 23521.418 ms

Is it possible to attain the speed of the first query and the
flexibility of the second? Or will I have to resort to generating
queries of the first form with the range table in the application
layer?

--
Robert Edmonds
edmonds42(at)bellsouth(dot)net

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2005-10-24 04:36:35 Re: Using LIMIT 1 in plpgsql PERFORM statements
Previous Message Mark Kirkwood 2005-10-24 03:14:20 Re: Used Memory