Re: performance of implicit join vs. explicit conditions on inet queries?

From: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: performance of implicit join vs. explicit conditions on inet queries?
Date: 2005-10-31 09:48:41
Message-ID: dk4p9d$30q4$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

"Robert Edmonds" <edmonds42(at)bellsouth(dot)net> wrote
>
> EXPLAIN ANALYZE
> SELECT *
> FROM inet_addresses
> WHERE addr << inet('10.2.0.0/24')
> OR addr << inet('10.4.0.0/24')
> OR addr << inet('10.8.0.0/24');
>
> Bitmap Heap Scan on inet_addresses (cost=6.51..324.48 rows=1792335
> width=11) (actual time=0.350..1.104 rows=381 loops=1)
> Recheck Cond: ((addr << '10.2.0.0/24'::inet) OR (addr <<
> '10.4.0.0/24'::inet) OR (addr << '10.8.0.0/24'::inet))
> Filter: ((addr << '10.2.0.0/24'::inet) OR (addr << '10.4.0.0/24'::inet)
> OR (addr << '10.8.0.0/24'::inet))
> -> BitmapOr (cost=6.51..6.51 rows=85 width=0) (actual
> time=0.336..0.336 rows=0 loops=1)
> -> Bitmap Index Scan on inet_addresses_pkey (cost=0.00..2.17
> rows=28 width=0) (actual time=0.127..0.127 rows=127 loops=1)
> Index Cond: ((addr > '10.2.0.0/24'::inet) AND (addr <=
> '10.2.0.255'::inet))
> -> Bitmap Index Scan on inet_addresses_pkey (cost=0.00..2.17
> rows=28 width=0) (actual time=0.109..0.109 rows=127 loops=1)
> Index Cond: ((addr > '10.4.0.0/24'::inet) AND (addr <=
> '10.4.0.255'::inet))
> -> Bitmap Index Scan on inet_addresses_pkey (cost=0.00..2.17
> rows=28 width=0) (actual time=0.096..0.096 rows=127 loops=1)
> Index Cond: ((addr > '10.8.0.0/24'::inet) AND (addr <=
> '10.8.0.255'::inet))
> Total runtime: 1.613 ms
>
>
> Instead of specifying explicit address ranges in the query, I'd like
> to store the ranges in a table:
>
>
> inet_test_db=# \d inet_ranges
> Table "public.inet_ranges"
> Column | Type | Modifiers
> ----------+---------+-----------
> range | inet | not null
> range_id | integer |
> Indexes:
> "inet_ranges_pkey" PRIMARY KEY, btree (range)
> "inet_ranges_range_id_idx" btree (range_id)
>
> inet_test_db=# SELECT * FROM inet_ranges;
> range | range_id
> --------------+----------
> 10.2.0.0/24 | 1
> 10.4.0.0/24 | 1
> 10.8.0.0/24 | 1
> 10.16.0.0/24 | 2
> 10.32.0.0/24 | 2
> 10.64.0.0/24 | 2
> (6 rows)
>
>
>
> EXPLAIN ANALYZE
> SELECT *
> FROM inet_addresses as ia, inet_ranges as ir
> WHERE ia.addr << ir.range
> AND ir.range_id=1;
>
> Nested Loop (cost=0.00..171485.93 rows=3072574 width=26) (actual
> time=1465.803..16922.979 rows=381 loops=1)
> Join Filter: ("inner".addr << "outer".range)
> -> Seq Scan on inet_ranges ir (cost=0.00..1.07 rows=3 width=15)
> (actual time=0.008..0.021 rows=3 loops=1)
> Filter: (range_id = 1)
> -> Seq Scan on inet_addresses ia (cost=0.00..31556.83 rows=2048383
> width=11) (actual time=0.003..2919.405 rows=2048383 loops=3)
> Total runtime: 16923.457 ms
>

Good illustration. I guess we have a problem of the historgram statistical
information. That is, the historgrams we used can effectively record the
linear space ranges(like ordinary <, >, =), but failed to do it for
nonlinear ranges like inet data type. So the Nested Loop node make an error
in estmating number of rows (est: 3072574, real: 381), thus a sequential
scan is obviously better under this estimation.

I am thinking the historgram problem is not easy to fix, but is there a way
to change Inet type a little bit to make it linear for your range operators?
(for example, align the length to 000.000.000.000/00?)

Regards,
Qingqing

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message David Roussel 2005-10-31 13:43:12 Re: Best way to check for new data.
Previous Message Martin Lesser 2005-10-30 20:16:20 Re: Effects of cascading references in foreign keys