Slow join using network address function

From: "Eric Jain" <Eric(dot)Jain(at)isb-sib(dot)ch>
To: "pgsql-performance" <pgsql-performance(at)postgresql(dot)org>
Subject: Slow join using network address function
Date: 2004-02-23 11:48:02
Message-ID: 000701c3fa02$e4e3d180$c300000a@caliente
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I'm trying to join two tables on an inet column, where one of the
columns may contain a subnet rather than a single host. Somehow the
operation isn't completing quite fast enough, even though neither table
is very large:

table | rows
--------------------+--------
clients | 115472
clients_commercial | 11670

First attempt, cancelled after running for half an hour:

SELECT
c.address AS address,
cc.address AS network
FROM
clients c
JOIN clients_commercial cc ON (c.address <<= cc.address)
;

Nested Loop
(cost=189.00..27359887.76 rows=607947200 width=22)
Join Filter: ("outer".address <<= "inner".address)
-> Seq Scan on clients c
(cost=0.00..2074.76 rows=102176 width=11)
-> Materialize
(cost=189.00..308.00 rows=11900 width=11)
-> Seq Scan on clients_commercial cc
(cost=0.00..189.00 rows=11900 width=11)

Second attempt, completes within 10 min:

SELECT
c.address AS address,
cc.address AS network
FROM
clients c,
clients_commercial cc
WHERE
c.commercial IS NULL
AND c.address <<= cc.address
;

Nested Loop
(cost=189.00..139084.01 rows=3040450 width=22)
Join Filter: ("outer".address <<= "inner".address)
-> Seq Scan on clients c
(cost=0.00..2074.76 rows=511 width=11)
Filter: (commercial IS NULL)
-> Materialize
(cost=189.00..308.00 rows=11900 width=11)
-> Seq Scan on clients_commercial cc
(cost=0.00..189.00 rows=11900 width=11)

Third attempt; provided some indexes, which unfortunately don't get
used, making the query twice as slow as the previous one:

SELECT
c.address AS address,
cc.address AS network
FROM
clients c,
clients_commercial cc
WHERE
c.commercial IS NULL
AND set_masklen(c.address, masklen(cc.address)) = cc.address
;

CREATE INDEX clients_commercial_masklen_idx
ON clients_commercial((masklen(address)));

CREATE INDEX clients_32_idx
ON clients((set_masklen(address, 32)));

CREATE INDEX clients_24_idx
ON clients((set_masklen(address, 24)));

CREATE INDEX clients_16_idx
ON clients((set_masklen(address, 16)));

Nested Loop
(cost=189.00..169488.51 rows=479 width=22)
Join Filter: (set_masklen("outer".address, masklen("inner".address))
= "inner".address)
-> Seq Scan on clients c
(cost=0.00..2074.76 rows=511 width=11)
Filter: (commercial IS NULL)
-> Materialize
(cost=189.00..308.00 rows=11900 width=11)
-> Seq Scan on clients_commercial cc
(cost=0.00..189.00 rows=11900 width=11)

Anything else I could try? BTREE indexes don't seem to work with the <<=
operator; is this not possible in principal, or simply something that
has not been implmented yet?

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Steve Atkins 2004-02-23 16:07:34 Re: Slow join using network address function
Previous Message Tom Lane 2004-02-23 03:38:40 Re: Column correlation drifts, index ignored again