Re: POSTGRES DB 3 800 000 rows table, speed up?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: Eugene <evgenius(at)hot(dot)ee>, pgsql-general(at)postgresql(dot)org
Subject: Re: POSTGRES DB 3 800 000 rows table, speed up?
Date: 2005-12-28 19:04:28
Message-ID: 21118.1135796668@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

"Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
> On Tue, Dec 27, 2005 at 11:25:37PM +0200, Eugene wrote:
>> I ask db like this SELECT * FROM ipdb2 WHERE '3229285376' BETWEEN ipfrom
>> AND ipto;

> I'm pretty sure PostgreSQL won't be able to use any indexes for this
> (EXPLAIN ANALYZE would verify that). Instead, expand the between out:

> WHERE ipfrom >= '...' AND ipto <= '...'

That won't help (it is in fact exactly the same query, because BETWEEN
is just rewritten into that). The real problem is that btree indexes
are ill-suited to this type of condition. If the typical row has only
a small distance between ipfrom and ipto then the query is actually
pretty selective, but there is no way to capture that selectivity in
a btree search, because neither of the single-column comparisons are
selective at all. The planner realizes this and doesn't bother with
the index, instead it just does a seqscan.

You could probably get somewhere by casting the problem as an rtree
or GIST overlap/containment query, but with the currently available
tools it would be a pretty unnatural-looking query ... probably
something like
box(point(ipfrom,ipfrom),point(ipto,ipto)) ~
box(point(3229285376,3229285376),point(3229285376,3229285376))
after creating an rtree or GIST index on
box(point(ipfrom,ipfrom),point(ipto,ipto))
(haven't tried this but there is a solution lurking somewhere in this
general vicinity).

Is there a good reason why the data is stored this way, and not as
say a single "cidr" column containing subnet addresses? Querying
WHERE '192.122.252.0' << cidrcolumn
would be a much more transparent way of expressing your problem.
We don't currently have an easy indexing solution for that one either,
but we might in the future.

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Qingqing Zhou 2005-12-28 19:12:54 Re: [GENERAL] Running with fsync=off
Previous Message hubert depesz lubaczewski 2005-12-28 18:50:17 Re: POSTGRES DB 3 800 000 rows table, speed up?