Re: POSTGRES DB 3 800 000 rows table, speed up?

From: Klein Balázs <Balazs(dot)Klein(at)axelero(dot)hu>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Re: POSTGRES DB 3 800 000 rows table, speed up?
Date: 2005-12-29 02:24:36
Message-ID: 20051229022442.0E6EE3E3962@graveyard.mail.t-online.hu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


Could you explain this a little bit more?
What are the conditions of this situation that makes b-tree ineffective?

Thanks
SWK

-----Original Message-----
From: pgsql-general-owner(at)postgresql(dot)org
[mailto:pgsql-general-owner(at)postgresql(dot)org] On Behalf Of Tom Lane
Sent: 2005. december 28. 20:04
To: Jim C. Nasby
Cc: Eugene; pgsql-general(at)postgresql(dot)org
Subject: Re: [GENERAL] POSTGRES DB 3 800 000 rows table, speed up?

"Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
> On Tue, Dec 27, 2005 at 11:25:37PM +0200, Eugene wrote:
>> I ask db like this SELECT * FROM ipdb2 WHERE '3229285376' BETWEEN ipfrom

>> AND ipto;

> I'm pretty sure PostgreSQL won't be able to use any indexes for this
> (EXPLAIN ANALYZE would verify that). Instead, expand the between out:

> WHERE ipfrom >= '...' AND ipto <= '...'

That won't help (it is in fact exactly the same query, because BETWEEN
is just rewritten into that). The real problem is that btree indexes
are ill-suited to this type of condition. If the typical row has only
a small distance between ipfrom and ipto then the query is actually
pretty selective, but there is no way to capture that selectivity in
a btree search, because neither of the single-column comparisons are
selective at all. The planner realizes this and doesn't bother with
the index, instead it just does a seqscan.

You could probably get somewhere by casting the problem as an rtree
or GIST overlap/containment query, but with the currently available
tools it would be a pretty unnatural-looking query ... probably
something like
box(point(ipfrom,ipfrom),point(ipto,ipto)) ~
box(point(3229285376,3229285376),point(3229285376,3229285376))
after creating an rtree or GIST index on
box(point(ipfrom,ipfrom),point(ipto,ipto))
(haven't tried this but there is a solution lurking somewhere in this
general vicinity).

Is there a good reason why the data is stored this way, and not as
say a single "cidr" column containing subnet addresses? Querying
WHERE '192.122.252.0' << cidrcolumn
would be a much more transparent way of expressing your problem.
We don't currently have an easy indexing solution for that one either,
but we might in the future.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
message can get through to the mailing list cleanly

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jonel Rienton 2005-12-29 02:44:21 Re: [Bulk] Re: Final stored procedure question, for now anyway
Previous Message xiapw 2005-12-29 02:09:01 I want to know how to improve the security of postgresql