Re: Simple query, 10 million records...MySQL ten times faster

From: "Alexander Staubo" <alex(at)purefiction(dot)net>
To: zardozrocks <zardozrocks(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Simple query, 10 million records...MySQL ten times faster
Date: 2007-04-26 23:12:31
Message-ID: 88daf38c0704261612o741b67bvd380484cb38b814d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 24 Apr 2007 14:26:46 -0700, zardozrocks <zardozrocks(at)gmail(dot)com> wrote:
> I have this table:
>
> CREATE TABLE test_zip_assoc (
> id serial NOT NULL,
> f_id integer DEFAULT 0 NOT NULL,
> lat_radians numeric(6,5) DEFAULT 0.00000 NOT NULL,
> long_radians numeric(6,5) DEFAULT 0.00000 NOT NULL
> );
> CREATE INDEX lat_radians ON test_zip_assoc USING btree (lat_radians);
> CREATE INDEX long_radians ON test_zip_assoc USING btree
> (long_radians);

This is a spatial search -- B-tree indexes are much less efficient
than this than certain other data structures. The R-tree and its many
variants are based on subdividing the space in regions, allowing you
to do efficient checks on containment, intersection, etc., based on
points or bounding boxes.

PostgreSQL implements R-trees natively as well as through a mechanism
called GiST, a framework for implementing pluggable tree-like indexes.
It also provides some geometric data types. However, as far as I know,
PostgreSQL's R-tree/GiST indexes do *not* provide the operators to do
bounding box searches. For this you need PostGIS.

PostGIS implements the whole GIS stack, and it's so good at this that
it's practically the de facto tool among GIS analysts. Installing
PostGIS into a database is simple, and once you have done this, you
can augment your table with a geometry (*):

alter table test_zip_assoc add column lonlat geometry;
update test_zip_assoc set lonlat = makepoint(
long_radians / (3.14159265358979 / 180),
lat_radians / (3.14159265358979 / 180));

The division is to convert your radians into degrees; PostGIS works
with degrees, at least out of the box.

Now you can query on a bounding box (although, are you sure you got
your lons and lats in order? That's Antarctica, isn't it?):

select * from test_zip_assoc
where lonlat && makebox2d(
makepoint(-77.6049721697096, 39.5425768302107),
makepoint(-76.1592790300818, 40.9882699698386))

This is bound to be blazingly fast. Next you can order by geographic
distance if you like:

order by distance_sphere(lonlat,
makepoint(-77.6049721697096, 39.5425768302107))

Nobody has mentioned PostGIS so far, so I hope I'm not missing some
crucial detail, like "no spatial indexes allowed!".

(*) I cheated here. The PostGIS manual recommends that you use a
function to create geometric column, because it will set up some
auxilary data structures for you that are needed for certain
operations. The recommended syntax is:

select AddGeometryColumn('', 'test_zip_assoc', 'geom', -1, 'POINT', 2);

Alexander.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Alexander Staubo 2007-04-26 23:14:34 Re: Simple query, 10 million records...MySQL ten times faster
Previous Message Scott Marlowe 2007-04-26 22:50:27 Re: postgres: 100% CPU utilization