Skip site navigation (1) Skip section navigation (2)

Re: cube operations slower than geo_distance() on production server

From: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
To: "Mark Stosberg" <mark(at)summersault(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: cube operations slower than geo_distance() on production server
Date: 2007-02-12 19:03:14
Message-ID: b42b73150702121103l297b690w84dde8ef9ef8d2f1@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-performance
On 2/12/07, Mark Stosberg <mark(at)summersault(dot)com> wrote:
> Merlin Moncure wrote:
> > On 2/10/07, Mark Stosberg <mark(at)summersault(dot)com> wrote:
> >>
> >> With the help of some of this list, I was able to successfully  set up
> >> and benchmark a cube-based replacement for geo_distance() calculations.
> >>
> >> On a development box, the cube-based variations benchmarked consistently
> >> running in about 1/3 of the time of the gel_distance() equivalents.
> >>
> >> After setting up the same columns and indexes on a production
> >> database, it's a different story. All the cube operations show
> >> themselves to be about the same as, or noticeably slower than, the same
> >> operations done with geo_distance().
> >>
> >> I've stared at the EXPLAIN ANALYZE output as much I can to figure what's
> >> gone. Could you help?
> >>
> >> Here's the plan on the production server, which seems too slow. Below
> >> is the plan I get in
> >> on the development server, which is much faster.
> >>
> >> I tried "set enable_nestloop = off", which did change the plan, but
> >> the performance.
> >>
> >> The production DB has much more data in it, but I still expected
> >> comparable results relative
> >> to using geo_distance() calculations.
> >
> > any objection to posting the query (any maybe tables, keys, indexes, etc)?
>
> Here the basic query I'm using:
> SELECT
>  -- 1609.344 is a constant for "meters per mile"
>  cube_distance( (SELECT earth_coords from zipcodes WHERE zipcode =
> '90210') , earth_coords)/1609.344
>    AS RADIUS
>    FROM pets
>    -- "shelters_active" is a view where "shelter_state = 'active'"
>    JOIN shelters_active as shelters USING (shelter_id)
>    -- The zipcode fields here are varchars
>    JOIN zipcodes ON (
>         shelters.postal_code_for_joining = zipcodes.zipcode )
>    -- search for just 'dogs'
>    WHERE species_id = 1
>        AND pet_state='available'
>       AND earth_box(
>         (SELECT earth_coords from zipcodes WHERE zipcode = '90210') ,
> 10*1609.344
>       ) @ earth_coords
>    ORDER BY RADIUS;

your query looks a bit funky. here are the problems I see.

* in your field list, you don't need to re-query the zipcode table.
>  cube_distance( (SELECT earth_coords from zipcodes WHERE zipcode =
> '90210') , earth_coords)/1609.344 AS RADIUS

becomes

 cube_distance(pets.earth_coords, earth_coords ) / 1609.344 AS RADIUS

also, dont. re-refer to the zipcodes table in the join clause. you are
already joining to it:
>       AND earth_box(
>         (SELECT earth_coords from zipcodes WHERE zipcode = '90210') ,
> 10*1609.344) @ earth_coords

becomes

  AND earth_box(zipcodes.earth_coords, 10*1609.344) ) @ pets.earth_coords

* also, does pet_state have any other states than 'available' and  '
not available'? if not, you should be using a boolean. if so, you can
consider a functional index to convert it to a booelan.

* if you always look up pets by species, we can explore composite
index columns on species, available (especially using the above
functional suggestion), etc.  composite > partial (imo)

thats just to start. play with it and see what comes up.

merlin

In response to

Responses

pgsql-performance by date

Next:From: Merlin MoncureDate: 2007-02-12 19:05:18
Subject: Re: cube operations slower than geo_distance() on production server
Previous:From: Mark StosbergDate: 2007-02-12 16:11:19
Subject: Re: cube operations slower than geo_distance() on production server

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group