Re: cube operations slower than geo_distance() on production server

From: Mark Stosberg <mark(at)summersault(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: cube operations slower than geo_distance() on production server
Date: 2007-02-12 16:11:19
Message-ID: eqq3k3$1mpu$1@news.hub.org
Lists: pgsql-performance

Merlin Moncure wrote:
> On 2/10/07, Mark Stosberg <mark(at)summersault(dot)com> wrote:
>>
>> With the help of some of this list, I was able to successfully set up
>> and benchmark a cube-based replacement for geo_distance() calculations.
>>
>> On a development box, the cube-based variations benchmarked consistently
>> running in about 1/3 of the time of the geo_distance() equivalents.
>>
>> After setting up the same columns and indexes on a production
>> database, it's a different story. All the cube operations show
>> themselves to be about the same as, or noticeably slower than, the same
>> operations done with geo_distance().
>>
>> I've stared at the EXPLAIN ANALYZE output as much as I can to figure out
>> what's gone wrong. Could you help?
>>
>> Here's the plan on the production server, which seems too slow. Below
>> is the plan I get on the development server, which is much faster.
>>
>> I tried "set enable_nestloop = off", which did change the plan, but
>> not the performance.
>>
>> The production DB has much more data in it, but I still expected
>> comparable results relative to using geo_distance() calculations.
>
> any objection to posting the query (any maybe tables, keys, indexes, etc)?

Here's the basic query I'm using:
SELECT
    -- 1609.344 is a constant for "meters per mile"
    cube_distance(
        (SELECT earth_coords FROM zipcodes WHERE zipcode = '90210'),
        earth_coords
    ) / 1609.344 AS RADIUS
FROM pets
    -- "shelters_active" is a view where "shelter_state = 'active'"
    JOIN shelters_active AS shelters USING (shelter_id)
    -- The zipcode fields here are varchars
    JOIN zipcodes ON (shelters.postal_code_for_joining = zipcodes.zipcode)
-- search for just 'dogs'
WHERE species_id = 1
    AND pet_state = 'available'
    AND earth_box(
        (SELECT earth_coords FROM zipcodes WHERE zipcode = '90210'),
        10 * 1609.344
    ) @ earth_coords
ORDER BY RADIUS;
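
For context, earth_coords is a cube column built with the contrib cube
and earthdistance modules. It was set up along these lines (a rough
sketch, not the exact statements; the latitude/longitude column names
here are placeholders):

-- assumes contrib/cube and contrib/earthdistance are installed;
-- "latitude" and "longitude" are placeholder column names
ALTER TABLE zipcodes ADD COLUMN earth_coords cube;
UPDATE zipcodes SET earth_coords = ll_to_earth(latitude, longitude);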

All the related columns are indexed:
pets.species_id
pets.shelter_id
pets.pet_state

shelters.shelter_id (pk)
shelters.postal_code_for_joining
shelters.active

zipcodes.zipcode (pk)
zipcodes.earth_coords
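
(For reference, those definitions look roughly like the following; the
index names here are made up, and the earth_coords index is a GiST
index so the @ operator can use it:)

CREATE INDEX pets_species_id_idx ON pets (species_id);
CREATE INDEX pets_shelter_id_idx ON pets (shelter_id);
CREATE INDEX pets_pet_state_idx ON pets (pet_state);
CREATE INDEX shelters_postal_code_idx ON shelters (postal_code_for_joining);
CREATE INDEX shelters_active_idx ON shelters (active);
CREATE INDEX zipcodes_earth_coords_idx ON zipcodes USING gist (earth_coords);
-- shelters.shelter_id and zipcodes.zipcode are covered by their
-- primary key constraints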

The pets table has about 300,000 rows, but only about 10% are
"available". It sees regular updates and is "vacuum analyzed" every
couple of hours now. The rest of the tables get "vacuum analyzed"
nightly. The shelters table is about 99% "shelter_state = active".
It's updated infrequently.
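
(In terms of commands, that maintenance amounts to roughly this; the
exact scheduling details are omitted:)

-- every couple of hours, against the frequently-updated table
VACUUM ANALYZE pets;
-- nightly, against the rest
VACUUM ANALYZE shelters;
VACUUM ANALYZE zipcodes;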

The zipcodes table has about 40,000 rows in it and doesn't change.

I tried a partial index on the pets table: "WHERE pet_state =
'available'". I could see the index was used, but the performance was
unaffected.
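
(That partial index was roughly like this; the index name and the
indexed column are approximations, not the exact definition:)

CREATE INDEX pets_available_species_idx ON pets (species_id)
    WHERE pet_state = 'available';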

The "EXPLAIN ANALYZE" output is attached, to try to avoid mail-client
wrapping. The query is running 10 times slower today than on Friday,
perhaps because of server load, or because we are at the end of a VACUUM
cycle.

Thanks for any help!

Mark

Attachment Content-Type Size
analyze.txt text/plain 2.3 KB
