Skip site navigation (1) Skip section navigation (2)

reindex vs 'analyze' (was: Re: cube operations slower than geo_distance() on production server)

From: Mark Stosberg <mark(at)summersault(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: reindex vs 'analyze' (was: Re: cube operations slower than geo_distance() on production server)
Date: 2007-02-14 16:28:38
Message-ID: eqvdbm$1c8p$1@news.hub.org (view raw or flat)
Thread:
Lists: pgsql-performance
Merlin Moncure wrote:
> On 2/14/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> There are two things wrong here: first, that the estimated row
>> count is only 20% of actual; it should certainly not be that far
>> off for such a simple condition. I wonder if your vacuum/analyze
>> procedures are actually working. Second, you mentioned somewhere
>> along the line that 'available' pets are about 10% of all the
>> entries, which means that this indexscan is more than likely
>> entirely counterproductive: it would be cheaper to ignore this
>> index altogether.

Tom,

Thanks for the generosity of your time. We are using  8.1.3 currently. I 
have read there are some performance improvements in 8.2, but we have 
not started evaluating that yet.

Your suggestion about the pet_state index was right on. I tried 
"Analyze" on it, but still got the same bad estimate. However, I then 
used "reindex" on that index, and that fixed the estimate accuracy, 
which made the query run faster! The cube search now benchmarks faster 
than the old search in production, taking about 2/3s of the time of the 
old one.

Any ideas why the manual REINDEX did something that "analyze" didn't? It 
makes me wonder if there is other tuning like this to do.

Attached is the EA output from the most recent run, after the "re-index".

> I think switching the index on pet_state to a composite on (pet_state,
> species_id) might help too.
> 
> or even better:
> 
> create function is_pet_available(text) returns bool as
> $$
>  select $1='available';
> $$ language sql immutable;
> 
> create index pets_available_species_idx on
> pets(is_pet_available(pet_state), species_id);

Merlin,

Thanks for this suggestion. It is not an approach I had used before, and 
I was interested to try it. However, the new index didn't get chosen. 
(Perhaps I would need to drop the old one?) However, Tom's suggestions 
did help. I'll follow up on that in just a moment.

> 
> refactor your query something similar to:
> 
> SELECT * FROM
> (
> SELECT
> earth_coords(q.earth_coords, s.earth_coords)/1609.344 as radius
> FROM pets
> JOIN shelters_active as shelters USING (shelter_id)
> JOIN zipcodes s ON shelters.postal_code_for_joining = zipcodes.zipcode
> JOIN zipcodes q ON q.zipcode = '90210'
> WHERE
>   is_pet_available(pet_state)
>   AND species_id = 1
>   AND earth_box(q.earth_coords, 10*1609.344) @ s.earth_coords
> ) p order by radius
> 
> merlin
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend


Attachment: ea.txt
Description: text/plain (2.5 KB)

In response to

Responses

pgsql-performance by date

Next:From: Kenji MorishigeDate: 2007-02-14 17:25:01
Subject: Re: quad or dual core Intel CPUs
Previous:From: Merlin MoncureDate: 2007-02-14 16:20:53
Subject: Re: Benchmarking PGSQL?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group