reindex vs 'analyze' (was: Re: cube operations slower than geo_distance() on production server)

From: Mark Stosberg <mark(at)summersault(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: reindex vs 'analyze' (was: Re: cube operations slower than geo_distance() on production server)
Date: 2007-02-14 16:28:38
Message-ID: eqvdbm$1c8p$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Merlin Moncure wrote:
> On 2/14/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> There are two things wrong here: first, that the estimated row
>> count is only 20% of actual; it should certainly not be that far
>> off for such a simple condition. I wonder if your vacuum/analyze
>> procedures are actually working. Second, you mentioned somewhere
>> along the line that 'available' pets are about 10% of all the
>> entries, which means that this indexscan is more than likely
>> entirely counterproductive: it would be cheaper to ignore this
>> index altogether.

Tom,

Thanks for the generosity of your time. We are using 8.1.3 currently. I
have read there are some performance improvements in 8.2, but we have
not started evaluating that yet.

Your suggestion about the pet_state index was right on. I tried
"Analyze" on it, but still got the same bad estimate. However, I then
used "reindex" on that index, and that fixed the estimate accuracy,
which made the query run faster! The cube search now benchmarks faster
than the old search in production, taking about 2/3s of the time of the
old one.

Any ideas why the manual REINDEX did something that "analyze" didn't? It
makes me wonder if there is other tuning like this to do.

Attached is the EA output from the most recent run, after the "re-index".

> I think switching the index on pet_state to a composite on (pet_state,
> species_id) might help too.
>
> or even better:
>
> create function is_pet_available(text) returns bool as
> $$
> select $1='available';
> $$ language sql immutable;
>
> create index pets_available_species_idx on
> pets(is_pet_available(pet_state), species_id);

Merlin,

Thanks for this suggestion. It is not an approach I had used before, and
I was interested to try it. However, the new index didn't get chosen.
(Perhaps I would need to drop the old one?) However, Tom's suggestions
did help. I'll follow up on that in just a moment.

>
> refactor your query something similar to:
>
> SELECT * FROM
> (
> SELECT
> earth_coords(q.earth_coords, s.earth_coords)/1609.344 as radius
> FROM pets
> JOIN shelters_active as shelters USING (shelter_id)
> JOIN zipcodes s ON shelters.postal_code_for_joining = zipcodes.zipcode
> JOIN zipcodes q ON q.zipcode = '90210'
> WHERE
> is_pet_available(pet_state)
> AND species_id = 1
> AND earth_box(q.earth_coords, 10*1609.344) @ s.earth_coords
> ) p order by radius
>
> merlin
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend

Attachment Content-Type Size
ea.txt text/plain 2.5 KB

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Kenji Morishige 2007-02-14 17:25:01 Re: quad or dual core Intel CPUs
Previous Message Merlin Moncure 2007-02-14 16:20:53 Re: Benchmarking PGSQL?