Re: cube operations slower than geo_distance() on production server

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Stosberg <mark(at)summersault(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: cube operations slower than geo_distance() on production server
Date: 2007-02-14 07:15:44
Message-ID: 18714.1171437344@sss.pgh.pa.us
Lists: pgsql-performance

Mark Stosberg <mark(at)summersault(dot)com> writes:
> For reference, here's two versions of the query. The first uses
> the old geo_distance(), and the second one is the new cube query I'm
> trying, inspired by your suggested refactoring.

You didn't show EXPLAIN ANALYZE output :-(

Looking back in the thread, the last E.A. output I see is in your
message of 2/12 11:11, and the striking thing there is that it seems all
the time is going into one indexscan:

   ->  Bitmap Index Scan on pets_pet_state_idx  (cost=0.00..562.50 rows=39571 width=0) (actual time=213.620..213.620 rows=195599 loops=82)
         Index Cond: ((pet_state)::text = 'available'::text)
 Total runtime: 17933.675 ms

213.620 * 82 = 17516.840, so this step is all but 400msec of the run.

There are two things wrong here: first, that the estimated row count is
only 20% of actual; it should certainly not be that far off for such a
simple condition. I wonder if your vacuum/analyze procedures are
actually working. Second, you mentioned somewhere along the line that
'available' pets are about 10% of all the entries, which means that this
indexscan is more than likely entirely counterproductive: it would be
cheaper to ignore this index altogether.
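One quick way to check what the planner believes about that column versus reality (table and column names taken from the plan above; adjust to taste) is something like:

```sql
-- What the planner's statistics say about pet_state:
SELECT most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'pets' AND attname = 'pet_state';

-- Compare against the actual distribution:
SELECT pet_state, count(*) FROM pets GROUP BY pet_state;
```

If the frequency recorded for 'available' is nowhere near its real share of the table, stale stats are your problem.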

Suggestions:

1. ANALYZE the table by hand, try the explain again and see if this
rowcount estimate gets better. If so, you need to look into why your
existing procedures aren't keeping the stats up to date.
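That is, roughly (substituting your real query for the EXPLAIN):

```sql
ANALYZE pets;
-- then re-run EXPLAIN ANALYZE on the query and compare the
-- "rows=" estimate on pets_pet_state_idx against the actual
-- rows=195599 seen above
```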

2. If, with a more accurate rowcount estimate, the planner still wants
to use this index, try discouraging it. Brute force would be to drop
the index. If there are less-common pet_states that are actually worth
searching for, maybe keep the index but change it to a partial index
WHERE pet_state != 'available'.
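A sketch of both options (index and table names as in the plan above):

```sql
-- Brute force: get rid of the index entirely.
DROP INDEX pets_pet_state_idx;

-- Or replace it with a partial index that excludes the common
-- 'available' value, so it's still usable for the rarer states:
CREATE INDEX pets_pet_state_idx ON pets (pet_state)
  WHERE pet_state <> 'available';
```

With the partial index, a query for pet_state = 'available' can no longer use it, which is exactly what you want here.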

Also, I don't see that you mentioned anywhere what PG version you are
running, but if it's not the latest then an update might help. I recall
having fixed a bug that made the planner too eager to AND on an index
that wouldn't actually help much ... which seems to fit this problem
description pretty well.
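(If you're not sure what you're running:

```sql
SELECT version();
```

will tell you.)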

regards, tom lane
