Re: Possible infinite loop in query using bitmap scans

From: Casey Duncan <casey(at)pandora(dot)com>
To: Michael Fuhr <mike(at)fuhr(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Possible infinite loop in query using bitmap scans
Date: 2006-03-14 01:25:49
Message-ID: cb63857e8cb866fca0b5daa088322c12@pandora.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


On Mar 13, 2006, at 9:50 AM, Michael Fuhr wrote:

> On Sun, Mar 12, 2006 at 11:36:23PM -0800, Casey Duncan wrote:
>> SELECT count(*) FROM webhits
>> WHERE path LIKE '/radio/tuner_%.swf' AND status = 200
>> AND date_recorded >= '3/10/2006'::TIMESTAMP
>> AND date_recorded < '3/11/2006'::TIMESTAMP;
> [...]
>> Aggregate (cost=794775.08..794775.09 rows=1 width=0)
> [...]
>> According to the planner it should take <15 minutes which is typical
>> in
>> practice.
>
> The planner's cost estimate is in units of disk page fetches, not
> time. The above estimate isn't 794775.09 ms (~13.25 min) but rather
> 794775.09 times the cost of a single page fetch, however much that
> is. See "Using EXPLAIN" in the "Performance Tips" chapter of the
> documentation.
>
> http://www.postgresql.org/docs/8.1/interactive/performance-
> tips.html#USING-EXPLAIN

Doh! I should've known that.

>> About half the times it runs, however, it never terminates
>> (even after days) and just spins consuming 99+% of CPU with no disk
>> activity. This query was never a problem in postgres versions < 8.1.2,
>> however the data has grown substantially since that time. I notice it
>> uses the recent in-memory bitmap feature, so I wondered if it was
>> exposing a bug.
>
> If the problem happens half the time then you have a somewhat
> repeatable test case. Do you get more consistent performance if
> you set enable_bitmapscan to off? What's the query plan if you do
> that?

Here's the plan with bitmap scans off:


QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
----------------------
Aggregate (cost=891363.71..891363.72 rows=1 width=0)
-> Index Scan using webhits_date_idx1 on webhits
(cost=0.00..891360.30 rows=1362 width=0)
Index Cond: ((date_recorded >= '2006-03-10
00:00:00'::timestamp without time zone) AND (date_recorded <
'2006-03-11 00:00:00'::timestamp without time zone))
Filter: (((path)::text ~~ '/radio/tuner_%.swf'::text) AND
(status = 200))
(4 rows)

The query runs to completion this way in about 40 minutes. I turned
bitmap scans back on and it hangs again (I ran it for about 5 hours).

> If you narrow the search criteria so the query returns fewer rows,
> do you still see the problem? Can you identify a "sour spot" where
> the problem starts to happen?

I'll do that tomorrow, and let you know.

>> If I restart the postmaster, the query will complete in the expected
>> time.
>
> Does the problem eventually start happening again? If so, after
> how long? How did you determine that the restart is relevant? Do
> you consistently see different (presumably better) performance after
> a restart than if you don't restart?

This is a production box, so I can't restart it whenever I want and I
haven't yet reproduced it elsewhere -- the data base size makes that
cumbersome at best -- but once after it hung up, I restarted postgres
and the report ran to completion for a couple of days then started
hanging again. Today it seems to pretty consistently hang, I'll see if
I can restart it overnight and test it again.

-Casey

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2006-03-14 01:38:21 Re: Relation 'pg_largeobject' does not exist
Previous Message Brandon Keepers 2006-03-14 01:24:26 Relation 'pg_largeobject' does not exist