Re: more problems with count(*) on large table

From: Mike Charnoky <noky(at)nextbus(dot)com>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Alban Hertroys <a(dot)hertroys(at)magproductions(dot)nl>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "A(dot) Kretschmer" <andreas(dot)kretschmer(at)schollglas(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: more problems with count(*) on large table
Date: 2007-10-01 17:00:21
Message-ID: 470127A5.1060505@nextbus.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

This is strange... count(*) operations over a period of one day's worth
of data now take ~1-2 minutes to run or ~40 minutes. It seems that the
first time the data is queried it takes about 40 minutes. If I try the
query again, it finishes in 1-2 minutes!

Again, nothing else is happening on this db server except for a constant
insertion into this table and a few others. I have done "set statistics
100" for the evtime field in this table.

Here is the output from EXPLAIN ANALYZE. This is the same query run
back to back, first time takes 42 minutes, second time takes less than 2
minutes!

mydb=# explain analyze select count(*) from prediction_accuracy where
evtime between '2007-09-29' and '2007-09-30';

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=3.02..3.03 rows=1 width=0) (actual
time=2549854.351..2549854.352 rows=1 loops=1)
-> Index Scan using pred_acc_evtime_index on prediction_accuracy
(cost=0.00..3.02 rows=1 width=0) (actual time=97.676..2532824.892
rows=11423786 loops=1)
Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp
with time zone) AND (evtime <= '2007-09-30 00:00:00-07'::timestamp with
time zone))
Total runtime: 2549854.411 ms
(4 rows)

Time: 2549943.506 ms
mydb=# explain analyze select count(*) from prediction_accuracy where
evtime between '2007-09-29' and '2007-09-30';

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=3.02..3.03 rows=1 width=0) (actual
time=111200.943..111200.944 rows=1 loops=1)
-> Index Scan using pred_acc_evtime_index on prediction_accuracy
(cost=0.00..3.02 rows=1 width=0) (actual time=36.396..96347.483
rows=11423786 loops=1)
Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp
with time zone) AND (evtime <= '2007-09-30 00:00:00-07'::timestamp with
time zone))
Total runtime: 111201.000 ms
(4 rows)

Time: 111298.695 ms

Mike

Gregory Stark wrote:
> "Mike Charnoky" <noky(at)nextbus(dot)com> writes:
>
>> I altered the table in question, with "set statistics 100" on the
>> timestamp column, then ran analyze. This seemed to help somewhat. Now,
>> queries don't seem to hang, but it still takes a long time to do the count:
>> * "where evtime between '2007-09-26' and '2007-09-27'"
>> took 37 minutes to run (result was ~12 million)
>> * "where evtime between '2007-09-25' and '2007-09-26'"
>> took 40 minutes to run (result was ~14 million)
>>
>> Still stymied about the seemingly random performance, especially since I
>> have seen this query execute in 2 minutes.
>
>
> And the "explain analyze" for these?
>
> Are you still sure it's certain date ranges which are consistently problems
> and others are consistently fast? Or could it be something unrelated.
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Alan Hodgson 2007-10-01 17:12:05 Re: more problems with count(*) on large table
Previous Message joynes 2007-10-01 16:56:28 Find out encoding of data