speed up query with max() and odd estimates

From: Matthew Nuzum <mattnuzum(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: speed up query with max() and odd estimates
Date: 2005-04-26 20:16:57
Message-ID: f3c0b40805042613167c47c7e9@mail.gmail.com
Lists: pgsql-performance

I have this query that takes a little over 8 min to run:
select client, max(atime) as atime
  from usage_access
 where atime >= (select atime - '1 hour'::interval
                   from usage_access
                  order by atime desc
                  limit 1)
 group by client;

I think it can go a lot faster. Any suggestions on improving this? The DB
is 7.3.4, I think. (There is no index on client because the table is very
big and this data is used infrequently.)
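
For reference, the one index I believe matters here is the one on atime;
reconstructed from the index name in the plans below, it should look
roughly like this:

-- assumed definition; only atime is indexed, per the note above
create index usage_access_atime on usage_access (atime);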

explain ANALYZE
select client, max(atime) as atime
  from usage_access
 where atime >= (select atime - '1 hour'::interval
                   from usage_access
                  order by atime desc
                  limit 1)
 group by client;

                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=3525096.28..3620450.16 rows=1271385 width=20) (actual time=482676.95..482693.69 rows=126 loops=1)
   InitPlan
     ->  Limit  (cost=0.00..0.59 rows=1 width=8) (actual time=0.40..0.41 rows=1 loops=1)
           ->  Index Scan Backward using usage_access_atime on usage_access  (cost=0.00..22657796.18 rows=38141552 width=8) (actual time=0.39..0.40 rows=2 loops=1)
   ->  Group  (cost=3525096.28..3588665.53 rows=12713851 width=20) (actual time=482676.81..482689.29 rows=3343 loops=1)
         ->  Sort  (cost=3525096.28..3556880.90 rows=12713851 width=20) (actual time=482676.79..482679.16 rows=3343 loops=1)
               Sort Key: client
               ->  Seq Scan on usage_access  (cost=0.00..1183396.40 rows=12713851 width=20) (actual time=482641.57..482659.18 rows=3343 loops=1)
                     Filter: (atime >= $0)
 Total runtime: 482694.65 msec

I'm starting to understand this output, which is quite frightening to me.
The estimates look odd: the Seq Scan expects 12,713,851 rows where only
3,343 actually match, so I assume the planner can't estimate the
atime >= $0 filter. I thought that if I shrank the number of rows first
I could improve things a bit, but my first attempt didn't work. My idea
was to replace the "from usage_access" with this query instead:
select *
  from usage_access
 where atime >= (select atime - '1 hour'::interval
                   from usage_access
                  order by atime desc
                  limit 1);

                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on usage_access  (cost=0.00..1183396.40 rows=12713851 width=116) (actual time=481796.22..481839.43 rows=3343 loops=1)
   Filter: (atime >= $0)
   InitPlan
     ->  Limit  (cost=0.00..0.59 rows=1 width=8) (actual time=0.41..0.42 rows=1 loops=1)
           ->  Index Scan Backward using usage_access_atime on usage_access  (cost=0.00..22657796.18 rows=38141552 width=8) (actual time=0.40..0.41 rows=2 loops=1)
 Total runtime: 481842.47 msec

It doesn't look like this will help at all.
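
The one other idea I have is to run the subquery separately and feed its
result back into the outer query as a literal, on the theory that the
planner can't estimate atime >= $0 when the value comes from the InitPlan
and so never considers the atime index. A sketch of what I mean (two
statements; the timestamp in the second is just a placeholder for
whatever the first one returns):

-- step 1: find the cutoff; fast, since it can walk the atime
-- index backward
select atime - '1 hour'::interval as cutoff
  from usage_access
 order by atime desc
 limit 1;

-- step 2: re-run the aggregate with the cutoff as a literal, so the
-- planner can compare it against the atime statistics and, I hope,
-- choose the usage_access_atime index over a seq scan
select client, max(atime) as atime
  from usage_access
 where atime >= '2005-04-26 19:00:00'  -- placeholder from step 1
 group by client;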

This table is primarily append-only; however, I recently deleted a few
million rows from it, if that helps anyone.
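
I haven't vacuumed or analyzed since that delete; I assume the standard
step would be the following, in case dead tuples or stale statistics are
skewing things:

-- untested here; standard maintenance after a large delete on 7.3
vacuum analyze usage_access;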

--
Matthew Nuzum
www.bearfruit.org
