Skip site navigation (1) Skip section navigation (2)

speed up query with max() and odd estimates

From: Matthew Nuzum <mattnuzum(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: speed up query with max() and odd estimates
Date: 2005-04-26 20:16:57
Message-ID: f3c0b40805042613167c47c7e9@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-performance
I have this query that takes a little over 8 min to run:
select client,max(atime) as atime from usage_access where atime >=
(select atime - '1 hour'::interval from usage_access order by atime
desc limit 1) group by client;

I think it can go a lot faster. Any suggestions on improving this? DB
is 7.3.4 I think. (There is no index on client because it is very big
and this data is used infrequently.)

explain ANALYZE select client,max(atime) as atime from usage_access
where atime >= (select atime - '1 hour'::interval from usage_access
order by atime desc limit 1) group by client;
                                                                      
      QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=3525096.28..3620450.16 rows=1271385 width=20)
(actual time=482676.95..482693.69 rows=126 loops=1)
   InitPlan
     ->  Limit  (cost=0.00..0.59 rows=1 width=8) (actual
time=0.40..0.41 rows=1 loops=1)
           ->  Index Scan Backward using usage_access_atime on
usage_access  (cost=0.00..22657796.18 rows=38141552 width=8) (actual
time=0.39..0.40 rows=2 loops=1)
   ->  Group  (cost=3525096.28..3588665.53 rows=12713851 width=20)
(actual time=482676.81..482689.29 rows=3343 loops=1)
         ->  Sort  (cost=3525096.28..3556880.90 rows=12713851
width=20) (actual time=482676.79..482679.16 rows=3343 loops=1)
               Sort Key: client
               ->  Seq Scan on usage_access  (cost=0.00..1183396.40
rows=12713851 width=20) (actual time=482641.57..482659.18 rows=3343
loops=1)
                     Filter: (atime >= $0)
 Total runtime: 482694.65 msec


I'm starting to understand this, which is quite frightening to me. I
thought that maybe if I shrink the number of rows down I could improve
things a bit, but my first attempt didn't work. I thought I'd replace
the "from usage_access" with this query instead:
select * from usage_access where atime >= (select atime - '1
hour'::interval from usage_access order by atime desc limit 1);
                                                                      
      QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on usage_access  (cost=0.00..1183396.40 rows=12713851
width=116) (actual time=481796.22..481839.43 rows=3343 loops=1)
   Filter: (atime >= $0)
   InitPlan
     ->  Limit  (cost=0.00..0.59 rows=1 width=8) (actual
time=0.41..0.42 rows=1 loops=1)
           ->  Index Scan Backward using usage_access_atime on
usage_access  (cost=0.00..22657796.18 rows=38141552 width=8) (actual
time=0.40..0.41 rows=2 loops=1)
 Total runtime: 481842.47 msec

It doesn't look like this will help at all.

This table is primarily append, however I just recently deleted a few
million rows from the table, if that helps anyone.

-- 
Matthew Nuzum
www.bearfruit.org

Responses

pgsql-performance by date

Next:From: Simon RiggsDate: 2005-04-26 20:31:38
Subject: Re: [HACKERS] Bad n_distinct estimation; hacks suggested?
Previous:From: Roger HandDate: 2005-04-26 19:52:53
Subject: Re: What needs to be done for real Partitioning?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group