Skip site navigation (1) Skip section navigation (2)

Re: speeding up COUNT and DISTINCT queries

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Max Baker <max(at)warped(dot)org>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Manfred Koizar <mkoi-pg(at)aon(dot)at>,PostgreSQL Performance Mailing List <pgsql-performance(at)postgresql(dot)org>
Subject: Re: speeding up COUNT and DISTINCT queries
Date: 2003-03-13 15:42:55
Message-ID: 874r67ff28.fsf@stark.dyndns.tv (view raw or flat)
Thread:
Lists: pgsql-performance
Max Baker <max(at)warped(dot)org> writes:

> On Wed, Mar 12, 2003 at 05:57:50PM -0800, Joe Conway wrote:
> > Max Baker wrote:
> > >Thanks for the help.  I guess i'm not clear on why there is so much
> > >extra cruft.  Does postgres leave a little bit behind every time it does
> > >an update?  Because this table is updated constantly.
> > >
> > 
> > Yes. See:
> > http://www.us.postgresql.org/users-lounge/docs/7.3/postgres/routine-vacuuming.html
> 
> That would explain why once a night isn't enough.  Thanks. 
> The contents of this table get refreshed every 4 hours.  I'll add a
> vacuum after every refresh and comapre the results in a couple days.

If it gets completely refreshed, ie, every tuple is updated or deleted and
re-inserted in a big batch job then VACUUM might never be enough without
boosting some config values a lot. You might need to do a VACUUM FULL after
the refresh. VACUUM FULL locks the table though which might be unfortunate.

VACUUM FULL should be sufficient but you might want to consider instead
TRUNCATE-ing the table and then reinserting records rather than deleting if
that's what you're doing. Or alternatively building the new data in a new
table and then doing a switcheroo with ALTER TABLE RENAME. However ALTER TABLE
(and possible TRUNCATE as well?) will invalidate functions and other objects
that refer to the table.

Regarding the original question:

. 7.4 will probably be faster than 7.3 at least if you stick with GROUP BY.

. You could try building an index on mac, but I suspect even then it'll choose
  the sequential scan. But try it with an index and enable_seqscan = off to
  see if it's even worth trying to get it to use the index. If so you'll have
  to lower random_page_cost and/or play with cpu_tuple_cost and other
  variables to get it to do so.

. You might also want to cluster the table on that index. You would have to
  recluster it every time you do your refresh and it's not clear how much it
  would help if any. But it might be worth trying.

--
greg


In response to

Responses

pgsql-performance by date

Next:From: Robert TreatDate: 2003-03-13 20:05:30
Subject: Re: speeding up COUNT and DISTINCT queries
Previous:From: Christopher Kings-LynneDate: 2003-03-13 02:20:04
Subject: Re: speeding up COUNT and DISTINCT queries

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group