Re: speeding up COUNT and DISTINCT queries

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Max Baker <max(at)warped(dot)org>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Manfred Koizar <mkoi-pg(at)aon(dot)at>, PostgreSQL Performance Mailing List <pgsql-performance(at)postgresql(dot)org>
Subject: Re: speeding up COUNT and DISTINCT queries
Date: 2003-03-13 15:42:55
Message-ID: 874r67ff28.fsf@stark.dyndns.tv
Lists: pgsql-performance


Max Baker <max(at)warped(dot)org> writes:

> On Wed, Mar 12, 2003 at 05:57:50PM -0800, Joe Conway wrote:
> > Max Baker wrote:
> > >Thanks for the help. I guess i'm not clear on why there is so much
> > >extra cruft. Does postgres leave a little bit behind every time it does
> > >an update? Because this table is updated constantly.
> > >
> >
> > Yes. See:
> > http://www.us.postgresql.org/users-lounge/docs/7.3/postgres/routine-vacuuming.html
>
> That would explain why once a night isn't enough. Thanks.
> The contents of this table get refreshed every 4 hours. I'll add a
> vacuum after every refresh and compare the results in a couple days.

If it gets completely refreshed, i.e., every tuple is updated or deleted and
re-inserted in a big batch job, then plain VACUUM might never be enough without
boosting some config values (the free space map settings) a lot. You might need
to do a VACUUM FULL after the refresh. VACUUM FULL locks the table though, which
might be unfortunate.
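As a concrete sketch (the table name "node" is a placeholder for whatever table is actually being refreshed), the post-refresh maintenance would look like:

```sql
-- Run immediately after each 4-hourly batch refresh.
-- VACUUM FULL compacts the table, reclaiming the dead-tuple space the
-- refresh left behind, but it takes an exclusive lock for the duration.
VACUUM FULL ANALYZE node;
```

The ANALYZE keeps the planner's statistics current for the freshly loaded data.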

VACUUM FULL should be sufficient, but you might want to consider instead
TRUNCATE-ing the table and then reinserting records, rather than deleting, if
that's what you're doing. Or alternatively, build the new data in a new table
and then do a switcheroo with ALTER TABLE RENAME. However, ALTER TABLE (and
possibly TRUNCATE as well?) will invalidate functions and other objects that
refer to the table.
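The switcheroo approach might look something like this (table and column names are hypothetical; the real schema isn't shown in the thread):

```sql
-- Build the replacement table, load it, then swap names.
BEGIN;
CREATE TABLE node_new AS SELECT * FROM node WHERE false;  -- copy structure only
-- ... bulk-load the fresh data into node_new here, e.g. with COPY ...
DROP TABLE node;                  -- invalidates views/functions referencing node
ALTER TABLE node_new RENAME TO node;
COMMIT;
```

Note that indexes are not carried over by CREATE TABLE AS, so they would have to be recreated on the new table before (or just after) the rename.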

Regarding the original question:

. 7.4 will probably be faster than 7.3 at least if you stick with GROUP BY.

. You could try building an index on mac, but I suspect even then it'll choose
  the sequential scan. But try it with an index and enable_seqscan = off to
  see whether it's even worth trying to get it to use the index. If so, you'll
  have to lower random_page_cost and/or play with cpu_tuple_cost and other
  variables to get it to do so.
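The planner experiment described above can be sketched like this ("node" is a placeholder table name; "mac" is the column from the original question):

```sql
-- Build the candidate index, then see whether the planner would ever use it.
CREATE INDEX node_mac_idx ON node (mac);

SET enable_seqscan = off;          -- strongly discourage sequential scans
EXPLAIN SELECT DISTINCT mac FROM node;   -- does it pick an index scan now?
SET enable_seqscan = on;           -- restore the default for normal use
```

If the index scan's estimated cost with enable_seqscan off is close to the sequential scan's cost, lowering random_page_cost may be enough to make the planner choose it on its own.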

. You might also want to cluster the table on that index. You would have to
  recluster it every time you do your refresh, and it's not clear how much it
  would help, if any. But it might be worth trying.
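With the hypothetical index above, the clustering step (in the CLUSTER syntax of that era, where the index name comes first) would be:

```sql
-- Physically reorder the table by the mac index after each refresh.
-- CLUSTER rewrites the table and locks it while it runs.
CLUSTER node_mac_idx ON node;
ANALYZE node;   -- refresh statistics after the rewrite
```

Like VACUUM FULL, this is only practical if the refresh window can tolerate the exclusive lock.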

--
greg
