Re: Speedier count(*)

From: John A Meinel <john(at)arbash-meinel(dot)com>
To: Dan Harris <fbsd(at)drivefaster(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Speedier count(*)
Date: 2005-08-11 01:07:50
Message-ID: 42FAA4E6.5070703@arbash-meinel.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Dan Harris wrote:
> I have a web page for my customers that shows them count of records and
> some min/max date ranges in each table of a database, as this is how we
> bill them for service. They can log in and check the counts at any
> time. I'd like for the counts to be as fresh as possible by keeping
> this dynamic, but I will use a periodic 'snapshot'/cron job if that is
> the only option to speed this up. I have thought about using the
> table statistics, but the estimate error is probably unacceptable
> because of the billing purposes.
>
> For some reason, the SQL Server we migrated the app from can return
> count(*) in a split second on multi-million row tables, even though it
> is a MUCH slower box hardware-wise, but it's now taking many seconds to
> run. I have read in the archives the problems MVCC brings into the
> count(*) dilemma forcing Pg to run a seq scan to get counts. Does
> SQLServer not use MVCC or have they found another approach for arriving
> at this number? Compounding all the min/max and counts from other
> tables and all those queries take about a minute to run. The tables
> will contain anywhere from 1 million to 40 million rows.

I believe SQL Server doesn't use MVCC in the same way. At the very
least, it stores some row information in the index, so it can get some
info from just an index, without having to go to the actual page (MVCC
requires a main page visit to determine visibility.)

Depending on how much it impacts performance, you can create an
INSERT/UPDATE trigger so that whenever a new entry is added, it
automatically updates a statistics table. It would be maintained as you
go, rather than periodically like a cron job.

I would go Cron if things can be slightly out of date (like 1 hour at
least), and you need updates & inserts to not be slowed down.
Otherwise I think the trigger is nicer, since it doesn't do redundant
work, and means everything stays up-to-date.

>
> Also, I am using "select ... group by ... order by .. limit 1" to get
> the min/max since I have already been bit by the issue of min() max()
> being slower.
>
>
> -Dan

John
=:->

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2005-08-11 03:14:40 Re: partial index regarded more expensive
Previous Message Michael Fuhr 2005-08-11 00:36:35 Re: Speedier count(*)