Skip site navigation (1) Skip section navigation (2)

Re: SELECT COUNT(*) execution time on large tables (v9.0.4-1)

From: "David Johnston" <polobo(at)yahoo(dot)com>
To: "'Carlos Sotto Maior \(SIM\)'" <csotto(at)sistemassim(dot)com(dot)br>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: SELECT COUNT(*) execution time on large tables (v9.0.4-1)
Date: 2011-05-27 20:48:42
Message-ID: 00b001cc1caf$77062e10$65128a30$@yahoo.com (view raw or flat)
Thread:
Lists: pgsql-general
Counting live data is inherently imprecise.  There are supposedly some
system tables that can give you rough numbers.

You would be better off figuring out an alternative method to get the data
you desire and stop continually recounting all 5.7M records. 

A Trigger driven counter, for insert and delete, is probably the most
obvious method.  Also, say for temporal data, cache the prior monthly counts
and only perform an actual count over the current (changing) month(s).

At your table size the brute-force approach is obviously not going to work
so an alternative method needs to be devised, one that eliminates
re-counting previously counted records.  The specific design is going to be
highly dependent on your specific requirements - which is why no generalized
solution exists.  If you provide the why behind the question, and not just
the question, people may be inclined to provide relevant suggestions. 

Issuing a "count(*)" is not a need - it is an implementation.  The need is
what you end up doing with that number.

Lastly, the time you spent combing the system catalogs would have been
better spent perusing the FAQ linked to off the PostgreSQL homepage.  You
question, in almost the same words, is in the FAQ with a link to the wiki
which repeats all your observations and explains why the behavior is that
way; and suggests (links to) possible alternatives.  You may wish to go
there now to get more background and ideas.

David J.


> Hi,
> 
> My application has a frequent need to issue a select count(*) on tables.
> Some have a large row count. (The example below are from a 5.7 M row;
> Some are larger).
> 
> Issuing  either  SELECT COUNT(*)  or SELECT COUNT(<Primary_Key_Colum>)
> yelds a sequential scan on table;
> 
> I have browsed catalog tables, digging for a real time Row.count but  so
far
> did not find any.
> QUESTION: Is there a better (faster) way to obtain the row count from a
> table?
> 



In response to

Responses

pgsql-general by date

Next:From: Derrick RiceDate: 2011-05-27 21:11:56
Subject: Re: Shared Buffer Size
Previous:From: preetika tyagiDate: 2011-05-27 19:36:35
Subject: Shared Buffer Size

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group