Re: SELECT COUNT(*) execution time on large tables (v9.0.4-1)

From: "David Johnston" <polobo(at)yahoo(dot)com>
To: "'Carlos Sotto Maior \(SIM\)'" <csotto(at)sistemassim(dot)com(dot)br>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: SELECT COUNT(*) execution time on large tables (v9.0.4-1)
Date: 2011-05-27 20:48:42
Message-ID: 00b001cc1caf$77062e10$65128a30$@yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Counting live data is inherently imprecise. There are supposedly some
system tables that can give you rough numbers.

You would be better off figuring out an alternative method to get the data
you desire and stop continually recounting all 5.7M records.

A Trigger driven counter, for insert and delete, is probably the most
obvious method. Also, say for temporal data, cache the prior monthly counts
and only perform an actual count over the current (changing) month(s).

At your table size the brute-force approach is obviously not going to work
so an alternative method needs to be devised, one that eliminates
re-counting previously counted records. The specific design is going to be
highly dependent on your specific requirements - which is why no generalized
solution exists. If you provide the why behind the question, and not just
the question, people may be inclined to provide relevant suggestions.

Issuing a "count(*)" is not a need - it is an implementation. The need is
what you end up doing with that number.

Lastly, the time you spent combing the system catalogs would have been
better spent perusing the FAQ linked to off the PostgreSQL homepage. You
question, in almost the same words, is in the FAQ with a link to the wiki
which repeats all your observations and explains why the behavior is that
way; and suggests (links to) possible alternatives. You may wish to go
there now to get more background and ideas.

David J.

> Hi,
>
> My application has a frequent need to issue a select count(*) on tables.
> Some have a large row count. (The example below are from a 5.7 M row;
> Some are larger).
>
> Issuing either SELECT COUNT(*) or SELECT COUNT(<Primary_Key_Colum>)
> yelds a sequential scan on table;
>
> I have browsed catalog tables, digging for a real time Row.count but so
far
> did not find any.
> QUESTION: Is there a better (faster) way to obtain the row count from a
> table?
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Derrick Rice 2011-05-27 21:11:56 Re: Shared Buffer Size
Previous Message preetika tyagi 2011-05-27 19:36:35 Shared Buffer Size