RES: SELECT COUNT(*) execution time on large tables (v9.0.4-1)

From: "Carlos Sotto Maior \(SIM\)" <csotto(at)sistemassim(dot)com(dot)br>
To: "'David Johnston'" <polobo(at)yahoo(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: RES: SELECT COUNT(*) execution time on large tables (v9.0.4-1)
Date: 2011-05-30 18:35:39
Message-ID: 000701cc1ef8$6a44d3c0$3ece7b40$@sistemassim.com.br
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

David, Thanks for your reply.

I will probably use the strategy of a trigger driven counter, with temporal
strategy devising current month totals and up to last month total as current
month changes rapidly.

I also apologize for not being investigative enough. I did look at wiki but
maybe I did not pursue my doubt in wiki in the proper manner. I will surely
get more acquainted to wiki to keep this channel clean from repeated
questions.

Once again, thank you very much.

Carlos Sotto Maior
+55 11 8244-7899
csotto(at)sistemassim(dot)com(dot)br

Sistemas Sim Serviços e Tecnologia Ltda.
+55 11 5041-3086
Rua Tenente Gomes Ribeiro, 78
Vila Clementino (Próximo ao Metro Santa Cruz)
São Paulo - SP
04038-040

-----Mensagem original-----
De: David Johnston [mailto:polobo(at)yahoo(dot)com]
Enviada em: sexta-feira, 27 de maio de 2011 17:49
Para: 'Carlos Sotto Maior (SIM)'; pgsql-general(at)postgresql(dot)org
Assunto: RE: [GENERAL] SELECT COUNT(*) execution time on large tables
(v9.0.4-1)

Counting live data is inherently imprecise. There are supposedly some
system tables that can give you rough numbers.

You would be better off figuring out an alternative method to get the data
you desire and stop continually recounting all 5.7M records.

A Trigger driven counter, for insert and delete, is probably the most
obvious method. Also, say for temporal data, cache the prior monthly counts
and only perform an actual count over the current (changing) month(s).

At your table size the brute-force approach is obviously not going to work
so an alternative method needs to be devised, one that eliminates
re-counting previously counted records. The specific design is going to be
highly dependent on your specific requirements - which is why no generalized
solution exists. If you provide the why behind the question, and not just
the question, people may be inclined to provide relevant suggestions.

Issuing a "count(*)" is not a need - it is an implementation. The need is
what you end up doing with that number.

Lastly, the time you spent combing the system catalogs would have been
better spent perusing the FAQ linked to off the PostgreSQL homepage. You
question, in almost the same words, is in the FAQ with a link to the wiki
which repeats all your observations and explains why the behavior is that
way; and suggests (links to) possible alternatives. You may wish to go
there now to get more background and ideas.

David J.

> Hi,
>
> My application has a frequent need to issue a select count(*) on tables.
> Some have a large row count. (The example below are from a 5.7 M row;
> Some are larger).
>
> Issuing either SELECT COUNT(*) or SELECT COUNT(<Primary_Key_Colum>)
> yelds a sequential scan on table;
>
> I have browsed catalog tables, digging for a real time Row.count but
> so
far
> did not find any.
> QUESTION: Is there a better (faster) way to obtain the row count from
> a table?
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David Johnston 2011-05-30 21:09:45 Function Column Expansion Causes Inserts To Fail
Previous Message Andrej Podzimek 2011-05-30 16:52:47 Re: Regular disk activity of an idle DBMS