Skip site navigation (1) Skip section navigation (2)

Re: count(*) slow on large tables

From: Hilary Forbes <hforbes(at)dmr(dot)co(dot)uk>
To: "Gregory S(dot) Williamson" <gsw(at)globexplorer(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: count(*) slow on large tables
Date: 2003-10-03 16:50:17
Message-ID: 5.1.0.14.0.20031003174352.02172a08@mailserver.ipl.co.uk (view raw or flat)
Thread:
Lists: pgsql-performance
We frequently need to know the number of tuples in a table although sometimes we do have WHERE status='X' for example but this often doesn't guarantee an indexed scan. And yes, my reasons are the same - reporting figures eg number of bookings made since the system was introduced.   Have you tried doing

SELECT count(pkey)

rather than count(*)

where pkey is the primary key (assuming you have a single field that is a primary key or a unique indexed key).  This is MUCH faster in my experience.  If you don't have such an animal, I'd seriously suggesting putting in a serial number and recreate the table with that as the primary key.

The vacuuming bit is not accurate enough for us in many instances.  Also a count can be easily fed into other programs/web pages etc without having to parse the vacuum output.

Hilary

At 23:22 02/10/2003 -0700, you wrote:

>I can tell you that this is one of the first thing applications' programmers and IT managers notice. It can slightly tarnish postgres' image when it takes it many long seconds to do what other databases can do in a snap. The "whys and wherefores" can be hard to get across once they see the comparative numbers.
>
>When I use Informix "dbaccess" it has a "status" which will tell me the row count of a table virtually instantly -- it can be locked out by a user with an exclusive lock so its not entirely independant of the table (like a stored value in one of the system catalog tables).
>
>This is not to say Informix is "right" and Postgres is "wrong" ... but it  is something that virtually any newcomer will run into head long, with resulting bruises and contusions, not to mention confusion.
>
>At the very least this needs to be VERY clearly explained right up front, along with some of the possible work-arounds, depending on what one is really after with this info.
>
>Greg Williamson
>DBA
>GlobeXplorer LLC
>
>-----Original Message-----
>From:   Dror Matalon [mailto:dror(at)zapatec(dot)com]
>Sent:   Thu 10/2/2003 9:27 PM
>To:     pgsql-performance(at)postgresql(dot)org
>Cc:     
>Subject:        Re: [PERFORM] count(*) slow on large tables
>
>
>I smell a religious war in the aii:-). 
>Can you go several days in a row without doing select count(*) on any
>of your tables? 
>
>I suspect that this is somewhat a domain specific issue. In some areas
>you don't need to know the total number of rows in your tables, in
>others you do. 
>
>I also suspect that you're right, that end user applications don't use
>this information as often as DBAs would. On the other hand, it seems
>whenever you want to optimize your app (something relevant to this list),
>one of the things you do need to know is the number of rows in your
>table.
>
>Dror
>
>On Thu, Oct 02, 2003 at 10:08:18PM -0400, Christopher Browne wrote:
>> The world rejoiced as dror(at)zapatec(dot)com (Dror Matalon) wrote:
>> > I don't have an opinion on how hard it would be to implement the
>> > tracking in the indexes, but "select count(*) from some table" is, in my
>> > experience, a query that people tend to run quite often. 
>> > One of the databases that I've used, I believe it was Informix, had that
>> > info cached so that it always new how many rows there were in any
>> > table. It was quite useful.
>> 
>> I can't imagine why the raw number of tuples in a relation would be
>> expected to necessarily be terribly useful.
>> 
>> I'm involved with managing Internet domains, and it's only when people
>> are being pretty clueless that anyone imagines that "select count(*)
>> from domains;" would be of any use to anyone.  There are enough "test
>> domains" and "inactive domains" and other such things that the raw
>> number of "things in the table" aren't really of much use.
>> 
>> - I _do_ care how many pages a table occupies, to some extent, as that
>> determines whether it will fit in my disk space or not, but that's not
>> COUNT(*).
>> 
>> - I might care about auditing the exact numbers of records in order to
>> be assured that a data conversion process was done correctly.  But in
>> that case, I want to do something a whole *lot* more detailed than
>> mere COUNT(*).
>> 
>> I'm playing "devil's advocate" here, to some extent.  But
>> realistically, there is good reason to be skeptical of the merits of
>> using SELECT COUNT(*) FROM TABLE for much of anything.
>> 
>> Furthermore, the relation that you query mightn't be a physical
>> "table."  It might be a more virtual VIEW, and if that's the case,
>> bets are even MORE off.  If you go with the common dictum of "good
>> design" that users don't directly access tables, but go through VIEWs,
>> users may have no way to get at SELECT COUNT(*) FROM TABLE.
>> -- 
>> output = reverse("ac.notelrac.teneerf" "@" "454aa")
>> http://www.ntlug.org/~cbbrowne/finances.html
>> Rules  of  the  Evil  Overlord  #74.   "When  I  create  a  multimedia
>> presentation of my plan designed  so that my five-year-old advisor can
>> easily  understand the  details, I  will not  label the  disk "Project
>> Overlord" and leave it lying on top of my desk."
>> <http://www.eviloverlord.com/>
>> 
>> ---------------------------(end of broadcast)---------------------------
>> TIP 4: Don't 'kill -9' the postmaster
>
>-- 
>Dror Matalon, President
>Zapatec Inc 
>1700 MLK Way
>Berkeley, CA 94709
>http://www.zapatec.com
>
>---------------------------(end of broadcast)---------------------------
>TIP 4: Don't 'kill -9' the postmaster
>
>
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 4: Don't 'kill -9' the postmaster 


Hilary Forbes
-------------
DMR Computer Limited:   http://www.dmr.co.uk/
Direct line:  01689 889950
Switchboard:  (44) 1689 860000  Fax: (44) 1689 860330
E-mail:  hforbes(at)dmr(dot)co(dot)uk

**********************************************************


pgsql-performance by date

Next:From: Oleg LebedevDate: 2003-10-03 16:54:42
Subject: Re: TPC-R benchmarks
Previous:From: Tom LaneDate: 2003-10-03 16:32:00
Subject: Re: Postgres low end processing.

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group