Re: count(*) and bad design was: Experiences with extensibility

From: Ivan Sergio Borgonovo <mail(at)webthatworks(dot)it>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: count(*) and bad design was: Experiences with extensibility
Date: 2008-01-09 11:04:09
Message-ID: 20080109120409.38c49072@webthatworks.it
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, 9 Jan 2008 01:39:34 -0800
"Dann Corbit" <DCorbit(at)connx(dot)com> wrote:

> > On Wed, 09 Jan 2008 00:06:45 -0800
> > "Joshua D. Drake" <jd(at)commandprompt(dot)com> wrote:

> > > Granted there are scenarios where others are FASTER (SELECT
> > > COUNT(*)) but I find that if you are doing those items, you
> > > normally have a weird design anyway.

> > > Sincerely,

> > Sincerely, would you make an example of such a bad design?

> A program that estimates cardinality by doing SELECT COUNT(*) is a
> bad design. Assuming you have the wherewithal to vacuum your
> tables (or have autovacuum enabled) a query against the system
> tables will be a much better estimate of cardinality.
>
> Now (some may argue) what if we want an _EXACT_ value for
> COUNT(*)? We had better ask ourselves (in that circumstance) "Am I
> willing to lock the entire table and scan it?" because that is what
> will be necessary to get a truly exact value. Otherwise, you can
> get totals that are wildly off-base if someone is doing a bulk
> import or deleting a large number of records.

Please forgive my naiveness in this field but what does it mean an
"exact count" and what other DB means with "an exact count" and how
other DB deal with it?

How "count" is defined in the SQL standard?

Is there a real situation then where you really need the "exact"
count?

Am I right saying that:

select count(*) from ...
-- here count may already be different

and that:
select for update count(*)

could be very expensive? Or what would it mean to do a
select for update count(*) ...?

> I think this should be a FAQ because it is a (F)requently (A)sked
> (Q)uestion.

After reading your email I think the real problem is not how to do
otherwise but understand what count(*) really mean and when and if it
is really useful and when it can be avoided.

I'd write in the FAQ something in the line of:

- What count(*) really does?
- When it can be avoided?
- When it can't be avoided?

In my everyday use of count(*), after your email I can hardly spot a
place where I need an exact count.
But to better understand and being convinced that using count(*) is
bad design I think last question could help a lot.

How does count(*) with a where clause perform generally on postgresql
compared to other DB?
I'd expect it perform as good or better than other DB since now the
bottleneck should be how efficiently it can filter records... but
still a count(*) with a where clause will incur in the same problem
of what "exact" means.

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Willy-Bas Loos 2008-01-09 11:17:15 performance differences of major versions
Previous Message Robin-Vossen 2008-01-09 11:02:53 After Installing a Program I get this error: psql:sql/Pg-database.sql:825: ERROR: language "plpgsql" does not exist