Re: Strange count(*) implementation?

From: Henk Ernst Blok <h(dot)e(dot)blok(at)utwente(dot)nl>
To: Tino Wildenhain <tino(at)wildenhain(dot)de>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Strange count(*) implementation?
Date: 2004-10-26 11:56:41
Message-ID: 417E3B79.8080902@utwente.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

My question was more of a fundamental nature as this count by scan
seemed to contradict the theory about how to optimize it.

I assume(d) the more expensive statistics (e.g., value distribution
info) are updated only when outdated too much or on request (manual
vacuum). Usually, other/cheap statistics can easily be maintained
incrementally and thus reflect actual table state after each update. Of
course, the MVCC principle seems to make things a bit more complicated I
understand now. But tracking whether statistics are dirty has to be in
the system anyway. How does it otherwise decide when to do its own
statistics updates? So if explain can get the most recent count, why
not use it in the count as well if you know the statistics are still
acurate?

By the way, a count(*) without any where does occur very frequently if
you are dealing with an OLAP load, which is the case in my setting. So,
I indeed already 'fixed' the performance problem by precomputing all
counts I can predict to be of any use.

Anyway, I understood this issue has been subject to discusion before I
was on the list (searching the archive/website was/is not very
effective, so I didn't know until someone told me so, sorry). So, I
leave it to the developers what to do with this topic.

Regards,

Henk Ernst

Tino Wildenhain wrote:

>hi,
>
>On Tue, 2004-10-26 at 10:16, Henk Ernst Blok wrote:
>
>
>>Hi Posgres users/developers,
>>
>>Can anyone explain why PosgreSQL (version 7.4.5 on Linux) does a full
>>table scan to compute a count(*) on a base table after a vacuum analyze
>>has been done with no following updates that might have outdated any
>>statistics. Strangly the explain command does give the correct number of
>>tuples instantaniously from the catalog, as one would expect. Still the
>>optimizer thinks it needs a full table scan to do count.
>>
>>
>>
>...
>
>
>>The consequence of this seemingly odd count implementation is a very
>>very slow count.
>>
>>
>
>How should the query planner know the vacuum was recent enough and there
>were no modifications to the table since?
>
>If you are interested in rough numbers you could read the system tables
>for the last vacuum statistics. If you need fast count and can spend
>some cycles on inserts, just make a buffer table with count results
>after insert.
>
>Unqualified count e.g. without a WHERE clause should not need to
>be used a lot.
>
>Regards
>Tino
>
>

--
address: DB group, Computer Science, EEMCS Dept., University of Twente,
PO Box 217, 7500 AE, ENSCHEDE, THE NETHERLANDS
phone: ++31 (0)53 489 3754 (if no response: 3690)
email: h(dot)e(dot)blok(at)utwente(dot)nl
WWW: http://www.cs.utwente.nl/~blokh

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Neil Conway 2004-10-26 12:18:54 Re: Strange count(*) implementation?
Previous Message Ian Barwick 2004-10-26 11:30:49 Re: compatibilityissues from 7.1 to 7.4