Re: Strange count(*) implementation?

From: Tino Wildenhain <tino(at)wildenhain(dot)de>
To: Henk Ernst Blok <h(dot)e(dot)blok(at)utwente(dot)nl>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Strange count(*) implementation?
Date: 2004-10-26 12:45:54
Message-ID: 1098794754.21062.422.camel@sabrina.peacock.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, 2004-10-26 at 13:56, Henk Ernst Blok wrote:
> Hi,
>
> My question was more of a fundamental nature as this count by scan
> seemed to contradict the theory about how to optimize it.

It is hard or next to impossible to optimize count() or more generally
aggregates in a MVCC environment. Note also all the cases where
you have a qualified select to calculate the aggregate.

> I assume(d) the more expensive statistics (e.g., value distribution
> info) are updated only when outdated too much or on request (manual
> vacuum). Usually, other/cheap statistics can easily be maintained
> incrementally and thus reflect actual table state after each update.

I remember some discussion about this. But also here the MVCC and
the general call for performance leads to the current solution
where statistics are only updated on vacuum. With PG8.0 you have
vacuum strategies with better performance which can run more often
as I understand - so while not giving you exact figures your
count() could be estimated at least.

> Of course, the MVCC principle seems to make things a bit more
> complicated I understand now. But tracking whether statistics are
> dirty has to be in the system anyway. How does it otherwise decide
> when to do its own statistics updates? So if explain can get the most
> recent count, why not use it in the count as well if you know the
> statistics are still acurate?

The point is: you dont know it. There is curently no: mark statistics
dirty if table has new tuples or tuples removed.

> By the way, a count(*) without any where does occur very frequently if
> you are dealing with an OLAP load, which is the case in my setting.
> So, I indeed already 'fixed' the performance problem by precomputing
> all counts I can predict to be of any use.

I'm not familar with OLAP specifics, so what is the meaning of the
count() here? What is done with this information?

> Anyway, I understood this issue has been subject to discusion before I
> was on the list (searching the archive/website was/is not very
> effective, so I didn't know until someone told me so, sorry). So, I
> leave it to the developers what to do with this topic.

Yes, very very often I can tell you :-)
Really this is an FAQ :-)

Regards
Tino

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Mike Mascari 2004-10-26 12:55:53 Any plans on allowing user-defined triggers to be deferrable?
Previous Message Alvaro Herrera 2004-10-26 12:28:44 Re: Strange count(*) implementation?