Re: Performance of count(*)

From: Andreas Kostyrka <andreas(at)kostyrka(dot)org>
To: Andreas Tille <tillea(at)rki(dot)de>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance of count(*)
Date: 2007-03-22 12:29:46
Message-ID: 20070322122944.GD4439@andi-lap.la.revver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

* Andreas Tille <tillea(at)rki(dot)de> [070322 13:24]:
> On Thu, 22 Mar 2007, Andreas Kostyrka wrote:
>
> >Which version of PG?
>
> Ahh, sorry, forgot that. The issue occurs in Debian (Etch) packaged
> version 7.4.16. I plan to switch soon to 8.1.8.
I'd recommend 8.2 if at all possible :)
>
> >That's the reason why PG (check the newest releases, I seem to
> >remember that there has been some aggregate optimizations there),
>
> I'll verify this once I moved to the new version.
8.1 won't help you I'd guess. ;)

>
> >does
> >a SeqScan for select count(*) from table. btw, depending upon your
> >data, doing a select count(*) from table where user=X will use an
> >Index, but will still need to fetch the rows proper to validate them.
>
> I have an index on three (out of 7 columns) of this table. Is there
> any chance to optimize indexing regarding this.
Well, that depends upon you query pattern. It's an art and a science
at the same time ;)
>
> Well, to be honest I'm not really interested in the performance of
> count(*). I was just discussing general performance issues on the
> phone line and when my colleague asked me about the size of the
> database he just wonderd why this takes so long for a job his
> MS-SQL server is much faster. So in principle I was just asking
> a first question that is easy to ask. Perhaps I come up with
> more difficult optimisation questions.

Simple. MSSQL is optimized for this case, and uses "older"
datastructures. PG uses a MVCC storage, which is not optimized for
this usecase. It's quite fast for different kinds of queries.

The basic trouble here is that mvcc makes it a little harder to decide
what is valid for your transaction, plus the indexes seems to be
designed for lookup, not for data fetching. (Basically, PG can use
indexes only to locate potential data, but cannot return data directly
out of an index)

Andreas

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Bill Moran 2007-03-22 12:31:30 Re: Performance of count(*)
Previous Message David Brain 2007-03-22 12:08:25 Potential memory usage issue