Re: Hardware/OS recommendations for large databases (

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
Cc: Luke Lonergan <llonergan(at)greenplum(dot)com>, stange(at)rentec(dot)com, Greg Stark <gsstark(at)mit(dot)edu>, Dave Cramer <pg(at)fastcrypt(dot)com>, Joshua Marsh <icub3d(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Hardware/OS recommendations for large databases (
Date: 2005-11-24 16:00:25
Message-ID: 87d5kqkpd2.fsf@stark.xeocode.com
Lists: pgsql-performance

Mark Kirkwood <markir(at)paradise(dot)net(dot)nz> writes:

> Yeah - it's pretty clear that the count aggregate is fairly expensive wrt cpu -
> However, I am not sure if all agg nodes suffer this way (guess we could try a
> trivial aggregate that does nothing for all tuples bar the last and just
> reports the final value it sees).

As you mention, count(*) and count(1) are the same thing.

Last I heard, the reason count(*) was so expensive was that its state
variable is a bigint. That means it doesn't fit in a Datum and has to be
allocated and stored as a pointer. And because of the Aggregate API, that
means it has to be allocated and freed for every tuple processed.

There was some talk of having a special case API for count(*) and maybe
sum(...) to avoid having to do this.

There was also some talk of making Datum 8 bytes wide on platforms where that
was natural (I guess AMD64, Sparc64, Alpha, Itanic).

Afaik none of these items have happened but I don't know for sure.

--
greg
