Re: Improving count(*)

From: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving count(*)
Date: 2005-11-18 00:51:23
Message-ID: Pine.LNX.4.58.0511181149070.9614@linuxworld.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 18 Nov 2005, Simon Riggs wrote:

> >From here, another proposal. We have a GUC called count_uses_estimate
> that is set to off by default. If set to true, then a count(*) will use
> the planner logic to estimate number of rows in the table and return
> that as the answer, rather than actually count the row. Unless analyze
> statistics are not available, in which case it does the real count.

I'm finishing off a tablesample patch a grad student on #postgresql was
working on.

template1=# select count(*)*100 from a tablesample system(1) repeatable
(2);
?column?
----------
8371100
(1 row)

Time: 6366.757 ms
template1=# select count(*)*50 from a tablesample system(2) repeatable
(11);
?column?
----------
8453550
(1 row)

Time: 10521.871 ms
template1=# select count(*)*10 from a tablesample system(10) repeatable
(3);
?column?
----------
8314350
(1 row)

Time: 28744.498 ms
template1=# select count(*) from a;
count
---------
8388608
(1 row)

Time: 33897.857 ms

Seems like a better solution. I can finish the patch pretty soon. I need
to contact the original author, who has disappeared, but I'll send it over
to you.

Gavin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2005-11-18 00:57:38 Re: CLUSTER and clustered indices
Previous Message Joe Conway 2005-11-18 00:49:31 Re: Some array semantics issues