Re: Performance of count(*)

From: Bill Moran <wmoran(at)collaborativefusion(dot)com>
To: ismo(dot)tuononen(at)solenovo(dot)fi
Cc: Albert Cervera Areny <albert(at)sedifa(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance of count(*)
Date: 2007-03-22 12:31:30
Message-ID: 20070322083130.1de9dc5b.wmoran@collaborativefusion.com
Lists: pgsql-performance

In response to ismo(dot)tuononen(at)solenovo(dot)fi:
>
> Approximated count?????
>
> Why? Who would need it? Where would you use it?
>
> Calculating costs and deciding how to execute a query needs an
> approximated count, but it's totally worthless information for any user,
> IMO.

I don't think so.

We have some AJAX stuff where users enter search criteria on a web form,
and the # of results updates in "real time" as they change their criteria.

Right now, this works fine with small tables using count(*) -- it's fast
enough not to be an issue, but we're aware that we can't use it on large
tables.

An estimate_count(*) or similar that would let us display an estimate of
how many results will be returned (not guaranteed accurate) would be very
nice to have in these cases.

We're dealing with complex sets of criteria. It's very useful for the users
to know in "real time" how much their search criteria are affecting the
result pool. Once they feel they've limited it as much as they can without
reducing the pool too much, they can hit submit and get the actual result.

As I said, we do this with small data sets, but it's not terribly useful
there. Where it will be useful is searches of large data sets, where
constantly submitting and then retrying is overly time-consuming.

Of course, this is count(*)ing the results of a complex query, possibly
with a bunch of joins and many limitations in the WHERE clause, so I'm
not sure what could be done overall to improve the response time.
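One possible workaround, sketched here as an assumption rather than anything PostgreSQL ships as an estimate_count(*): run EXPLAIN on the search query itself (not on SELECT count(*), whose top Aggregate node always estimates rows=1) and read the planner's rows= figure off the first plan node, the same kind of estimate visible in the EXPLAIN output quoted below. The helper name estimate_count_from_explain is hypothetical, and it takes the plain-text EXPLAIN lines however your client fetches them. The number is only as good as the planner's statistics, so it can be far off for complex WHERE clauses.

```python
import re

# Matches the planner's row estimate inside a plan node, e.g.
# "Seq Scan on agiraw  (cost=0.00..185197.41 rows=4708941 width=0)".
# Footers such as "(2 rows)" contain no "rows=", so they are not matched.
_ROWS_RE = re.compile(r"\brows=(\d+)")


def estimate_count_from_explain(plan_lines):
    """Hypothetical helper: return the row estimate of the top plan node.

    plan_lines: lines of plain-text EXPLAIN output for the search query
    itself. The first line carrying "rows=" is the top node, because
    child nodes are printed after their parent, prefixed with "->".
    """
    for line in plan_lines:
        match = _ROWS_RE.search(line)
        if match:
            return int(match.group(1))
    raise ValueError("no rows= estimate found in EXPLAIN output")


if __name__ == "__main__":
    # Single-node plan modelled on the Seq Scan line quoted in this thread.
    sample = [
        "QUERY PLAN",
        "-----------------------------------------------------------",
        "Seq Scan on agiraw  (cost=0.00..185197.41 rows=4708941 width=0)",
        "(1 row)",
    ]
    print(estimate_count_from_explain(sample))  # prints 4708941
```

For a query with joins, the same helper returns the estimate of the top join node, which is the planner's guess at the size of the final result rather than of any single table.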

> On Thu, 22 Mar 2007, Albert Cervera Areny wrote:
>
> > As you can see, PostgreSQL needs to do a sequential scan to count because
> > of its MVCC nature: indices don't carry transaction (visibility)
> > information. It's a known drawback inherent to the way PostgreSQL works, a
> > design which gives very good results in other areas. There has been talk of
> > adding some kind of approximated count which wouldn't need a full table
> > scan, but I don't think there's anything there right now.
> >
> > On Thursday 22 March 2007 11:53, Andreas Tille wrote:
> > > Hi,
> > >
> > > I am just trying to find out why a simple count(*) might take that long.
> > > At first I tried EXPLAIN, which quite quickly knows how many rows
> > > to check, but the final count is two orders of magnitude slower.
> > >
> > > My colleague, who uses MS SQL Server, can't believe that.
> > >
> > > $ psql InfluenzaWeb -c 'explain SELECT count(*) from agiraw ;'
> > > QUERY PLAN
> > > -----------------------------------------------------------------------
> > > Aggregate (cost=196969.77..196969.77 rows=1 width=0)
> > > -> Seq Scan on agiraw (cost=0.00..185197.41 rows=4708941 width=0)
> > > (2 rows)
> > >
> > > real 0m0.066s
> > > user 0m0.024s
> > > sys 0m0.008s
> > >
> > > $ psql InfluenzaWeb -c 'SELECT count(*) from agiraw ;'
> > > count
> > > ---------
> > > 4708941
> > > (1 row)
> > >
> > > real 0m4.474s
> > > user 0m0.036s
> > > sys 0m0.004s
> > >
> > >
> > > Any explanation?
> > >
> > > Kind regards
> > >
> > > Andreas.
> >
> > --
> > Albert Cervera Areny
> > Dept. Informàtica Sedifa, S.L.
> >
> > Av. Can Bordoll, 149
> > 08202 - Sabadell (Barcelona)
> > Tel. 93 715 51 11
> > Fax. 93 715 51 12
> >
> > ====================================================================
> > ........................... DISCLAIMER .............................
> > This message and its attachments are intended exclusively for the
> > named addressee. If you receive this message in error, please
> > immediately delete it from your system and notify the sender. You
> > may not use this message or any part of it for any purpose.
> > The message may contain information that is confidential or
> > protected by law, and any opinions expressed are those of the
> > individual sender. Internet e-mail guarantees neither the
> > confidentiality nor the proper receipt of the message sent.
> > If the addressee of this message does not consent to the use
> > of internet e-mail, please inform us immediately.
> > ====================================================================
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 7: You can help support the PostgreSQL project by donating at
> >
> > http://www.postgresql.org/about/donate
> >

--
Bill Moran
Collaborative Fusion Inc.

wmoran(at)collaborativefusion(dot)com
Phone: 412-422-3463x4023

****************************************************************
IMPORTANT: This message contains confidential information and is
intended only for the individual named. If the reader of this
message is not an intended recipient (or the individual
responsible for the delivery of this message to an intended
recipient), please be advised that any re-use, dissemination,
distribution or copying of this message is prohibited. Please
notify the sender immediately by e-mail if you have received
this e-mail by mistake and delete this e-mail from your system.
E-mail transmission cannot be guaranteed to be secure or
error-free as information could be intercepted, corrupted, lost,
destroyed, arrive late or incomplete, or contain viruses. The
sender therefore does not accept liability for any errors or
omissions in the contents of this message, which arise as a
result of e-mail transmission.
****************************************************************
