Re: Yet Another COUNT(*)...WHERE...question

From: Bill Moran <wmoran(at)potentialtech(dot)com>
To: Rainer Bauer <usenet(at)munnin(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Yet Another COUNT(*)...WHERE...question
Date: 2007-08-16 15:35:57
Message-ID: 20070816113557.80e4af1b.wmoran@potentialtech.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

In response to Rainer Bauer <usenet(at)munnin(dot)com>:

> "Trevor Talbot" wrote:
>
> >On 8/16/07, Rainer Bauer <usenet(at)munnin(dot)com> wrote:
> >
> >> >> But if you go to eBay, they always give you an accurate count. Even if the no.
> >> >> of items found is pretty large (example: <http://search.ebay.com/new>).
> >> >
> >> >And I'd bet money that they're using a full text search of some kind to
> >> >get those results, which isn't remotely close to the same thing as a
> >> >generic SELECT count(*).
> >>
> >> Without text search (but with a category restriction):
> >> <http://collectibles.listings.ebay.com/_W0QQsacatZ1QQsocmdZListingItemList>
> >>
> >> I only wanted to show a counter-example for a big site which uses pagination
> >> to display result sets and still reports accurate counts.
> >
> >Categories are still finite state: you can simply store a count for
> >each category. Again it's just a case of knowing your data and
> >queries; it's not trying to solve a general infinite-possibilities
> >situation.
>
> Consider this query with multiple WHERE conditions:
> <http://search.ebay.com/ne-ol-an_W0QQfasiZ1QQfbdZ1QQfcdZ1QQfcidZ77QQfclZ3QQfmcZ1QQfrppZ50QQfsooZ1QQfsopZ1QQftidZ1QQpriceZ1QQsabdhiZ100QQsacurZ999QQsalicZQ2d15QQsaprchiZ50000QQsatitleZQ28neQ2aQ2colQ2aQ2canQ2aQ29QQsojsZ0>
>
> My point is that whatever search criterias are involved and how many items are found eBay always returns the *accurate* number of items found.

While I don't _want_ to argue with you ... I can't seem to help myself.

How do you _know_ that's the exact number of items? There are 50 items on
that page, the paginator at the bottom shows 97,686 pages, but there's no
way (that I can find) to go to the _last_ page to ensure that said numbers
are correct. It could simply be estimating the number of items and
calculating the # of pages based on that. With 4mil items, a few 1000 off
isn't anything anyone would notice.

> Before this drifts off:
> * I do know *why* count(*) is slow using Postgres.
> * I *think* that count(*) is fast on eBay because count is cheaper using Oracle (which eBay does: <http://www.sun.com/customers/index.xml?c=ebay.xml>).

That could be possible, but it's still speculation at this point.
If someone with Oracle-fu could say for sure one way or the other,
that would be interesting ...

Unless there's data on that sun.com page that provides more detail. It
doesn't seem to be willing to load for me at this point ...

> * I realize that pagination for multi-million tuple results does not make sense.

Then what is the point to this thread? Are we just shooting the breeze at
this point?

--
Bill Moran
http://www.potentialtech.com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Alvaro Herrera 2007-08-16 15:36:12 Re: pqlib in c++: PQconnectStart PQconnectPoll
Previous Message Phoenix Kiula 2007-08-16 15:35:05 Re: Yet Another COUNT(*)...WHERE...question