Re: COUNT & Pagination

From: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
To: Rajesh Kumar Mallah <mallah(at)trade-india(dot)com>
Cc: David Shadovitz <david(at)www(dot)shadovitz(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: COUNT & Pagination
Date: 2004-01-14 21:40:01
Message-ID: Pine.LNX.4.33.0401141436230.25071-100000@css120.ihs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, 14 Jan 2004, Rajesh Kumar Mallah wrote:

> scott.marlowe wrote:
>
> >On Tue, 13 Jan 2004, David Shadovitz wrote:
> >
> >
> >
> >>>We avert the subsequent execution of count(*) by passing the
> >>>value of count(*) as a query parameter through the link in page
> >>>numbers.
> >>>
> >>>
> >>Mallah, and others who mentioned caching the record count:
> >>
> >>Yes, I will certainly do this. I can detect whether the query's filter has
> >>been changed, or whether the user is merely paging through the results or
> >>sorting* the results.
> >>
> >>I'd love to completely eliminate the cost of the COUNT(*) query, but I guess
> >>that I cannot have everything.
> >>
> >>* My HTML table column headers are hyperlinks which re-execute the query,
> >>sorting the results by the selected column. The first click does an ASC
> >>sort; a second click does a DESC sort.
> >>
> >>
> >
> >another useful trick is to have your script save out the count(*) result
> >in a single row table with a timestamp, and every time you grab if, check
> >to see if x number of minutes have passed, and if so, update that row with
> >a count(*).
> >
>
> Greetings!
>
> The count(*) can get evaluated with any arbitrary combination
> in whre clause how do you plan to store that information ?
>
> In a typical application pagination could be required in n number
> of contexts . I would be interested to know more about this trick
> and its applicability in such situations.
>
> Offtopic:
>
> Does PostgreSQL optimise repeated execution of similar queries ie
> queries on same table or set of tables (in a join) with same where clause
> and only differing in LIMIT and OFFSET.

Yes, and no.

Yes, previously run query should be faster, if it fits in kernel cache.

No, Postgresql doesn't cache any previous results or plans (unless you use
prepare / execute, then it only caches the plan, not the query results).

Plus, the design of Postgresql is such that it would have to do a LOT of
cache checking to see if there were any updates to the underlying data
between selects. Since such queries are unlikely to be repeated inside a
transaction, the only place where you wouldn't have to check for new
tuples, it's not really worth trying to implement.

Keep in mind most databases can use an index on max(*) because each
aggregate is programmed by hand to do one thing. In Postgresql, you can
create your own aggregate, and since there's no simple way to make
aggregates use indexes in the general sense, it's not likely to get
optimized. I.e. any optimization for JUST max(*)/min(*) is unlikely
unless it can be used for the other aggregates.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Ceri Storey 2004-01-14 23:10:15 Join optimisation Quandry
Previous Message scott.marlowe 2004-01-14 21:35:52 Re: 100 simultaneous connections, critical limit?