Quick Links

Re: Much Ado About COUNT(*)

From:	"Jonah H(dot) Harris" <jharris(at)tvi(dot)edu>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Much Ado About COUNT(*)
Date:	2005-01-12 19:36:15
Message-ID:	41E57C2F.1050105@tvi.edu
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-announce pgsql-hackers pgsql-patches

Greg Stark wrote:

>I think part of the problem is that there's a bunch of features related to
>these types of queries and the lines between them blur.
>
>You seem to be talking about putting visibility information inside indexes for
>so index-only plans can be performed. But you're also talking about queries
>like "select count(*) from foo" with no where clauses. Such a query wouldn't
>be helped by index-only scans.
>
>Perhaps you're thinking about caching the total number of records in a global
>piece of state like a materialized view? That would be a nice feature but I
>think it should done as a general materialized view implementation, not a
>special case solution for just this one query.
>
>Perhaps you're thinking of the min/max problem of being able to use indexes to
>pick out just the tuples satisfying the min/max constraint. That seems to me
>to be one of the more tractable problems in this area but it would still
>require lots of work.
>
>I suggest you post a specific query you find is slow. Then discuss how you
>think it ought to be executed and why.
>
>
>
You are correct, I am proposing to add visibility to the indexes.

As for unqualified counts, I believe that they could take advantage of
an index-only scan as it requires much less I/O to perform an index scan
than a sequential scan on large tables.

Min/Max would also take advantage of index only scans but say, for
example, that someone has the following:

Relation SOME_USERS
user_id BIGINT PK
user_nm varchar(32) UNIQUE INDEX
some_other_attributes...

If an application needs the user names, it would run SELECT user_nm FROM
SOME_USERS... in the current implementation this would require a
sequential scan. On a relation which contains 1M+ tuples, this requires
either a lot of I/O or a lot of cache. An index scan would immensely
speed up this query.

In response to

Re: Much Ado About COUNT(*) at 2005-01-12 19:23:44 from Greg Stark

Responses

Re: Much Ado About COUNT(*) at 2005-01-12 20:08:37 from Greg Stark
Re: Much Ado About COUNT(*) at 2005-01-12 21:42:06 from Marek Mosiewicz

Browse pgsql-announce by date

	From	Date	Subject
Next Message	Tom Lane	2005-01-12 19:41:56	Re: Much Ado About COUNT(*)
Previous Message	Reinhard Max	2005-01-12 19:34:40	Re: segfault caused by heimdal (was: SUSE port)

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2005-01-12 19:41:56	Re: Much Ado About COUNT(*)
Previous Message	Reinhard Max	2005-01-12 19:34:40	Re: segfault caused by heimdal (was: SUSE port)

Browse pgsql-patches by date

	From	Date	Subject
Next Message	Tom Lane	2005-01-12 19:41:56	Re: Much Ado About COUNT(*)
Previous Message	Reinhard Max	2005-01-12 19:34:40	Re: segfault caused by heimdal (was: SUSE port)