Re: Much Ado About COUNT(*)

From: "Marek Mosiewicz" <marekmosiewicz(at)poczta(dot)onet(dot)pl>
To: "Jonah H(dot) Harris" <jharris(at)tvi(dot)edu>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Much Ado About COUNT(*)
Date: 2005-01-12 21:42:06
Message-ID: MKEOKKDMKPDNCLDGMHPDOEJECFAA.marekmosiewicz@poczta.onet.pl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-announce pgsql-hackers pgsql-patches

I agree with last statement. count(*) is not most important.
Most nice thing with index only scan is when it contains more than one
column.
When there is join among many tables where from each table only one or few
columns are taken
it take boost query incredibly.

For exmaple on when you have customer table and ID, NAME index on it then:

select c.name,i.* from customer c, invoice i where c.id=i.customer_id

then it is HUGE difference there. without index only scan you require to
make index io and
random table access (assuming no full scan). With index only scan you need
only
index scan and can skip expensive random table io.
It is very simple but powerful optmization in many cases to reduce join
expence on many
difficult queries.
You can have get some kind of index organized table (you use only index so
in fact it is
ordered table)

Selecting only few columns is quite often scenario in reporting.

-----Original Message-----
From: pgsql-hackers-owner(at)postgresql(dot)org
[mailto:pgsql-hackers-owner(at)postgresql(dot)org]On Behalf Of Jonah H. Harris
Sent: Wednesday, January 12, 2005 8:36 PM
To: Greg Stark
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Much Ado About COUNT(*)

Greg Stark wrote:

>I think part of the problem is that there's a bunch of features related to
>these types of queries and the lines between them blur.
>
>You seem to be talking about putting visibility information inside indexes
for
>so index-only plans can be performed. But you're also talking about queries
>like "select count(*) from foo" with no where clauses. Such a query
wouldn't
>be helped by index-only scans.
>
>Perhaps you're thinking about caching the total number of records in a
global
>piece of state like a materialized view? That would be a nice feature but I
>think it should done as a general materialized view implementation, not a
>special case solution for just this one query.
>
>Perhaps you're thinking of the min/max problem of being able to use indexes
to
>pick out just the tuples satisfying the min/max constraint. That seems to
me
>to be one of the more tractable problems in this area but it would still
>require lots of work.
>
>I suggest you post a specific query you find is slow. Then discuss how you
>think it ought to be executed and why.
>
>
>
You are correct, I am proposing to add visibility to the indexes.

As for unqualified counts, I believe that they could take advantage of
an index-only scan as it requires much less I/O to perform an index scan
than a sequential scan on large tables.

Min/Max would also take advantage of index only scans but say, for
example, that someone has the following:

Relation SOME_USERS
user_id BIGINT PK
user_nm varchar(32) UNIQUE INDEX
some_other_attributes...

If an application needs the user names, it would run SELECT user_nm FROM
SOME_USERS... in the current implementation this would require a
sequential scan. On a relation which contains 1M+ tuples, this requires
either a lot of I/O or a lot of cache. An index scan would immensely
speed up this query.

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

In response to

Browse pgsql-announce by date

  From Date Subject
Next Message Alvaro Herrera 2005-01-12 21:43:46 Re: Much Ado About COUNT(*)
Previous Message Tom Lane 2005-01-12 21:38:44 Re: [HACKERS] segfault caused by heimdal (was: SUSE port)

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2005-01-12 21:43:46 Re: Much Ado About COUNT(*)
Previous Message Tom Lane 2005-01-12 21:38:44 Re: [HACKERS] segfault caused by heimdal (was: SUSE port)

Browse pgsql-patches by date

  From Date Subject
Next Message Alvaro Herrera 2005-01-12 21:43:46 Re: Much Ado About COUNT(*)
Previous Message Tom Lane 2005-01-12 21:38:44 Re: [HACKERS] segfault caused by heimdal (was: SUSE port)