Re: COUNT(*) and index-only scans

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COUNT(*) and index-only scans
Date: 2011-10-10 17:48:19
Message-ID: CA+TgmoYMTacw-dQWkjRZn9w8w5wqCB0W4PLeE-SJ4Pf+tq2yhw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 10, 2011 at 1:36 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> I talked to Robert Haas and he said that index-only scans do not
>> optimize COUNT(*).  Is this something we can do for PG 9.2?  Is
>> anyone working on this?
>
> Well, it's not that it doesn't optimize COUNT(*) -- it's that it
> doesn't yet cost the index scan as cheaper than a table scan when
> you're accessing every row.
>
> create table t (id int not null primary key);
> insert into t select generate_series(1, 1000000);
> vacuum freeze analyze;
> explain analyze select count(*) from t
>  where id between 500000 and 500010;
>
> That gives you an index-only scan; but without the WHERE clause it
> uses a seq scan.  I think it's mainly a matter of doing enough
> benchmarks to figure out how best to model the costs of the index
> scan so that it can be picked for that case.

Right now, our costing model for index-only scans is pretty dumb. It
assumes that using an index-only scan will avoid 10% of the heap
fetches. That could easily be low, and on an insert-only table or one
where only the recently-updated rows are routinely accessed, it could
also be high. To use an index-only scan for a full-table COUNT(*),
we're going to have to be significantly smarter, because odds are good
that skipping 10% of the heap fetches won't be sufficient inducement
to the planner to go that route; we are going to need a real number.

This isn't just an exercise in costing, though: right now, we don't
even generate a plan to use an index for a full-table scan, because we
assume that it can never be cheaper. This is actually not quite true
even in previous releases (suppose the table is severely bloated but
the index is not) and it's going to be less true now that we have
index-only scans. So that's going to need some adjustment, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2011-10-10 17:52:39 Re: unite recovery.conf and postgresql.conf
Previous Message Jeff Davis 2011-10-10 17:45:01 Re: Range Types - typo + NULL string constructor