Re: COUNT(*) and index-only scans

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: bruce(at)momjian(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject: Re: COUNT(*) and index-only scans
Date: 2011-10-15 22:58:49
Message-ID: CAMkU=1xeG9M4cYgoMTGrY1GCF==JKn4mm+D-oTT4u-n=MDKdcA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 10, 2011 at 9:48 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>> Jeff Janes  wrote:
>> Kevin Grittner  wrote:
>
>>> create table t (id int not null primary key);
>>> insert into t select generate_series(1, 1000000);
>>> vacuum freeze analyze;
>>> explain analyze select count(*) from t
>>> where id between 500000 and 500010;
>>>
>>> That gives you an index-only scan; but without the WHERE clause it
>>> uses a seq scan.
>>
>> If you convert the where clause to "where id is not null" it uses
>> the index only scan again, but only if you nudge it too with
>> enable_seqscan=off.

With a recent commit from (I assume) Tom, the "where id is not null"
is no longer needed.

> Clever way to get a full-table test.
>
> It turns out that for the above, with your trick to use the index
> only scan, it comes out 12% faster to do a seqscan, even when the
> table and index are fully cached (based on the average time of ten
> runs each way).  There's very little overlap, so the difference looks
> real.  But that's on a very narrow record, having just the one column
> used in the index.  I added one wide column like this:
>
> alter table t add column x text;
> update t set x = (repeat(random()::text, (random() * 100)::int));
> cluster t USING t_pkey;
> vacuum freeze analyze;
>
> With that change the index-only scan time remained unchanged, while
> the seqscan time grew to about 2.6 times the index only scan time.
> That was mildly surprising for me, considering it was all still
> cached.

I used the pgbench_accounts table from pgbench -i -s 50, where all
data fits in shared_buffers, using the -f switch with either

set enable_seqscan=off;
select count(*) from pgbench_accounts;

or

set enable_indexonlyscan=off;
select count(*) from pgbench_accounts;

With just a single client, it was a toss-up. But with 8 concurrent
clients on a 8 CPU machine, the index-only scan was 50% faster. So
that is a nice win, even if well-designed apps probably shouldn't be
endlessly counting rows of an unchanging table using all available
CPUs in the first place.

Cheers,

Jeff

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jan Urbański 2011-10-15 23:28:56 plpython SPI cursors
Previous Message Noah Misch 2011-10-15 22:48:58 Re: Pushing ScalarArrayOpExpr support into the btree index AM