Skip site navigation (1) Skip section navigation (2)

Re: COUNT(*) and index-only scans

From: Thom Brown <thom(at)linux(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Greg Stark <stark(at)mit(dot)edu>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COUNT(*) and index-only scans
Date: 2011-11-19 16:22:58
Message-ID: CAA-aLv75+ZAPKP1uyrLsK2N028u9aCjOrwTSSVbESqnmz5oz+A@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-hackers
On 19 November 2011 16:08, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thom Brown <thom(at)linux(dot)com> writes:
>> So is there a chance of getting bitmap index-only scans?
>
> Don't hold your breath.  It seems like a huge increment of complexity
> for probably very marginal gains.  The point of a bitmap scan (as
> opposed to regular indexscan) is to reduce heap accesses by combining
> visits to the same page; but it's not clear how useful that is if
> you're not making heap accesses at all.

Well consider:

pgbench=# explain analyse select count(*) from pgbench_accounts where
aid between 143243 and 374825 or aid between 523242 and 712111;

 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=83016.38..83016.39 rows=1 width=0) (actual
time=1039.282..1039.282 rows=1 loops=1)
   ->  Bitmap Heap Scan on pgbench_accounts  (cost=7934.54..81984.58
rows=412719 width=0) (actual time=243.012..977.946 rows=420453
loops=1)
         Recheck Cond: (((aid >= 143243) AND (aid <= 374825)) OR ((aid
>= 523242) AND (aid <= 712111)))
         ->  BitmapOr  (cost=7934.54..7934.54 rows=423802 width=0)
(actual time=228.934..228.934 rows=0 loops=1)
               ->  Bitmap Index Scan on pgbench_accounts_pkey
(cost=0.00..4299.40 rows=235782 width=0) (actual time=134.410..134.410
rows=231583 loops=1)
                     Index Cond: ((aid >= 143243) AND (aid <= 374825))
               ->  Bitmap Index Scan on pgbench_accounts_pkey
(cost=0.00..3428.78 rows=188020 width=0) (actual time=94.520..94.520
rows=188870 loops=1)
                     Index Cond: ((aid >= 523242) AND (aid <= 712111))
 Total runtime: 1039.598 ms
(9 rows)

Since I can't get this to use an index-only scan, it will always visit
the heap.  Instead I'd be forced to write:

pgbench=# explain analyse select count(*) from (select aid from
pgbench_accounts where aid between 143243 and 374825 union all select
aid from pgbench_accounts where aid between 523242 and 712111) x;

           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=17263.72..17263.73 rows=1 width=0) (actual
time=232.053..232.053 rows=1 loops=1)
   ->  Append  (cost=0.00..16204.22 rows=423802 width=0) (actual
time=10.925..195.134 rows=420453 loops=1)
         ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..9015.04
rows=235782 width=0) (actual time=10.925..90.116 rows=231583 loops=1)
               ->  Index Only Scan using pgbench_accounts_pkey on
pgbench_accounts  (cost=0.00..6657.22 rows=235782 width=4) (actual
time=10.924..61.822 rows=231583 loops=1)
                     Index Cond: ((aid >= 143243) AND (aid <= 374825))
         ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..7189.18
rows=188020 width=0) (actual time=0.062..62.953 rows=188870 loops=1)
               ->  Index Only Scan using pgbench_accounts_pkey on
pgbench_accounts  (cost=0.00..5308.98 rows=188020 width=4) (actual
time=0.061..40.343 rows=188870 loops=1)
                     Index Cond: ((aid >= 523242) AND (aid <= 712111))
 Total runtime: 232.291 ms
(9 rows)

These 2 queries are equal only because the ranges being checked don't
overlap, so if arbitrary values were being substituted, and the ranges
did happen to overlap, that last method couldn't work.  I'd have to
use a UNION ALL in that particular case, which adds a lot of overhead
due to de-duplication.

While I accept that maybe adapting the existing bitmap index scan
functionality isn't necessarily desirable, would it be feasible to
create a corresponding bitmap index-only scan method.

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

pgsql-hackers by date

Next:From: Greg SmithDate: 2011-11-19 16:44:46
Subject: Re: Core Extensions relocation
Previous:From: Tom LaneDate: 2011-11-19 16:08:02
Subject: Re: COUNT(*) and index-only scans

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group