Re: index-only count(*) for indexes supporting bitmap scans

From: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: index-only count(*) for indexes supporting bitmap scans
Date: 2017-04-12 17:23:22
Message-ID: 251401bb-6f53-b957-4128-578ac22e8acf@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 12.04.2017 12:29, Alexander Korotkov wrote:

> That's a cool feature for FTS users! Please, register this patch to
> the next commitfest.
I've added this to the 2017-07 commitfest:
https://commitfest.postgresql.org/14/1117/

> Also, what is planning overhead of this patch? That's worth
> testing too.
>
The planning overhead is about 0.1 - 0.07 ms on the samples from my
first letter.

> Another thing catch my eye. The estimated number of rows in Bitmap
> Count node is the same as in Bitmap Index Scan node. Should it be 1
> because it always returns single row?
You're right, I'll fix this in the next version of the patch.

> Does this limitation cause a performance drawback? When bitmap
> index scan returns all rechecks, alternative to Bitmap Count is
> still Aggregate + Bitmap Heap Scan. Thus, I think Bitmap Count
> would have the same performance or even slightly faster. That's
> worth testing.
>
Bitmap heap scan can indeed be faster, because it prefetches heap pages,
and can be run in parallel. When the table data is not cached, the
difference is not big on my machine. It could probably be significant if
I used a storage that supported parallel reads. When the data is cached
in memory, the parallel bitmap heap scan can be significantly faster.
We could use the standard bitmap heap scan node with some tweaks, as
discussed in the other subthread, to avoid this regression.

Here are some test queries:

--- not cached
-------------------------------------------------------------------------------------------------------------------
test1=# explain analyze select count(*) from pglist where fts @@
to_tsquery( 'tom & lane' );
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Bitmap Count on pglist (cost=542.65..1087.68 rows=54503 width=8)
(actual time=30264.174..30264.177 rows=1 loops=1)
Recheck Cond: (fts @@ to_tsquery('tom & lane'::text))
Rows Removed by Index Recheck: 270853
Heap Fetches: 66138
Heap Blocks: exact=39854 lossy=66138
-> Bitmap Index Scan on idx_pglist_fts (cost=0.00..529.02
rows=54503 width=0) (actual time=525.341..525.341 rows=222813 loops=1)
Index Cond: (fts @@ to_tsquery('tom & lane'::text))
Planning time: 128.238 ms
Execution time: 30264.299 ms
(9 rows)

test1=# set enable_bitmapcount to off;
SET
test1=# explain analyze select count(*) from pglist where fts @@
to_tsquery( 'tom & lane' );
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=119989.73..119989.74 rows=1 width=8) (actual
time=31699.829..31699.830 rows=1 loops=1)
-> Gather (cost=119989.52..119989.73 rows=2 width=8) (actual
time=31698.699..31699.819 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=118989.52..118989.53 rows=1
width=8) (actual time=31689.289..31689.290 rows=1 loops=3)
-> Parallel Bitmap Heap Scan on pglist
(cost=542.65..118932.74 rows=22710 width=0) (actual
time=608.532..31634.692 rows=74271 loops=3)
Recheck Cond: (fts @@ to_tsquery('tom & lane'::text))
Rows Removed by Index Recheck: 90284
Heap Blocks: exact=13242 lossy=21960
-> Bitmap Index Scan on idx_pglist_fts
(cost=0.00..529.02 rows=54503 width=0) (actual time=552.136..552.136
rows=222813 loops=1)
Index Cond: (fts @@ to_tsquery('tom &
lane'::text))
Planning time: 160.055 ms
Execution time: 31724.468 ms
(13 rows)

----- cached
-------------------------------------------------------------------------------------------------------------------------------------
test1=# explain analyze select count(*) from pglist where fts @@
to_tsquery( 'tom & lane' );
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=119989.73..119989.74 rows=1 width=8) (actual
time=1250.973..1250.973 rows=1 loops=1)
-> Gather (cost=119989.52..119989.73 rows=2 width=8) (actual
time=1250.588..1250.964 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=118989.52..118989.53 rows=1
width=8) (actual time=1246.144..1246.144 rows=1 loops=3)
-> Parallel Bitmap Heap Scan on pglist
(cost=542.65..118932.74 rows=22710 width=0) (actual
time=82.781..1237.585 rows=74271 loops=3)
Recheck Cond: (fts @@ to_tsquery('tom & lane'::text))
Rows Removed by Index Recheck: 90284
Heap Blocks: exact=13221 lossy=22217
-> Bitmap Index Scan on idx_pglist_fts
(cost=0.00..529.02 rows=54503 width=0) (actual time=78.366..78.366
rows=222813 loops=1)
Index Cond: (fts @@ to_tsquery('tom &
lane'::text))
Planning time: 0.722 ms
Execution time: 1256.028 ms
(13 rows)

test1=# set enable_bitmapcount to on;
SET
test1=# explain analyze select count(*) from pglist where fts @@
to_tsquery( 'tom & lane' );
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Bitmap Count on pglist (cost=542.65..1087.68 rows=54503 width=8)
(actual time=2745.740..2745.742 rows=1 loops=1)
Recheck Cond: (fts @@ to_tsquery('tom & lane'::text))
Rows Removed by Index Recheck: 270853
Heap Fetches: 66138
Heap Blocks: exact=39854 lossy=66138
-> Bitmap Index Scan on idx_pglist_fts (cost=0.00..529.02
rows=54503 width=0) (actual time=85.572..85.572 rows=222813 loops=1)
Index Cond: (fts @@ to_tsquery('tom & lane'::text))
Planning time: 0.701 ms
Execution time: 2745.800 ms
(9 rows)

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2017-04-12 17:34:37 Re: Cutting initdb's runtime (Perl question embedded)
Previous Message Robert Haas 2017-04-12 17:23:06 Re: GSOC'17 project introduction: Parallel COPY execution with errors handling