Re: index-only count(*) for indexes supporting bitmap scans

From: Alexander Kumenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: index-only count(*) for indexes supporting bitmap scans
Date: 2017-08-21 14:25:37
Message-ID: 122a32e4-73bd-ec85-9dad-a72729635c65@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

|Hello everyone,

I made a new patch according to the previous comments. It is simpler
now, only adding a few checks to the bitmap heap scan node. When the
target list for the bitmap heap scan is empty, and there is no filter,
and the bitmap page generated by the index scan is exact, and the
corresponding heap page is visible to all transaction, we don't fetch it.

The performance is better than with the previous patch, because now it
can leverage the parallel heap scan logic. A simple benchmark is
attached: this patch is more than ten times faster on a frequent search
term, and two times faster on an infrequent one.

Still, there is one thing that is bothering me. I use empty targetlist
as the marker of that I should not fetch tuples. Because of that, I have
to make sure that use_physical_tlist() doesn't touch empty tlists.
Consequently, if count(*) sits on top of a subquery, this subquery has
to project and cannot be deleted (see trivial_subqueryscan). There is
such a query in the regression test select_distinct: "select count(*)
from (select distinct two, four, two from tenk1);". For that particular
query it shouldn't matter much, so I changed the test, but the broader
implications of this escape me at the moment.

The cost estimation is very simplified now: I just halve the number of
pages to be fetched. The most important missing part is checking whether
we have any quals that are not checked by the index: if we do, we'll
have to fetch all the tuples. Finding nonindex qualifications is
somewhat convoluted for the bitmap index scan tree involving multiple
indexes, so I didn't implement it for now. We could also consider
estimating the number of lossy pages in the tid bitmap given current
work_mem size.

I'll be glad to hear your thoughts on this.|

Attachment Content-Type Size
index-only-count-v2.patch text/x-patch 13.7 KB
benchmark2.txt text/plain 4.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-08-21 14:29:02 Re: shm_mq_wait_internal gets stuck forever on fast shutdown
Previous Message Craig Ringer 2017-08-21 14:07:29 Re: shm_mq_wait_internal gets stuck forever on fast shutdown