GSoC 2015 proposal. Bitmap Index-only Count

From: Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: GSoC 2015 proposal. Bitmap Index-only Count
Date: 2015-03-24 11:56:32
Message-ID: CAP4vRV7xdUE7Eq7ZFgoWSC6aXmH9AM3qeSQ139dzZhpCcQGr3A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, hackers!
Here is the text of my proposal which I've applied to GSoC.
(and link
http://www.google-melange.com/gsoc/proposal/public/google/gsoc2015/lubennikovaav/5657382461898752
)
Any suggestions and comments are welcome.

*Project name*

Bitmap Index-only Count

*Brief review*

There is a problem of slow counting in PostgreSQL [1]. The reason why this
is slow is related to the *MVCC* implementation in PostgreSQL. Index-only
scans (implemented since PostgreSQL-9.2) providing some performance
improvements where the *visibility map* of the table allows it. That’s
good. But it works only for access methods which provide amgettuple method.
Unfortunately GIN supports only BitmapIndexScan and has no implementation
of index_getnext() interface [2]. Idea of the proposal is to implement
Bitmap Index-Only Count method that allows to count elements, without
reference to the heap.

*Benefits to the PostgreSQL Community*

Faster count(*) would be actual for GIN. Probably it would accelerate
count(*) for other access methods too, but I do not sure if it would be
significant difference.

*Quantifiable results*

Acceleration of count(*) queries with WHERE clause which use GIN.

*Project details*

Let’s look at count(*) query:

EXPLAIN SELECT count(*) from tbl_mytbl where val @> '{"b":2}';

Now the query plan looks like this:

Aggregate

-> Bitmap Heap Scan on tbl_mytbl

Recheck Cond: (val @> '{"b": 2}'::jsonb)

-> Bitmap Index Scan on idx_mytbl

Index Cond: (val @> '{"b": 2}'::jsonb)

Idea of the proposal is to implement Bitmap Index-Only Count method that
allows to count elements, without reference to the heap.

Conditions:

- all tuples are visible (the same problem as for Index-only count(*));
- result TIDBitmap is not lossy. nchunks = 0;

int nchunks
<http://doxygen.postgresql.org/structTIDBitmap.html#a85d5883056bad6863cb47a15446581c7>;
/* number of lossy entries in pagetable */

- pages in TIDBitmap don’t require recheck

* recheck is used only on exact pages --- it indicates that although

* only the stated tuples need be checked, the full index qual condition

* must be checked for each (ie, these are candidate matches).

If all conditions are true, it’s possible to call aggregate COUNT function
for each tuple from TIDBitmap returned by Bitmap Index Scan (or
BitmapAnd/BitmapOr nodes). And there’s no necessity to call Bitmap Heap
Scan.

As a GSoC student I will create new Node “Bitmap Index-Only Scan”, which
would catch tuples from Bitmap Index Scan node and pass them to Aggregate
node. Thus, new query plan will be as follow:

Aggregate

-> Bitmap Index-only Count

-> Bitmap Index Scan on idx_mytbl

Index Cond: (val @> '{"b": 2}'::jsonb)

*Project Schedule*

until May 25

Read documentation and source code, clarify details of implementation.

until June 30

Implement Bitmap Index-only Count node.

1 July – 31 July

Add Bitmap Index-only Count node to Planner.

1 August -15 August

Final refactoring, testing and submitting a patch.

*Links*

1. https://wiki.postgresql.org/wiki/Slow_Counting
2.
http://postgresql.nabble.com/Todo-item-Support-amgettuple-in-GIN-td5780810.html

--
Best regards,
Lubennikova Anastasia

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thom Brown 2015-03-24 12:03:32 Re: Error with index on unlogged table
Previous Message Thom Brown 2015-03-24 11:46:39 Re: Error with index on unlogged table