From: | Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com> |
---|---|
To: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | GSoC 2015 proposal. Bitmap Index-only Count |
Date: | 2015-03-24 11:56:32 |
Message-ID: | CAP4vRV7xdUE7Eq7ZFgoWSC6aXmH9AM3qeSQ139dzZhpCcQGr3A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi, hackers!
Here is the text of my proposal which I've applied to GSoC.
(and link
http://www.google-melange.com/gsoc/proposal/public/google/gsoc2015/lubennikovaav/5657382461898752
)
Any suggestions and comments are welcome.
*Project name*
Bitmap Index-only Count
*Brief review*
There is a problem of slow counting in PostgreSQL [1]. The reason why this
is slow is related to the *MVCC* implementation in PostgreSQL. Index-only
scans (implemented since PostgreSQL-9.2) providing some performance
improvements where the *visibility map* of the table allows it. That’s
good. But it works only for access methods which provide amgettuple method.
Unfortunately GIN supports only BitmapIndexScan and has no implementation
of index_getnext() interface [2]. Idea of the proposal is to implement
Bitmap Index-Only Count method that allows to count elements, without
reference to the heap.
*Benefits to the PostgreSQL Community*
Faster count(*) would be actual for GIN. Probably it would accelerate
count(*) for other access methods too, but I do not sure if it would be
significant difference.
*Quantifiable results*
Acceleration of count(*) queries with WHERE clause which use GIN.
*Project details*
Let’s look at count(*) query:
EXPLAIN SELECT count(*) from tbl_mytbl where val @> '{"b":2}';
Now the query plan looks like this:
Aggregate
-> Bitmap Heap Scan on tbl_mytbl
Recheck Cond: (val @> '{"b": 2}'::jsonb)
-> Bitmap Index Scan on idx_mytbl
Index Cond: (val @> '{"b": 2}'::jsonb)
Idea of the proposal is to implement Bitmap Index-Only Count method that
allows to count elements, without reference to the heap.
Conditions:
- all tuples are visible (the same problem as for Index-only count(*));
- result TIDBitmap is not lossy. nchunks = 0;
int nchunks
<http://doxygen.postgresql.org/structTIDBitmap.html#a85d5883056bad6863cb47a15446581c7>;
/* number of lossy entries in pagetable */
- pages in TIDBitmap don’t require recheck
* recheck is used only on exact pages --- it indicates that although
* only the stated tuples need be checked, the full index qual condition
* must be checked for each (ie, these are candidate matches).
If all conditions are true, it’s possible to call aggregate COUNT function
for each tuple from TIDBitmap returned by Bitmap Index Scan (or
BitmapAnd/BitmapOr nodes). And there’s no necessity to call Bitmap Heap
Scan.
As a GSoC student I will create new Node “Bitmap Index-Only Scan”, which
would catch tuples from Bitmap Index Scan node and pass them to Aggregate
node. Thus, new query plan will be as follow:
Aggregate
-> Bitmap Index-only Count
-> Bitmap Index Scan on idx_mytbl
Index Cond: (val @> '{"b": 2}'::jsonb)
*Project Schedule*
until May 25
Read documentation and source code, clarify details of implementation.
until June 30
Implement Bitmap Index-only Count node.
1 July – 31 July
Add Bitmap Index-only Count node to Planner.
1 August -15 August
Final refactoring, testing and submitting a patch.
*Links*
1. https://wiki.postgresql.org/wiki/Slow_Counting
2.
http://postgresql.nabble.com/Todo-item-Support-amgettuple-in-GIN-td5780810.html
--
Best regards,
Lubennikova Anastasia
From | Date | Subject | |
---|---|---|---|
Next Message | Thom Brown | 2015-03-24 12:03:32 | Re: Error with index on unlogged table |
Previous Message | Thom Brown | 2015-03-24 11:46:39 | Re: Error with index on unlogged table |