Re: GSoC 2015 proposal. Bitmap Index-only Count

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GSoC 2015 proposal. Bitmap Index-only Count
Date: 2015-03-27 19:49:53
Message-ID: CAPpHfds=nurYJewEcSvv=Baru3QPct3Hrc7zqSa03VE_UZbgNA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 25, 2015 at 11:07 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com> writes:
> > 2015-03-24 18:01 GMT+04:00 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> >> I wonder whether it'd be possible to teach GIN to support index_getnext
> >> instead. Initially it would probably work only for cases where the
> >> index didn't have to return any columns ... but if we did it, maybe the
> >> door would be open to cases where GIN could reconstruct actual values.
>
> > Another idea is to write index_getnext() for GIN which would return some
> > fake tuple. Since there is no difference for COUNT aggregate what the
> tuple
> > contains. COUNT just wants to know whether we have tuple that satisfy the
> > qual.
>
> Well, yeah, that would be the idea (at least initially). You don't have
> to return any real data unless you claim you can do so via amcanreturn.
> The planner is still capable of selecting an index-only scan as long as
> the query retrieves no columns.
>
> The trick would be to not return the same heap TID more than once per
> scan. A zero-order implementation would be to construct the same bitmap
> we do now and then just provide a gingetnext function that scans through
> that. That would be pretty awful in terms of scan startup time, so doing
> better would be nice; but perhaps it would be useful even in that form.

My ideal picture for FTS using GIN looks like this:

1) Have lexemes offsets in GIN stored with item pointers.
2) Calculate relevance using only GIN information without using heap.
3) Sort results by relevance in either GIN itself or executor node.
4) Get both TOP-N most relevant rows and total rows count (using index-only
scan) from single GIN index scan.

Implementing index_getnext() for GIN looks step forward for me because it
allows "index only count" and potentially could be used for ordered output.
However, it's unclear for me if it's feasible to have #4? Could we return
TOP-N results and total count from single GIN index scan?

------
With best regards,
Alexander Korotkov.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dima Ivanovskiy 2015-03-27 21:41:42 Re[2]: [HACKERS] GSoC 2015: SP-GIST for geometrical objects
Previous Message Fabrízio de Royes Mello 2015-03-27 19:23:24 Doubt about AccessExclusiveLock in ALTER TABLE .. SET ( .. );