From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: GIN improvements part2: fast scan |
Date: | 2013-06-19 08:56:37 |
Message-ID: | CAPpHfdvOcjQZwBeT8on1yztt9e4VN_7orDXVbtE26rctS-nP_g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Jun 19, 2013 at 12:49 PM, Heikki Linnakangas <
hlinnakangas(at)vmware(dot)com> wrote:
> On 19.06.2013 11:30, Alexander Korotkov wrote:
>
>> On Wed, Jun 19, 2013 at 11:48 AM, Heikki Linnakangas<
>> hlinnakangas(at)vmware(dot)com> wrote:
>>
>> On 18.06.2013 23:59, Alexander Korotkov wrote:
>>>
>>> I would like to illustrate that on example. Imagine you have fulltext
>>>> query
>>>> "rare_term& frequent_term". Frequent term has large posting tree while
>>>>
>>>> rare term has only small posting list containing iptr1, iptr2 and iptr3.
>>>> At
>>>> first we get iptr1 from posting list of rare term, then we would like to
>>>> check whether we have to scan part of frequent term posting tree where
>>>> iptr
>>>> < iptr1. So we call pre_consistent([false, true]), because we know
>>>> that
>>>> rare term is not present for iptr< iptr2. pre_consistent returns
>>>> false.
>>>> So
>>>> we can start scanning frequent term posting tree from iptr1. Similarly
>>>> we
>>>> can skip lags between iptr1 and iptr2, iptr2 and iptr3, from iptr3 to
>>>> maximum possible pointer.
>>>>
>>>>
>>> Thanks, now I understand the rare-term& frequent-term problem. Couldn't
>>>
>>> you do that with the existing consistent function? I don't see why you
>>> need
>>> the new pre-consistent function for this.
>>>
>>
>> In the case of two entries I can. But in the case of n entries things
>> becomes more complicated. Imagine you have "term_1& term_2& ...&
>> term_n"
>>
>> query. When you get some item pointer from term_1 you can skip all the
>> lesser item pointers from term_2, term_3 ... term_n. But if all you have
>> for it is consistent function you have to call it with following check
>> arguments:
>> 1) [false, false, false, ... , false]
>> 2) [false, true, false, ... , false]
>> 3) [false, false, true, ... , false]
>> 4) [false, true, true, ..., false]
>> ......
>> i.e. you have to call it 2^(n-1) times. But if you know the query specific
>> (i.e. in opclass) it's typically easy to calculate exactly what we need in
>> single pass. That's why I introduced pre_consistent.
>>
>
> Hmm. So how does that work with the pre-consistent function? Don't you
> need to call that 2^(n-1)-1 times as well?
I call pre-consistent once with [false, true, true, ..., true].
Pre-consistent knows that each true passed to it could be false positive.
So, if it returns false it guarantees that consistent will be false for all
possible combinations.
------
With best regards,
Alexander Korotkov.
From | Date | Subject | |
---|---|---|---|
Next Message | Simon Riggs | 2013-06-19 09:03:40 | Re: Add visibility map information to pg_freespace. |
Previous Message | Heikki Linnakangas | 2013-06-19 08:49:44 | Re: GIN improvements part2: fast scan |